Add pg_search to the compute images

pg_search is an extension by ParadeDB offering text search capabilities.

Link: https://paradedb.com
Signed-off-by: Tristan Partin <tristan@neon.tech>
Remove unused compute_ctl HTTP routes (#10544)
2026-05-27 01:50:38 +00:00 · 2025-01-29 15:48:16 -06:00 · 2025-01-29 19:22:01 +00:00 · 2025-01-29 18:43:39 +00:00 · 2025-01-29 18:09:25 +00:00 · 2025-01-29 17:08:25 +00:00
90 changed files with 1804 additions and 1012 deletions
--- a/.github/actionlint.yml
+++ b/.github/actionlint.yml
@@ -4,6 +4,7 @@ self-hosted-runner:
    - large
    - large-arm64
    - small
+    - small-metal
    - small-arm64
    - us-east-2
 config-variables:
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -242,7 +242,7 @@ jobs:
      statuses: write
      contents: write
      pull-requests: write
-    runs-on: [ self-hosted, small ]
+    runs-on: [ self-hosted, small-metal ]
    container:
      image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
      credentials:
@@ -636,6 +636,8 @@ jobs:
          file: compute/compute-node.Dockerfile
          cache-from: type=registry,ref=cache.neon.build/compute-node-${{ matrix.version.pg }}:cache-${{ matrix.version.debian }}-${{ matrix.arch }}
          cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/compute-node-{0}:cache-{1}-{2},mode=max', matrix.version.pg, matrix.version.debian, matrix.arch) || '' }}
+          secrets: |
+            PG_SEARCH_ENTERPRISE_GITHUB_PAT=${{ secrets.PG_SEARCH_ENTERPRISE_GITHUB_PAT }}
          tags: |
            neondatabase/compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-${{ matrix.arch }}

@@ -786,6 +788,17 @@ jobs:
          username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
          password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

+      - name: Get the last compute release tag
+        id: get-last-compute-release-tag
+        env:
+          GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}
+        run: |
+          tag=$(gh api -q '[.[].tag_name | select(startswith("release-compute"))][0]'\
+            -H "Accept: application/vnd.github+json" \
+            -H "X-GitHub-Api-Version: 2022-11-28" \
+            "/repos/${{ github.repository }}/releases")
+          echo tag=${tag} >> ${GITHUB_OUTPUT}
+
      # `neondatabase/neon` contains multiple binaries, all of them use the same input for the version into the same version formatting library.
      # Pick pageserver as currently the only binary with extra "version" features printed in the string to verify.
      # Regular pageserver version string looks like
@@ -817,6 +830,20 @@ jobs:
          TEST_VERSION_ONLY: ${{ matrix.pg_version }}
        run: ./docker-compose/docker_compose_test.sh

+      - name: Print logs and clean up docker-compose test
+        if: always()
+        run: |
+          docker compose --profile test-extensions -f ./docker-compose/docker-compose.yml logs || true
+          docker compose --profile test-extensions -f ./docker-compose/docker-compose.yml down
+
+      - name: Test extension upgrade
+        timeout-minutes: 20
+        if: ${{ needs.tag.outputs.build-tag == github.run_id }}
+        env:
+          NEWTAG: ${{ needs.tag.outputs.build-tag }}
+          OLDTAG: ${{ steps.get-last-compute-release-tag.outputs.tag }}
+        run: ./docker-compose/test_extensions_upgrade.sh
+
      - name: Print logs and clean up
        if: always()
        run: |
@@ -1050,6 +1077,7 @@ jobs:
          retries: 5
          script: |
            const tag = "${{ needs.tag.outputs.build-tag }}";
+            const branch = "${{ github.ref_name }}";

            try {
              const existingRef = await github.rest.git.getRef({
@@ -1092,12 +1120,48 @@ jobs:
              }

              console.log(`Release for tag ${tag} does not exist. Creating it...`);
+
+              // Find the PR number using the commit SHA
+              const pullRequests = await github.rest.pulls.list({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                state: 'closed',
+                base: branch,
+              });
+
+              const pr = pullRequests.data.find(pr => pr.merge_commit_sha === context.sha);
+              const prNumber = pr ? pr.number : null;
+
+              // Find the previous release on the branch
+              const releases = await github.rest.repos.listReleases({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                per_page: 100,
+              });
+
+              const branchReleases = releases.data
+                .filter((release) => {
+                  const regex = new RegExp(`^${branch}-\\d+$`);
+                  return regex.test(release.tag_name) && !release.draft && !release.prerelease;
+                })
+                .sort((a, b) => new Date(b.created_at) - new Date(a.created_at));
+
+              const previousTag = branchReleases.length > 0 ? branchReleases[0].tag_name : null;
+
+              const releaseNotes = [
+                prNumber
+                  ? `Release PR https://github.com/${context.repo.owner}/${context.repo.repo}/pull/${prNumber}.`
+                  : 'Release PR not found.',
+                previousTag
+                  ? `Diff with the previous release https://github.com/${context.repo.owner}/${context.repo.repo}/compare/${previousTag}...${tag}.`
+                  : `No previous release found on branch ${branch}.`,
+              ].join('\n\n');
+
              await github.rest.repos.createRelease({
                owner: context.repo.owner,
                repo: context.repo.repo,
                tag_name: tag,
-                // TODO: Automate release notes properly
-                generate_release_notes: false,
+                body: releaseNotes,
              });
              console.log(`Release for tag ${tag} created successfully.`);
            }
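The release-notes step in the workflow above picks the newest non-draft, non-prerelease release whose tag looks like `<branch>-<digits>` and links it in the release body. The selection logic could be sketched in Rust roughly as follows. This is an illustration only, not code from the repository: the `Release` struct, the numeric `created_at` field, and all function names are hypothetical, and the tag check treats the branch name literally, whereas the workflow interpolates it into a regular expression.

```rust
/// Subset of a release record; `created_at` is a hypothetical numeric
/// timestamp (the workflow parses an ISO date string instead).
#[derive(Debug, Clone)]
struct Release {
    tag_name: String,
    created_at: u64,
    draft: bool,
    prerelease: bool,
}

/// True when `tag` is exactly `<branch>-<one or more ASCII digits>`,
/// mirroring the `^${branch}-\d+$` pattern used by the workflow.
fn is_branch_release_tag(tag: &str, branch: &str) -> bool {
    match tag.strip_prefix(branch).and_then(|rest| rest.strip_prefix('-')) {
        Some(digits) => !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit()),
        None => false,
    }
}

/// Newest published release on `branch`, if any.
fn previous_release_tag<'a>(releases: &'a [Release], branch: &str) -> Option<&'a str> {
    releases
        .iter()
        .filter(|r| !r.draft && !r.prerelease && is_branch_release_tag(&r.tag_name, branch))
        .max_by_key(|r| r.created_at)
        .map(|r| r.tag_name.as_str())
}

/// Two-paragraph release body: a PR link and a compare link, each with a fallback.
fn release_notes(
    repo: &str,
    branch: &str,
    tag: &str,
    pr_number: Option<u64>,
    previous_tag: Option<&str>,
) -> String {
    let pr_line = match pr_number {
        Some(n) => format!("Release PR https://github.com/{repo}/pull/{n}."),
        None => "Release PR not found.".to_string(),
    };
    let diff_line = match previous_tag {
        Some(prev) => format!(
            "Diff with the previous release https://github.com/{repo}/compare/{prev}...{tag}."
        ),
        None => format!("No previous release found on branch {branch}."),
    };
    format!("{pr_line}\n\n{diff_line}")
}
```

Note that a tag such as `release-compute-300` does not match the branch `release`, because the suffix after the dash must consist only of digits. That is what keeps the compute, proxy and storage release lines apart when they share a prefix.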
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -1312,7 +1312,7 @@ dependencies = [
 "tar",
 "thiserror 1.0.69",
 "tokio",
- "tokio-postgres 0.7.7",
+ "tokio-postgres 0.7.9",
 "tokio-stream",
 "tokio-util",
 "tower 0.5.2",
@@ -1421,7 +1421,7 @@ dependencies = [
 "storage_broker",
 "thiserror 1.0.69",
 "tokio",
- "tokio-postgres 0.7.7",
+ "tokio-postgres 0.7.9",
 "tokio-util",
 "toml",
 "toml_edit",
@@ -4060,8 +4060,8 @@ dependencies = [
 "pageserver_compaction",
 "pin-project-lite",
 "postgres",
- "postgres-protocol 0.6.4",
- "postgres-types 0.2.4",
+ "postgres-protocol 0.6.6",
+ "postgres-types 0.2.6",
 "postgres_backend",
 "postgres_connection",
 "postgres_ffi",
@@ -4092,7 +4092,7 @@ dependencies = [
 "tokio",
 "tokio-epoll-uring",
 "tokio-io-timeout",
- "tokio-postgres 0.7.7",
+ "tokio-postgres 0.7.9",
 "tokio-stream",
 "tokio-tar",
 "tokio-util",
@@ -4150,7 +4150,7 @@ dependencies = [
 "serde",
 "thiserror 1.0.69",
 "tokio",
- "tokio-postgres 0.7.7",
+ "tokio-postgres 0.7.9",
 "tokio-stream",
 "tokio-util",
 "utils",
@@ -4448,23 +4448,23 @@ dependencies = [

 [[package]]
 name = "postgres"
-version = "0.19.4"
-source = "git+https://github.com/neondatabase/rust-postgres.git?branch=neon#511f998c00148ab7c847bd7e6cfd3a906d0e7473"
+version = "0.19.6"
+source = "git+https://github.com/neondatabase/rust-postgres.git?branch=neon#8b44892f7851e705810b2cb54504325699966070"
 dependencies = [
 "bytes",
 "fallible-iterator",
 "futures-util",
 "log",
 "tokio",
- "tokio-postgres 0.7.7",
+ "tokio-postgres 0.7.9",
 ]

 [[package]]
 name = "postgres-protocol"
-version = "0.6.4"
-source = "git+https://github.com/neondatabase/rust-postgres.git?branch=neon#511f998c00148ab7c847bd7e6cfd3a906d0e7473"
+version = "0.6.6"
+source = "git+https://github.com/neondatabase/rust-postgres.git?branch=neon#8b44892f7851e705810b2cb54504325699966070"
 dependencies = [
- "base64 0.20.0",
+ "base64 0.21.1",
 "byteorder",
 "bytes",
 "fallible-iterator",
@@ -4513,12 +4513,13 @@ dependencies = [

 [[package]]
 name = "postgres-types"
-version = "0.2.4"
-source = "git+https://github.com/neondatabase/rust-postgres.git?branch=neon#511f998c00148ab7c847bd7e6cfd3a906d0e7473"
+version = "0.2.6"
+source = "git+https://github.com/neondatabase/rust-postgres.git?branch=neon#8b44892f7851e705810b2cb54504325699966070"
 dependencies = [
 "bytes",
+ "chrono",
 "fallible-iterator",
- "postgres-protocol 0.6.4",
+ "postgres-protocol 0.6.6",
 ]

 [[package]]
@@ -4554,7 +4555,7 @@ dependencies = [
 "serde",
 "thiserror 1.0.69",
 "tokio",
- "tokio-postgres 0.7.7",
+ "tokio-postgres 0.7.9",
 "tokio-postgres-rustls",
 "tokio-rustls 0.26.0",
 "tokio-util",
@@ -4569,7 +4570,7 @@ dependencies = [
 "itertools 0.10.5",
 "once_cell",
 "postgres",
- "tokio-postgres 0.7.7",
+ "tokio-postgres 0.7.9",
 "url",
 ]

@@ -4663,7 +4664,7 @@ dependencies = [
 "byteorder",
 "bytes",
 "itertools 0.10.5",
- "postgres-protocol 0.6.4",
+ "postgres-protocol 0.6.6",
 "rand 0.8.5",
 "serde",
 "thiserror 1.0.69",
@@ -4911,7 +4912,7 @@ dependencies = [
 "tikv-jemalloc-ctl",
 "tikv-jemallocator",
 "tokio",
- "tokio-postgres 0.7.7",
+ "tokio-postgres 0.7.9",
 "tokio-postgres2",
 "tokio-rustls 0.26.0",
 "tokio-tungstenite 0.21.0",
@@ -5699,7 +5700,7 @@ dependencies = [
 "pageserver_api",
 "parking_lot 0.12.1",
 "postgres",
- "postgres-protocol 0.6.4",
+ "postgres-protocol 0.6.6",
 "postgres_backend",
 "postgres_ffi",
 "pprof",
@@ -5723,7 +5724,7 @@ dependencies = [
 "tikv-jemallocator",
 "tokio",
 "tokio-io-timeout",
- "tokio-postgres 0.7.7",
+ "tokio-postgres 0.7.9",
 "tokio-stream",
 "tokio-tar",
 "tokio-util",
@@ -6393,7 +6394,7 @@ dependencies = [
 "serde_json",
 "storage_controller_client",
 "tokio",
- "tokio-postgres 0.7.7",
+ "tokio-postgres 0.7.9",
 "tokio-postgres-rustls",
 "tokio-stream",
 "tokio-util",
@@ -6858,8 +6859,8 @@ dependencies = [

 [[package]]
 name = "tokio-postgres"
-version = "0.7.7"
-source = "git+https://github.com/neondatabase/rust-postgres.git?branch=neon#511f998c00148ab7c847bd7e6cfd3a906d0e7473"
+version = "0.7.9"
+source = "git+https://github.com/neondatabase/rust-postgres.git?branch=neon#8b44892f7851e705810b2cb54504325699966070"
 dependencies = [
 "async-trait",
 "byteorder",
@@ -6872,11 +6873,13 @@ dependencies = [
 "percent-encoding",
 "phf",
 "pin-project-lite",
- "postgres-protocol 0.6.4",
- "postgres-types 0.2.4",
+ "postgres-protocol 0.6.6",
+ "postgres-types 0.2.6",
+ "rand 0.8.5",
 "socket2",
 "tokio",
 "tokio-util",
+ "whoami",
 ]

 [[package]]
@@ -6914,7 +6917,7 @@ dependencies = [
 "ring",
 "rustls 0.23.18",
 "tokio",
- "tokio-postgres 0.7.7",
+ "tokio-postgres 0.7.9",
 "tokio-rustls 0.26.0",
 "x509-certificate",
 ]
@@ -6935,6 +6938,7 @@ dependencies = [
 "pin-project-lite",
 "postgres-protocol2",
 "postgres-types2",
+ "serde",
 "tokio",
 "tokio-util",
 ]
@@ -7591,7 +7595,7 @@ dependencies = [
 "serde_json",
 "sysinfo",
 "tokio",
- "tokio-postgres 0.7.7",
+ "tokio-postgres 0.7.9",
 "tokio-util",
 "tracing",
 "tracing-subscriber",
--- a/compute/compute-node.Dockerfile
+++ b/compute/compute-node.Dockerfile
@@ -5,6 +5,7 @@ ARG TAG=pinned
 ARG BUILD_TAG
 ARG DEBIAN_VERSION=bookworm
 ARG DEBIAN_FLAVOR=${DEBIAN_VERSION}-slim
+ARG ALPINE_CURL_VERSION=8.11.1

 #########################################################################################
 #
@@ -882,6 +883,39 @@ RUN curl -sSO https://static.rust-lang.org/rustup/dist/$(uname -m)-unknown-linux

 USER root

+#########################################################################################
+#
+# Layer "rust extensions pgrx12.7"
+#
+# Essentially, this layer is the same as above, but instead of pgrx 0.12.6, it
+# uses 0.12.7. 0.12.7, specifically, is necessary for building pg_search from
+# ParadeDB, according to the ParadeDB team. Eventually, we can remove this layer
+# when ParadeDB gets various pgrx changes upstreamed.
+#
+#########################################################################################
+FROM build-deps AS rust-extensions-build-pgrx12_7
+ARG PG_VERSION
+COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
+
+RUN apt update && \
+    apt install --no-install-recommends --no-install-suggests -y curl libclang-dev && \
+    apt clean && rm -rf /var/lib/apt/lists/* && \
+    useradd -ms /bin/bash nonroot -b /home
+
+ENV HOME=/home/nonroot
+ENV PATH="/home/nonroot/.cargo/bin:/usr/local/pgsql/bin/:$PATH"
+USER nonroot
+WORKDIR /home/nonroot
+
+RUN curl -sSO https://static.rust-lang.org/rustup/dist/$(uname -m)-unknown-linux-gnu/rustup-init && \
+    chmod +x rustup-init && \
+    ./rustup-init -y --no-modify-path --profile minimal --default-toolchain stable && \
+    rm rustup-init && \
+    cargo install --locked --version 0.12.7 cargo-pgrx && \
+    /bin/bash -c 'cargo pgrx init --pg${PG_VERSION:1}=/usr/local/pgsql/bin/pg_config'
+
+USER root
+
 #########################################################################################
 #
 # Layers "pg-onnx-build" and "pgrag-pg-build"
@@ -1154,6 +1188,24 @@ RUN wget https://github.com/reorg/pg_repack/archive/refs/tags/ver_1.5.2.tar.gz -
    make -j $(getconf _NPROCESSORS_ONLN) && \
    make -j $(getconf _NPROCESSORS_ONLN) install

+#########################################################################################
+#
+# Layer "pg_search-build"
+# compile "pg_search" extension
+#
+#########################################################################################
+
+FROM rust-extensions-build-pgrx12_7 AS pg-search-build
+ARG PG_VERSION
+ARG PARADEDB_TAG=v0.14.1
+
+RUN --mount=type=secret,id=PG_SEARCH_ENTERPRISE_GITHUB_PAT \
+    git clone --recurse-submodules --depth 1 --branch ${PARADEDB_TAG} https://$(cat /run/secrets/PG_SEARCH_ENTERPRISE_GITHUB_PAT)@github.com/paradedb/paradedb-enterprise.git pg_search-src && \
+    cd pg_search-src && \
+    cargo pgrx install --package pg_search --no-default-features --features unsafe-postgres --release && \
+    sed -i 's/superuser = false/superuser = true/g' /usr/local/pgsql/share/extension/pg_search.control && \
+    echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_search.control
+
 #########################################################################################
 #
 # Layer "neon-pg-ext-build"
@@ -1201,6 +1253,7 @@ COPY --from=pg-ivm-build /usr/local/pgsql/ /usr/local/pgsql/
 COPY --from=pg-partman-build /usr/local/pgsql/ /usr/local/pgsql/
 COPY --from=pg-mooncake-build /usr/local/pgsql/ /usr/local/pgsql/
 COPY --from=pg-repack-build /usr/local/pgsql/ /usr/local/pgsql/
+COPY --from=pg-search-build /usr/local/pgsql/ /usr/local/pgsql/
 COPY pgxn/ pgxn/

 RUN make -j $(getconf _NPROCESSORS_ONLN) \
@@ -1266,16 +1319,31 @@ RUN set -e \

 #########################################################################################
 #
-# Layers "postgres-exporter", "pgbouncer-exporter", and "sql-exporter"
+# Layer "exporters"
 #
 #########################################################################################
-
-FROM quay.io/prometheuscommunity/postgres-exporter:v0.16.0 AS postgres-exporter
-FROM quay.io/prometheuscommunity/pgbouncer-exporter:v0.10.2 AS pgbouncer-exporter
-
-# Keep the version the same as in build-tools.Dockerfile and
-# test_runner/regress/test_compute_metrics.py.
-FROM burningalchemist/sql_exporter:0.17.0 AS sql-exporter
+FROM alpine/curl:${ALPINE_CURL_VERSION} AS exporters
+ARG TARGETARCH
+# Keep sql_exporter version same as in build-tools.Dockerfile and
+# test_runner/regress/test_compute_metrics.py
+RUN if [ "$TARGETARCH" = "amd64" ]; then\
+        postgres_exporter_sha256='027e75dda7af621237ff8f5ac66b78a40b0093595f06768612b92b1374bd3105';\
+        pgbouncer_exporter_sha256='c9f7cf8dcff44f0472057e9bf52613d93f3ffbc381ad7547a959daa63c5e84ac';\
+        sql_exporter_sha256='38e439732bbf6e28ca4a94d7bc3686d3fa1abdb0050773d5617a9efdb9e64d08';\
+    else\
+        postgres_exporter_sha256='131a376d25778ff9701a4c81f703f179e0b58db5c2c496e66fa43f8179484786';\
+        pgbouncer_exporter_sha256='217c4afd7e6492ae904055bc14fe603552cf9bac458c063407e991d68c519da3';\
+        sql_exporter_sha256='11918b00be6e2c3a67564adfdb2414fdcbb15a5db76ea17d1d1a944237a893c6';\
+    fi\
+    && curl -sL https://github.com/prometheus-community/postgres_exporter/releases/download/v0.16.0/postgres_exporter-0.16.0.linux-${TARGETARCH}.tar.gz\
+     | tar xzf - --strip-components=1 -C.\
+    && curl -sL https://github.com/prometheus-community/pgbouncer_exporter/releases/download/v0.10.2/pgbouncer_exporter-0.10.2.linux-${TARGETARCH}.tar.gz\
+     | tar xzf - --strip-components=1 -C.\
+    && curl -sL https://github.com/burningalchemist/sql_exporter/releases/download/0.17.0/sql_exporter-0.17.0.linux-${TARGETARCH}.tar.gz\
+     | tar xzf - --strip-components=1 -C.\
+    && echo "${postgres_exporter_sha256} postgres_exporter" | sha256sum -c -\
+    && echo "${pgbouncer_exporter_sha256} pgbouncer_exporter" | sha256sum -c -\
+    && echo "${sql_exporter_sha256} sql_exporter" | sha256sum -c -

 #########################################################################################
 #
@@ -1330,7 +1398,8 @@ COPY --from=vector-pg-build /pgvector.patch /ext-src/
 COPY --from=pgjwt-pg-build /pgjwt.tar.gz /ext-src
 #COPY --from=pgrag-pg-build /usr/local/pgsql/ /usr/local/pgsql/
 #COPY --from=pg-jsonschema-pg-build /home/nonroot/pg_jsonschema.tar.gz /ext-src
-#COPY --from=pg-graphql-pg-build /home/nonroot/pg_graphql.tar.gz /ext-src
+COPY --from=pg-graphql-pg-build /home/nonroot/pg_graphql.tar.gz /ext-src
+COPY compute/patches/pg_graphql.patch /ext-src
 #COPY --from=pg-tiktoken-pg-build /home/nonroot/pg_tiktoken.tar.gz /ext-src
 COPY --from=hypopg-pg-build /hypopg.tar.gz /ext-src
 COPY --from=pg-hashids-pg-build /pg_hashids.tar.gz /ext-src
@@ -1364,6 +1433,7 @@ RUN cd /ext-src/pgvector-src && patch -p1 <../pgvector.patch
 RUN cd /ext-src/pg_hint_plan-src && patch -p1 < /ext-src/pg_hint_plan_${PG_VERSION}.patch
 COPY --chmod=755 docker-compose/run-tests.sh /run-tests.sh
 RUN patch -p1 </ext-src/pg_cron.patch
+RUN cd /ext-src/pg_graphql-src && patch -p1 </ext-src/pg_graphql.patch
 ENV PATH=/usr/local/pgsql/bin:$PATH
 ENV PGHOST=compute
 ENV PGPORT=55433
@@ -1401,10 +1471,10 @@ COPY --chmod=0666 --chown=postgres compute/etc/pgbouncer.ini /etc/pgbouncer.ini
 COPY --from=compute-tools --chown=postgres /home/nonroot/target/release-line-debug-size-lto/local_proxy /usr/local/bin/local_proxy
 RUN mkdir -p /etc/local_proxy && chown postgres:postgres /etc/local_proxy

-# Metrics exporter binaries and  configuration files
-COPY --from=postgres-exporter /bin/postgres_exporter /bin/postgres_exporter
-COPY --from=pgbouncer-exporter /bin/pgbouncer_exporter /bin/pgbouncer_exporter
-COPY --from=sql-exporter      /bin/sql_exporter      /bin/sql_exporter
+# Metrics exporter binaries and configuration files
+COPY --from=exporters ./postgres_exporter /bin/postgres_exporter
+COPY --from=exporters ./pgbouncer_exporter /bin/pgbouncer_exporter
+COPY --from=exporters ./sql_exporter /bin/sql_exporter

 COPY --chown=postgres compute/etc/postgres_exporter.yml /etc/postgres_exporter.yml

--- a/compute/patches/pg_graphql.patch
+++ b/compute/patches/pg_graphql.patch
@@ -0,0 +1,19 @@
+commit ec6a491d126882966a696f9ad5d3698935361d55
+Author: Alexey Masterov <alexeymasterov@neon.tech>
+Date:   Tue Dec 17 10:25:00 2024 +0100
+
+    Changes required to run tests on Neon
+
+diff --git a/test/expected/permissions_functions.out b/test/expected/permissions_functions.out
+index 1e9fbc2..94cbe25 100644
+--- a/test/expected/permissions_functions.out
+++ b/test/expected/permissions_functions.out
+@@ -64,7 +64,7 @@ begin;
+     select current_user;
+  current_user 
+ --------------
+- postgres
+ cloud_admin
+ (1 row)
+ 
+     -- revoke default access from the public role for new functions
--- a/compute_tools/src/extension_server.rs
+++ b/compute_tools/src/extension_server.rs
@@ -85,6 +85,8 @@ use tracing::info;
 use tracing::log::warn;
 use zstd::stream::read::Decoder;

+use crate::metrics::{REMOTE_EXT_REQUESTS_TOTAL, UNKNOWN_HTTP_STATUS};
+
 fn get_pg_config(argument: &str, pgbin: &str) -> String {
    // gives the result of `pg_config [argument]`
    // where argument is a flag like `--version` or `--sharedir`
@@ -258,21 +260,58 @@ async fn download_extension_tar(ext_remote_storage: &str, ext_path: &str) -> Res

    info!("Download extension {:?} from uri {:?}", ext_path, uri);

-    let resp = reqwest::get(uri).await?;
+    match do_extension_server_request(&uri).await {
+        Ok(resp) => {
+            info!(
+                "Successfully downloaded remote extension data {:?}",
+                ext_path
+            );
+            REMOTE_EXT_REQUESTS_TOTAL
+                .with_label_values(&[&StatusCode::OK.to_string()])
+                .inc();
+            Ok(resp)
+        }
+        Err((msg, status)) => {
+            REMOTE_EXT_REQUESTS_TOTAL
+                .with_label_values(&[&status])
+                .inc();
+            bail!(msg);
+        }
+    }
+}

-    match resp.status() {
+// Do a single remote extensions server request.
+// Return result or (error message + stringified status code) in case of any failures.
+async fn do_extension_server_request(uri: &str) -> Result<Bytes, (String, String)> {
+    let resp = reqwest::get(uri).await.map_err(|e| {
+        (
+            format!("could not perform remote extensions server request: {}", e),
+            UNKNOWN_HTTP_STATUS.to_string(),
+        )
+    })?;
+    let status = resp.status();
+
+    match status {
        StatusCode::OK => match resp.bytes().await {
-            Ok(resp) => {
-                info!("Download extension {:?} completed successfully", ext_path);
-                Ok(resp)
-            }
-            Err(e) => bail!("could not deserialize remote extension response: {}", e),
+            Ok(resp) => Ok(resp),
+            Err(e) => Err((
+                format!("could not read remote extensions server response: {}", e),
+                // It's fine to return and report error with status as 200 OK,
+                // because we still failed to read the response.
+                status.to_string(),
+            )),
        },
-        StatusCode::SERVICE_UNAVAILABLE => bail!("remote extension is temporarily unavailable"),
-        _ => bail!(
-            "unexpected remote extension response status code: {}",
-            resp.status()
-        ),
+        StatusCode::SERVICE_UNAVAILABLE => Err((
+            "remote extensions server is temporarily unavailable".to_string(),
+            status.to_string(),
+        )),
+        _ => Err((
+            format!(
+                "unexpected remote extensions server response status code: {}",
+                status
+            ),
+            status.to_string(),
+        )),
    }
 }
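The refactoring above splits the download into a helper that returns either the body or an (error message, status label) pair, so that the caller can bump one counter per outcome. A dependency-free sketch of that shape is given below. Everything in it is a stand-in: `Attempt`, `StatusCounter`, `classify` and `download` are hypothetical names, the counter is a plain map rather than a Prometheus `IntCounterVec`, and the labels are bare numeric codes rather than whatever `StatusCode::to_string()` produces in the real code.

```rust
use std::collections::HashMap;

/// Label used when no HTTP status is available (e.g. connection refused).
const UNKNOWN_HTTP_STATUS: &str = "unknown";

/// Hypothetical stand-in for a counter vector labelled by HTTP status.
#[derive(Default)]
struct StatusCounter {
    counts: HashMap<String, u64>,
}

impl StatusCounter {
    fn inc(&mut self, status: &str) {
        *self.counts.entry(status.to_string()).or_insert(0) += 1;
    }

    fn get(&self, status: &str) -> u64 {
        self.counts.get(status).copied().unwrap_or(0)
    }
}

/// Hypothetical outcome of a single request attempt.
enum Attempt {
    /// The request never produced a response.
    TransportError(String),
    /// A response arrived; reading its body may still fail.
    Response { status: u16, body: Result<Vec<u8>, String> },
}

/// Map one attempt to the body, or to (error message, status label).
fn classify(attempt: Attempt) -> Result<Vec<u8>, (String, String)> {
    match attempt {
        Attempt::TransportError(e) => Err((
            format!("could not perform request: {e}"),
            UNKNOWN_HTTP_STATUS.to_string(),
        )),
        Attempt::Response { status: 200, body: Ok(bytes) } => Ok(bytes),
        // The status was 200, but the body could not be read; it is still
        // reported under the 200 label, as in the diff.
        Attempt::Response { status: 200, body: Err(e) } => {
            Err((format!("could not read response: {e}"), "200".to_string()))
        }
        Attempt::Response { status: 503, .. } => Err((
            "server is temporarily unavailable".to_string(),
            "503".to_string(),
        )),
        Attempt::Response { status, .. } => Err((
            format!("unexpected response status code: {status}"),
            status.to_string(),
        )),
    }
}

/// Record exactly one counter increment per attempt, then return the result.
fn download(attempt: Attempt, counter: &mut StatusCounter) -> Result<Vec<u8>, String> {
    match classify(attempt) {
        Ok(bytes) => {
            counter.inc("200");
            Ok(bytes)
        }
        Err((msg, status)) => {
            counter.inc(&status);
            Err(msg)
        }
    }
}
```

One consequence of this design, visible in the sketch as well, is that the 200 label counts both successful downloads and responses whose body could not be read, so it should not be read as a pure success counter.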

--- a/compute_tools/src/http/openapi_spec.yaml
+++ b/compute_tools/src/http/openapi_spec.yaml
@@ -68,35 +68,6 @@ paths:
              schema:
                $ref: "#/components/schemas/ComputeInsights"

-  /installed_extensions:
-    get:
-      tags:
-      - Info
-      summary: Get installed extensions.
-      description: ""
-      operationId: getInstalledExtensions
-      responses:
-        200:
-          description: List of installed extensions
-          content:
-            application/json:
-              schema:
-                $ref: "#/components/schemas/InstalledExtensions"
-  /info:
-    get:
-      tags:
-      - Info
-      summary: Get info about the compute pod / VM.
-      description: ""
-      operationId: getInfo
-      responses:
-        200:
-          description: Info
-          content:
-            application/json:
-              schema:
-                $ref: "#/components/schemas/Info"
-
  /dbs_and_roles:
    get:
      tags:
--- a/compute_tools/src/http/routes/info.rs
+++ b/compute_tools/src/http/routes/info.rs
@@ -1,11 +0,0 @@
-use axum::response::Response;
-use compute_api::responses::InfoResponse;
-use http::StatusCode;
-
-use crate::http::JsonResponse;
-
-/// Get information about the physical characteristics about the compute.
-pub(in crate::http) async fn get_info() -> Response {
-    let num_cpus = num_cpus::get_physical();
-    JsonResponse::success(StatusCode::OK, &InfoResponse { num_cpus })
-}
--- a/compute_tools/src/http/routes/installed_extensions.rs
+++ b/compute_tools/src/http/routes/installed_extensions.rs
@@ -1,33 +0,0 @@
-use std::sync::Arc;
-
-use axum::{extract::State, response::Response};
-use compute_api::responses::ComputeStatus;
-use http::StatusCode;
-use tokio::task;
-
-use crate::{compute::ComputeNode, http::JsonResponse, installed_extensions};
-
-/// Get a list of installed extensions.
-pub(in crate::http) async fn get_installed_extensions(
-    State(compute): State<Arc<ComputeNode>>,
-) -> Response {
-    let status = compute.get_status();
-    if status != ComputeStatus::Running {
-        return JsonResponse::invalid_status(status);
-    }
-
-    let conf = compute.get_conn_conf(None);
-    let res = task::spawn_blocking(move || installed_extensions::get_installed_extensions(conf))
-        .await
-        .unwrap();
-
-    match res {
-        Ok(installed_extensions) => {
-            JsonResponse::success(StatusCode::OK, Some(installed_extensions))
-        }
-        Err(e) => JsonResponse::error(
-            StatusCode::INTERNAL_SERVER_ERROR,
-            format!("failed to get list of installed extensions: {e}"),
-        ),
-    }
-}
--- a/compute_tools/src/http/routes/metrics.rs
+++ b/compute_tools/src/http/routes/metrics.rs
@@ -2,17 +2,16 @@ use axum::{body::Body, response::Response};
 use http::header::CONTENT_TYPE;
 use http::StatusCode;
 use metrics::proto::MetricFamily;
-use metrics::Encoder;
-use metrics::TextEncoder;
+use metrics::{Encoder, TextEncoder};

-use crate::{http::JsonResponse, installed_extensions};
+use crate::{http::JsonResponse, metrics::collect};

 /// Expose Prometheus metrics.
 pub(in crate::http) async fn get_metrics() -> Response {
    // When we call TextEncoder::encode() below, it will immediately return an
    // error if a metric family has no metrics, so we need to preemptively
    // filter out metric families with no metrics.
-    let metrics = installed_extensions::collect()
+    let metrics = collect()
        .into_iter()
        .filter(|m| !m.get_metric().is_empty())
        .collect::<Vec<MetricFamily>>();
--- a/compute_tools/src/http/routes/mod.rs
+++ b/compute_tools/src/http/routes/mod.rs
@@ -10,9 +10,7 @@ pub(in crate::http) mod extension_server;
 pub(in crate::http) mod extensions;
 pub(in crate::http) mod failpoints;
 pub(in crate::http) mod grants;
-pub(in crate::http) mod info;
 pub(in crate::http) mod insights;
-pub(in crate::http) mod installed_extensions;
 pub(in crate::http) mod metrics;
 pub(in crate::http) mod metrics_json;
 pub(in crate::http) mod status;
--- a/compute_tools/src/http/server.rs
+++ b/compute_tools/src/http/server.rs
@@ -22,8 +22,7 @@ use uuid::Uuid;

 use super::routes::{
    check_writability, configure, database_schema, dbs_and_roles, extension_server, extensions,
-    grants, info as info_route, insights, installed_extensions, metrics, metrics_json, status,
-    terminate,
+    grants, insights, metrics, metrics_json, status, terminate,
 };
 use crate::compute::ComputeNode;

@@ -60,12 +59,7 @@ async fn serve(port: u16, compute: Arc<ComputeNode>) {
        )
        .route("/extensions", post(extensions::install_extension))
        .route("/grants", post(grants::add_grant))
-        .route("/info", get(info_route::get_info))
        .route("/insights", get(insights::get_insights))
-        .route(
-            "/installed_extensions",
-            get(installed_extensions::get_installed_extensions),
-        )
        .route("/metrics", get(metrics::get_metrics))
        .route("/metrics.json", get(metrics_json::get_metrics))
        .route("/status", get(status::get_status))
--- a/compute_tools/src/installed_extensions.rs
+++ b/compute_tools/src/installed_extensions.rs
@@ -1,13 +1,10 @@
 use compute_api::responses::{InstalledExtension, InstalledExtensions};
-use metrics::proto::MetricFamily;
 use std::collections::HashMap;

 use anyhow::Result;
 use postgres::{Client, NoTls};

-use metrics::core::Collector;
-use metrics::{register_uint_gauge_vec, UIntGaugeVec};
-use once_cell::sync::Lazy;
+use crate::metrics::INSTALLED_EXTENSIONS;

 /// We don't reuse get_existing_dbs() just for code clarity
 /// and to make database listing query here more explicit.
@@ -102,16 +99,3 @@ pub fn get_installed_extensions(mut conf: postgres::config::Config) -> Result<In
        extensions: extensions_map.into_values().collect(),
    })
 }
-
-static INSTALLED_EXTENSIONS: Lazy<UIntGaugeVec> = Lazy::new(|| {
-    register_uint_gauge_vec!(
-        "compute_installed_extensions",
-        "Number of databases where the version of extension is installed",
-        &["extension_name", "version", "owned_by_superuser"]
-    )
-    .expect("failed to define a metric")
-});
-
-pub fn collect() -> Vec<MetricFamily> {
-    INSTALLED_EXTENSIONS.collect()
-}
--- a/compute_tools/src/lib.rs
+++ b/compute_tools/src/lib.rs
@@ -16,6 +16,7 @@ pub mod extension_server;
 pub mod installed_extensions;
 pub mod local_proxy;
 pub mod lsn_lease;
+pub mod metrics;
 mod migration;
 pub mod monitor;
 pub mod params;
--- a/compute_tools/src/metrics.rs
+++ b/compute_tools/src/metrics.rs
@@ -0,0 +1,70 @@
+use metrics::core::Collector;
+use metrics::proto::MetricFamily;
+use metrics::{register_int_counter_vec, register_uint_gauge_vec, IntCounterVec, UIntGaugeVec};
+use once_cell::sync::Lazy;
+
+pub(crate) static INSTALLED_EXTENSIONS: Lazy<UIntGaugeVec> = Lazy::new(|| {
+    register_uint_gauge_vec!(
+        "compute_installed_extensions",
+        "Number of databases where the version of extension is installed",
+        &["extension_name", "version", "owned_by_superuser"]
+    )
+    .expect("failed to define a metric")
+});
+
+// Normally, any HTTP API request is described by METHOD (e.g. GET, POST, etc.) + PATH,
+// but for all our APIs we defined a 'slug'/method/operationId in the OpenAPI spec.
+// And it's fair to call it a 'RPC' (Remote Procedure Call).
+pub enum CPlaneRequestRPC {
+    GetSpec,
+}
+
+impl CPlaneRequestRPC {
+    pub fn as_str(&self) -> &str {
+        match self {
+            CPlaneRequestRPC::GetSpec => "GetSpec",
+        }
+    }
+}
+
+pub const UNKNOWN_HTTP_STATUS: &str = "unknown";
+
+pub(crate) static CPLANE_REQUESTS_TOTAL: Lazy<IntCounterVec> = Lazy::new(|| {
+    register_int_counter_vec!(
+        "compute_ctl_cplane_requests_total",
+        "Total number of control plane requests made by compute_ctl by status",
+        &["rpc", "http_status"]
+    )
+    .expect("failed to define a metric")
+});
+
+/// Total number of failed database migrations. Per-compute, this is actually a boolean metric,
+/// either empty or with a single value (1, migration_id) because we stop at the first failure.
+/// Yet, the sum over the fleet will provide the total number of failures.
+pub(crate) static DB_MIGRATION_FAILED: Lazy<IntCounterVec> = Lazy::new(|| {
+    register_int_counter_vec!(
+        "compute_ctl_db_migration_failed_total",
+        "Total number of failed database migrations",
+        &["migration_id"]
+    )
+    .expect("failed to define a metric")
+});
+
+pub(crate) static REMOTE_EXT_REQUESTS_TOTAL: Lazy<IntCounterVec> = Lazy::new(|| {
+    register_int_counter_vec!(
+        "compute_ctl_remote_ext_requests_total",
+        "Total number of requests made by compute_ctl to download extensions from S3 proxy by status",
+        // Do not use any labels like extension name yet.
+        // We can add them later if needed.
+        &["http_status"]
+    )
+    .expect("failed to define a metric")
+});
+
+pub fn collect() -> Vec<MetricFamily> {
+    let mut metrics = INSTALLED_EXTENSIONS.collect();
+    metrics.extend(CPLANE_REQUESTS_TOTAL.collect());
+    metrics.extend(REMOTE_EXT_REQUESTS_TOTAL.collect());
+    metrics.extend(DB_MIGRATION_FAILED.collect());
+    metrics
+}
--- a/compute_tools/src/migration.rs
+++ b/compute_tools/src/migration.rs
@@ -1,7 +1,9 @@
 use anyhow::{Context, Result};
 use fail::fail_point;
 use postgres::{Client, Transaction};
-use tracing::info;
+use tracing::{error, info};
+
+use crate::metrics::DB_MIGRATION_FAILED;

 /// Runs a series of migrations on a target database
 pub(crate) struct MigrationRunner<'m> {
@@ -78,24 +80,31 @@ impl<'m> MigrationRunner<'m> {
        Ok(())
    }

-    /// Run an individual migration
-    fn run_migration(txn: &mut Transaction, migration_id: i64, migration: &str) -> Result<()> {
+    /// Run an individual migration in a separate transaction block.
+    fn run_migration(client: &mut Client, migration_id: i64, migration: &str) -> Result<()> {
+        let mut txn = client
+            .transaction()
+            .with_context(|| format!("begin transaction for migration {migration_id}"))?;
+
        if migration.starts_with("-- SKIP") {
            info!("Skipping migration id={}", migration_id);

            // Even though we are skipping the migration, updating the
            // migration ID should help keep logic easy to understand when
            // trying to understand the state of a cluster.
-            Self::update_migration_id(txn, migration_id)?;
+            Self::update_migration_id(&mut txn, migration_id)?;
        } else {
            info!("Running migration id={}:\n{}\n", migration_id, migration);

            txn.simple_query(migration)
                .with_context(|| format!("apply migration {migration_id}"))?;

-            Self::update_migration_id(txn, migration_id)?;
+            Self::update_migration_id(&mut txn, migration_id)?;
        }

+        txn.commit()
+            .with_context(|| format!("commit transaction for migration {migration_id}"))?;
+
        Ok(())
    }

@@ -109,19 +118,20 @@ impl<'m> MigrationRunner<'m> {
            // The index lags the migration ID by 1, so the current migration
            // ID is also the next index
            let migration_id = (current_migration + 1) as i64;
+            let migration = self.migrations[current_migration];

-            let mut txn = self
-                .client
-                .transaction()
-                .with_context(|| format!("begin transaction for migration {migration_id}"))?;
-
-            Self::run_migration(&mut txn, migration_id, self.migrations[current_migration])
-                .with_context(|| format!("running migration {migration_id}"))?;
-
-            txn.commit()
-                .with_context(|| format!("commit transaction for migration {migration_id}"))?;
-
-            info!("Finished migration id={}", migration_id);
+            match Self::run_migration(self.client, migration_id, migration) {
+                Ok(_) => {
+                    info!("Finished migration id={}", migration_id);
+                }
+                Err(e) => {
+                    error!("Failed to run migration id={}: {}", migration_id, e);
+                    DB_MIGRATION_FAILED
+                        .with_label_values(&[migration_id.to_string().as_str()])
+                        .inc();
+                    return Err(e);
+                }
+            }

            current_migration += 1;
        }
--- a/compute_tools/src/spec.rs
+++ b/compute_tools/src/spec.rs
@@ -6,6 +6,7 @@ use std::path::Path;
 use tracing::{error, info, instrument, warn};

 use crate::config;
+use crate::metrics::{CPlaneRequestRPC, CPLANE_REQUESTS_TOTAL, UNKNOWN_HTTP_STATUS};
 use crate::migration::MigrationRunner;
 use crate::params::PG_HBA_ALL_MD5;
 use crate::pg_helpers::*;
@@ -19,7 +20,7 @@ use compute_api::spec::ComputeSpec;
 fn do_control_plane_request(
    uri: &str,
    jwt: &str,
-) -> Result<ControlPlaneSpecResponse, (bool, String)> {
+) -> Result<ControlPlaneSpecResponse, (bool, String, String)> {
    let resp = reqwest::blocking::Client::new()
        .get(uri)
        .header("Authorization", format!("Bearer {}", jwt))
@@ -28,34 +29,41 @@ fn do_control_plane_request(
            (
                true,
                format!("could not perform spec request to control plane: {}", e),
+                UNKNOWN_HTTP_STATUS.to_string(),
            )
        })?;

-    match resp.status() {
+    let status = resp.status();
+    match status {
        StatusCode::OK => match resp.json::<ControlPlaneSpecResponse>() {
            Ok(spec_resp) => Ok(spec_resp),
            Err(e) => Err((
                true,
                format!("could not deserialize control plane response: {}", e),
+                status.to_string(),
            )),
        },
-        StatusCode::SERVICE_UNAVAILABLE => {
-            Err((true, "control plane is temporarily unavailable".to_string()))
-        }
+        StatusCode::SERVICE_UNAVAILABLE => Err((
+            true,
+            "control plane is temporarily unavailable".to_string(),
+            status.to_string(),
+        )),
        StatusCode::BAD_GATEWAY => {
            // We have a problem with intermittent 502 errors now
            // https://github.com/neondatabase/cloud/issues/2353
            // It's fine to retry GET request in this case.
-            Err((true, "control plane request failed with 502".to_string()))
+            Err((
+                true,
+                "control plane request failed with 502".to_string(),
+                status.to_string(),
+            ))
        }
        // Another code, likely 500 or 404, means that compute is unknown to the control plane
        // or some internal failure happened. Doesn't make much sense to retry in this case.
        _ => Err((
            false,
-            format!(
-                "unexpected control plane response status code: {}",
-                resp.status()
-            ),
+            format!("unexpected control plane response status code: {}", status),
+            status.to_string(),
        )),
    }
 }
@@ -83,17 +91,28 @@ pub fn get_spec_from_control_plane(
    // - got spec -> return Ok(Some(spec))
    while attempt < 4 {
        spec = match do_control_plane_request(&cp_uri, &jwt) {
-            Ok(spec_resp) => match spec_resp.status {
-                ControlPlaneComputeStatus::Empty => Ok(None),
-                ControlPlaneComputeStatus::Attached => {
-                    if let Some(spec) = spec_resp.spec {
-                        Ok(Some(spec))
-                    } else {
-                        bail!("compute is attached, but spec is empty")
+            Ok(spec_resp) => {
+                CPLANE_REQUESTS_TOTAL
+                    .with_label_values(&[
+                        CPlaneRequestRPC::GetSpec.as_str(),
+                        &StatusCode::OK.to_string(),
+                    ])
+                    .inc();
+                match spec_resp.status {
+                    ControlPlaneComputeStatus::Empty => Ok(None),
+                    ControlPlaneComputeStatus::Attached => {
+                        if let Some(spec) = spec_resp.spec {
+                            Ok(Some(spec))
+                        } else {
+                            bail!("compute is attached, but spec is empty")
+                        }
                    }
                }
-            },
-            Err((retry, msg)) => {
+            }
+            Err((retry, msg, status)) => {
+                CPLANE_REQUESTS_TOTAL
+                    .with_label_values(&[CPlaneRequestRPC::GetSpec.as_str(), &status])
+                    .inc();
                if retry {
                    Err(anyhow!(msg))
                } else {
--- a/control_plane/src/pageserver.rs
+++ b/control_plane/src/pageserver.rs
@@ -347,6 +347,11 @@ impl PageServerNode {
                .map(|x| x.parse::<usize>())
                .transpose()
                .context("Failed to parse 'compaction_threshold' as an integer")?,
+            compaction_upper_limit: settings
+                .remove("compaction_upper_limit")
+                .map(|x| x.parse::<usize>())
+                .transpose()
+                .context("Failed to parse 'compaction_upper_limit' as an integer")?,
            compaction_algorithm: settings
                .remove("compaction_algorithm")
                .map(serde_json::from_str)
--- a/docker-compose/compute_wrapper/shell/compute.sh
+++ b/docker-compose/compute_wrapper/shell/compute.sh
@@ -20,30 +20,55 @@ while ! nc -z pageserver 6400; do
 done
 echo "Page server is ready."

-echo "Create a tenant and timeline"
-generate_id tenant_id
-PARAMS=(
-     -X PUT
-     -H "Content-Type: application/json"
-     -d "{\"mode\": \"AttachedSingle\", \"generation\": 1, \"tenant_conf\": {}}"
-     "http://pageserver:9898/v1/tenant/${tenant_id}/location_config"
-)
-result=$(curl "${PARAMS[@]}")
-echo $result | jq .
+cp ${SPEC_FILE_ORG} ${SPEC_FILE}

-generate_id timeline_id
-PARAMS=(
-     -sbf
-     -X POST
-     -H "Content-Type: application/json"
-     -d "{\"new_timeline_id\": \"${timeline_id}\", \"pg_version\": ${PG_VERSION}}"
-     "http://pageserver:9898/v1/tenant/${tenant_id}/timeline/"
-)
-result=$(curl "${PARAMS[@]}")
-echo $result | jq .
+if [ -n "${TENANT_ID:-}" ] && [ -n "${TIMELINE_ID:-}" ]; then
+  tenant_id=${TENANT_ID}
+  timeline_id=${TIMELINE_ID}
+else
+  echo "Check if a tenant is present"
+  PARAMS=(
+       -X GET
+       -H "Content-Type: application/json"
+       "http://pageserver:9898/v1/tenant"
+  )
+  tenant_id=$(curl "${PARAMS[@]}" | jq -r .[0].id)
+  if [ -z "${tenant_id}" ] || [ "${tenant_id}" = null ]; then
+    echo "Create a tenant"
+    generate_id tenant_id
+    PARAMS=(
+         -X PUT
+         -H "Content-Type: application/json"
+         -d "{\"mode\": \"AttachedSingle\", \"generation\": 1, \"tenant_conf\": {}}"
+        "http://pageserver:9898/v1/tenant/${tenant_id}/location_config"
+    )
+    result=$(curl "${PARAMS[@]}")
+    echo $result | jq .
+  fi
+
+  echo "Check if a timeline is present"
+  PARAMS=(
+       -X GET
+       -H "Content-Type: application/json"
+       "http://pageserver:9898/v1/tenant/${tenant_id}/timeline"
+  )
+  timeline_id=$(curl "${PARAMS[@]}" | jq -r .[0].timeline_id)
+  if [ -z "${timeline_id}" ] || [ "${timeline_id}" = null ]; then
+    generate_id timeline_id
+    PARAMS=(
+        -sbf
+        -X POST
+        -H "Content-Type: application/json"
+        -d "{\"new_timeline_id\": \"${timeline_id}\", \"pg_version\": ${PG_VERSION}}"
+        "http://pageserver:9898/v1/tenant/${tenant_id}/timeline/"
+    )
+    result=$(curl "${PARAMS[@]}")
+    echo $result | jq .
+  fi
+fi

 echo "Overwrite tenant id and timeline id in spec file"
-sed "s/TENANT_ID/${tenant_id}/" ${SPEC_FILE_ORG} > ${SPEC_FILE}
+sed -i "s/TENANT_ID/${tenant_id}/" ${SPEC_FILE}
 sed -i "s/TIMELINE_ID/${timeline_id}/" ${SPEC_FILE}

 cat ${SPEC_FILE}
--- a/docker-compose/docker-compose.yml
+++ b/docker-compose/docker-compose.yml
@@ -149,11 +149,13 @@ services:
      args:
        - REPOSITORY=${REPOSITORY:-neondatabase}
        - COMPUTE_IMAGE=compute-node-v${PG_VERSION:-16}
-        - TAG=${TAG:-latest}
+        - TAG=${COMPUTE_TAG:-${TAG:-latest}}
        - http_proxy=${http_proxy:-}
        - https_proxy=${https_proxy:-}
    environment:
      - PG_VERSION=${PG_VERSION:-16}
+      - TENANT_ID=${TENANT_ID:-}
+      - TIMELINE_ID=${TIMELINE_ID:-}
      #- RUST_BACKTRACE=1
    # Mount the test files directly, for faster editing cycle.
    volumes:
--- a/docker-compose/docker_compose_test.sh
+++ b/docker-compose/docker_compose_test.sh
@@ -31,7 +31,7 @@ for pg_version in ${TEST_VERSION_ONLY-14 15 16 17}; do
    echo "clean up containers if exists"
    cleanup
    PG_TEST_VERSION=$((pg_version < 16 ? 16 : pg_version))
-    PG_VERSION=$pg_version PG_TEST_VERSION=$PG_TEST_VERSION docker compose --profile test-extensions -f $COMPOSE_FILE up --build -d
+    PG_VERSION=$pg_version PG_TEST_VERSION=$PG_TEST_VERSION docker compose --profile test-extensions -f $COMPOSE_FILE up --quiet-pull --build -d

    echo "wait until the compute is ready. timeout after 60s. "
    cnt=0
@@ -51,6 +51,7 @@ for pg_version in ${TEST_VERSION_ONLY-14 15 16 17}; do
    done

    if [ $pg_version -ge 16 ]; then
+        docker cp ext-src $TEST_CONTAINER_NAME:/
        # This is required for the pg_hint_plan test, to prevent flaky log message causing the test to fail
        # It cannot be moved to Dockerfile now because the database directory is created after the start of the container
        echo Adding dummy config
@@ -61,7 +62,7 @@ for pg_version in ${TEST_VERSION_ONLY-14 15 16 17}; do
        docker cp $TMPDIR/data $COMPUTE_CONTAINER_NAME:/ext-src/pg_hint_plan-src/
        rm -rf $TMPDIR
        # We are running tests now
-        if ! docker exec -e SKIP=timescaledb-src,rdkit-src,postgis-src,pgx_ulid-src,pgtap-src,pg_tiktoken-src,pg_jsonschema-src,pg_graphql-src,kq_imcx-src,wal2json_2_5-src \
+        if ! docker exec -e SKIP=timescaledb-src,rdkit-src,postgis-src,pgx_ulid-src,pgtap-src,pg_tiktoken-src,pg_jsonschema-src,kq_imcx-src,wal2json_2_5-src \
            $TEST_CONTAINER_NAME /run-tests.sh | tee testout.txt
        then
            FAILED=$(tail -1 testout.txt)
--- a/docker-compose/ext-src/hll-src/test-upgrade.sh
+++ b/docker-compose/ext-src/hll-src/test-upgrade.sh
@@ -0,0 +1,5 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin'    --dbname=contrib_regression add_agg agg_oob auto_sparse card_op cast_shape copy_binary cumulative_add_cardinality_correction cumulative_add_comprehensive_promotion cumulative_add_sparse_edge cumulative_add_sparse_random cumulative_add_sparse_step cumulative_union_comprehensive cumulative_union_explicit_explicit cumulative_union_explicit_promotion cumulative_union_probabilistic_probabilistic cumulative_union_sparse_full_representation cumulative_union_sparse_promotion cumulative_union_sparse_sparse disable_hashagg equal explicit_thresh hash hash_any meta_func murmur_bigint murmur_bytea nosparse notequal scalar_oob storedproc transaction typmod typmod_insert union_op
--- a/docker-compose/ext-src/hypopg-src/test-upgrade.patch
+++ b/docker-compose/ext-src/hypopg-src/test-upgrade.patch
@@ -0,0 +1,27 @@
+diff --git a/expected/hypopg.out b/expected/hypopg.out
+index 90121d0..859260b 100644
+--- a/expected/hypopg.out
+++ b/expected/hypopg.out
+@@ -11,7 +11,8 @@ BEGIN
+ END;
+ $_$
+ LANGUAGE plpgsql;
+-CREATE EXTENSION hypopg;
++CREATE EXTENSION IF NOT EXISTS hypopg;
++NOTICE:  extension "hypopg" already exists, skipping
+ CREATE TABLE hypo (id integer, val text, "Id2" bigint);
+ INSERT INTO hypo SELECT i, 'line ' || i
+ FROM generate_series(1,100000) f(i);
+diff --git a/test/sql/hypopg.sql b/test/sql/hypopg.sql
+index 99722b0..8d6bacb 100644
+--- a/test/sql/hypopg.sql
+++ b/test/sql/hypopg.sql
+@@ -12,7 +12,7 @@ END;
+ $_$
+ LANGUAGE plpgsql;
+
+-CREATE EXTENSION hypopg;
++CREATE EXTENSION IF NOT EXISTS hypopg;
+
+ CREATE TABLE hypo (id integer, val text, "Id2" bigint);
+
--- a/docker-compose/ext-src/hypopg-src/test-upgrade.sh
+++ b/docker-compose/ext-src/hypopg-src/test-upgrade.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+patch -p1 <test-upgrade.patch
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --inputdir=./ --bindir='/usr/local/pgsql/bin' --use-existing --inputdir=test --dbname=contrib_regression hypopg hypo_brin hypo_index_part hypo_include hypo_hash hypo_hide_index
--- a/docker-compose/ext-src/ip4r-src/test-upgrade.patch
+++ b/docker-compose/ext-src/ip4r-src/test-upgrade.patch
@@ -0,0 +1,23 @@
+diff --git a/expected/ip4r.out b/expected/ip4r.out
+index 7527af3..b38ed29 100644
+--- a/expected/ip4r.out
+++ b/expected/ip4r.out
+@@ -1,6 +1,5 @@
+ --
+ /*CUT-HERE*/
+-CREATE EXTENSION ip4r;
+ -- Check whether any of our opclasses fail amvalidate
+ DO $d$
+   DECLARE
+diff --git a/sql/ip4r.sql b/sql/ip4r.sql
+index 65c49ec..24ade09 100644
+--- a/sql/ip4r.sql
+++ b/sql/ip4r.sql
+@@ -1,7 +1,6 @@
+ --
+
+ /*CUT-HERE*/
+-CREATE EXTENSION ip4r;
+
+ -- Check whether any of our opclasses fail amvalidate
+
--- a/docker-compose/ext-src/ip4r-src/test-upgrade.sh
+++ b/docker-compose/ext-src/ip4r-src/test-upgrade.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+patch -p1 <test-upgrade.patch
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin'    --dbname=contrib_regression ip4r ip4r-softerr ip4r-v11
--- a/docker-compose/ext-src/pg_cron-src/test-upgrade.patch
+++ b/docker-compose/ext-src/pg_cron-src/test-upgrade.patch
@@ -0,0 +1,75 @@
+diff --git a/expected/pg_cron-test.out b/expected/pg_cron-test.out
+index d79d542..1663886 100644
+--- a/expected/pg_cron-test.out
+++ b/expected/pg_cron-test.out
+@@ -1,30 +1,3 @@
+-CREATE EXTENSION pg_cron VERSION '1.0';
+-SELECT extversion FROM pg_extension WHERE extname='pg_cron';
+- extversion 
+-------------
+- 1.0
+-(1 row)
+-
+--- Test binary compatibility with v1.4 function signature.
+-ALTER EXTENSION pg_cron UPDATE TO '1.4';
+-SELECT cron.unschedule(job_name := 'no_such_job');
+-ERROR:  could not find valid entry for job 'no_such_job'
+-SELECT cron.schedule('testjob', '* * * * *', 'SELECT 1');
+- schedule 
+-----------
+-        1
+-(1 row)
+-
+-SELECT cron.unschedule('testjob');
+- unschedule 
+-------------
+- t
+-(1 row)
+-
+--- Test cache invalidation
+-DROP EXTENSION pg_cron;
+-CREATE EXTENSION pg_cron VERSION '1.4';
+-ALTER EXTENSION pg_cron UPDATE;
+ -- Vacuum every day at 10:00am (GMT)
+ SELECT cron.schedule('0 10 * * *', 'VACUUM');
+  schedule 
+@@ -300,8 +273,3 @@ SELECT jobid, jobname, schedule, command FROM cron.job ORDER BY jobid;
+ SELECT cron.schedule('bad-last-dom-job1', '0 11 $foo * *', 'VACUUM FULL');
+ ERROR:  invalid schedule: 0 11 $foo * *
+ HINT:  Use cron format (e.g. 5 4 * * *), or interval format '[1-59] seconds'
+--- cleaning
+-DROP EXTENSION pg_cron;
+-drop user pgcron_cront;
+-drop database pgcron_dbno;
+-drop database pgcron_dbyes;
+diff --git a/sql/pg_cron-test.sql b/sql/pg_cron-test.sql
+index 45f94d9..241cf73 100644
+--- a/sql/pg_cron-test.sql
+++ b/sql/pg_cron-test.sql
+@@ -1,17 +1,3 @@
+-CREATE EXTENSION pg_cron VERSION '1.0';
+-SELECT extversion FROM pg_extension WHERE extname='pg_cron';
+--- Test binary compatibility with v1.4 function signature.
+-ALTER EXTENSION pg_cron UPDATE TO '1.4';
+-SELECT cron.unschedule(job_name := 'no_such_job');
+-SELECT cron.schedule('testjob', '* * * * *', 'SELECT 1');
+-SELECT cron.unschedule('testjob');
+-
+--- Test cache invalidation
+-DROP EXTENSION pg_cron;
+-CREATE EXTENSION pg_cron VERSION '1.4';
+-
+-ALTER EXTENSION pg_cron UPDATE;
+-
+ -- Vacuum every day at 10:00am (GMT)
+ SELECT cron.schedule('0 10 * * *', 'VACUUM');
+ 
+@@ -156,8 +142,3 @@ SELECT jobid, jobname, schedule, command FROM cron.job ORDER BY jobid;
+ -- invalid last of day job
+ SELECT cron.schedule('bad-last-dom-job1', '0 11 $foo * *', 'VACUUM FULL');
+ 
+--- cleaning
+-DROP EXTENSION pg_cron;
+-drop user pgcron_cront;
+-drop database pgcron_dbno;
+-drop database pgcron_dbyes;
--- a/docker-compose/ext-src/pg_cron-src/test-upgrade.sh
+++ b/docker-compose/ext-src/pg_cron-src/test-upgrade.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+patch -p1 <test-upgrade.patch
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin'    --dbname=contrib_regression pg_cron-test
--- a/docker-compose/ext-src/pg_graphql-src/neon-test.sh
+++ b/docker-compose/ext-src/pg_graphql-src/neon-test.sh
@@ -0,0 +1,13 @@
+#!/bin/bash
+set -ex
+cd "$(dirname "${0}")"
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+PGXS="$(dirname "$(pg_config --pgxs)" )"
+REGRESS="${PGXS}/../test/regress/pg_regress"
+TESTDIR="test"
+TESTS=$(ls "${TESTDIR}/sql" | sort )
+TESTS=${TESTS//\.sql/}
+psql -v ON_ERROR_STOP=1 -f test/fixtures.sql -d contrib_regression
+${REGRESS} --use-existing --dbname=contrib_regression --inputdir=${TESTDIR} ${TESTS}
+
--- a/docker-compose/ext-src/pg_ivm-src/test-upgrade.patch
+++ b/docker-compose/ext-src/pg_ivm-src/test-upgrade.patch
@@ -0,0 +1,18 @@
+diff --git a/expected/pg_ivm.out b/expected/pg_ivm.out
+index e8798ee..cca58d0 100644
+--- a/expected/pg_ivm.out
+++ b/expected/pg_ivm.out
+@@ -1,4 +1,3 @@
+-CREATE EXTENSION pg_ivm;
+ GRANT ALL ON SCHEMA public TO public;
+ -- create a table to use as a basis for views and materialized views in various combinations
+ CREATE TABLE mv_base_a (i int, j int);
+diff --git a/sql/pg_ivm.sql b/sql/pg_ivm.sql
+index d3c1a01..9382d7f 100644
+--- a/sql/pg_ivm.sql
+++ b/sql/pg_ivm.sql
+@@ -1,4 +1,3 @@
+-CREATE EXTENSION pg_ivm;
+ GRANT ALL ON SCHEMA public TO public;
+ 
+ -- create a table to use as a basis for views and materialized views in various combinations
--- a/docker-compose/ext-src/pg_ivm-src/test-upgrade.sh
+++ b/docker-compose/ext-src/pg_ivm-src/test-upgrade.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+patch -p1 <test-upgrade.patch
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin' --dbname=contrib_regression pg_ivm create_immv refresh_immv
--- a/docker-compose/ext-src/pg_roaringbitmap-src/test-upgrade.patch
+++ b/docker-compose/ext-src/pg_roaringbitmap-src/test-upgrade.patch
@@ -0,0 +1,25 @@
+diff --git a/expected/roaringbitmap.out b/expected/roaringbitmap.out
+index de70531..a5f7c15 100644
+--- a/expected/roaringbitmap.out
+++ b/expected/roaringbitmap.out
+@@ -1,7 +1,6 @@
+ --
+ --  Test roaringbitmap extension
+ --
+-CREATE EXTENSION if not exists roaringbitmap;
+ -- Test input and output
+ set roaringbitmap.output_format='array';
+ set extra_float_digits = 0;
+diff --git a/sql/roaringbitmap.sql b/sql/roaringbitmap.sql
+index a0e9c74..84bc966 100644
+--- a/sql/roaringbitmap.sql
+++ b/sql/roaringbitmap.sql
+@@ -2,8 +2,6 @@
+ --  Test roaringbitmap extension
+ --
+ 
+-CREATE EXTENSION if not exists roaringbitmap;
+-
+ -- Test input and output
+ 
+ set roaringbitmap.output_format='array';
--- a/docker-compose/ext-src/pg_roaringbitmap-src/test-upgrade.sh
+++ b/docker-compose/ext-src/pg_roaringbitmap-src/test-upgrade.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+patch -p1 <test-upgrade.patch
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin'    --dbname=contrib_regression roaringbitmap
--- a/docker-compose/ext-src/pg_semver-src/test-upgrade.patch
+++ b/docker-compose/ext-src/pg_semver-src/test-upgrade.patch
@@ -0,0 +1,24 @@
+diff --git a/test/sql/base.sql b/test/sql/base.sql
+index af599d8..2eed91b 100644
+--- a/test/sql/base.sql
+++ b/test/sql/base.sql
+@@ -2,7 +2,6 @@
+ BEGIN;
+ 
+ \i test/pgtap-core.sql
+-\i sql/semver.sql
+ 
+ SELECT plan(334);
+ --SELECT * FROM no_plan();
+diff --git a/test/sql/corpus.sql b/test/sql/corpus.sql
+index 1f5f637..a519905 100644
+--- a/test/sql/corpus.sql
+++ b/test/sql/corpus.sql
+@@ -4,7 +4,6 @@ BEGIN;
+ -- Test the SemVer corpus from https://regex101.com/r/Ly7O1x/3/.
+ 
+ \i test/pgtap-core.sql
+-\i sql/semver.sql
+ 
+ SELECT plan(71);
+ --SELECT * FROM no_plan();
--- a/docker-compose/ext-src/pg_semver-src/test-upgrade.sh
+++ b/docker-compose/ext-src/pg_semver-src/test-upgrade.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+patch -p1 <test-upgrade.patch
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin'    --inputdir=test --dbname=contrib_regression base corpus
--- a/docker-compose/ext-src/pg_uuidv7-src/test-upgrade.sh
+++ b/docker-compose/ext-src/pg_uuidv7-src/test-upgrade.sh
@@ -0,0 +1,5 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin'    --inputdir=test --dbname=contrib_regression  002_uuid_generate_v7 003_uuid_v7_to_timestamptz 004_uuid_timestamptz_to_v7 005_uuid_v7_to_timestamp 006_uuid_timestamp_to_v7
--- a/docker-compose/ext-src/pgvector-src/test-upgrade.sh
+++ b/docker-compose/ext-src/pgvector-src/test-upgrade.sh
@@ -0,0 +1,5 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --inputdir=./ --bindir='/usr/local/pgsql/bin' --inputdir=test --use-existing --dbname=contrib_regression bit btree cast copy halfvec hnsw_bit hnsw_halfvec hnsw_sparsevec hnsw_vector ivfflat_bit ivfflat_halfvec ivfflat_vector sparsevec vector_type
--- a/docker-compose/ext-src/plv8-src/test-upgrade.sh
+++ b/docker-compose/ext-src/plv8-src/test-upgrade.sh
@@ -0,0 +1,5 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --inputdir=./ --bindir='/usr/local/pgsql/bin'  --use-existing --dbname=contrib_regression plv8 plv8-errors scalar_args inline json startup_pre startup varparam json_conv jsonb_conv window guc es6 arraybuffer composites currentresource startup_perms bytea find_function_perms memory_limits reset show array_spread regression dialect bigint procedure
--- a/docker-compose/ext-src/postgresql-unit-src/test-upgrade.sh
+++ b/docker-compose/ext-src/postgresql-unit-src/test-upgrade.sh
@@ -0,0 +1,5 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --inputdir=./ --bindir='/usr/local/pgsql/bin' --use-existing --dbname=contrib_regression extension tables unit binary unicode prefix units time temperature functions language_functions round derived compare aggregate iec custom crosstab convert
--- a/docker-compose/ext-src/prefix-src/test-upgrade.sh
+++ b/docker-compose/ext-src/prefix-src/test-upgrade.sh
@@ -0,0 +1,5 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin'    --dbname=contrib_regression prefix falcon explain queries
--- a/docker-compose/ext-src/rum-src/test-upgrade.patch
+++ b/docker-compose/ext-src/rum-src/test-upgrade.patch
@@ -0,0 +1,19 @@
+diff --git a/expected/rum.out b/expected/rum.out
+index 5966d19..8860b79 100644
+--- a/expected/rum.out
+++ b/expected/rum.out
+@@ -1,4 +1,3 @@
+-CREATE EXTENSION rum;
+ CREATE TABLE test_rum( t text, a tsvector );
+ CREATE TRIGGER tsvectorupdate
+ BEFORE UPDATE OR INSERT ON test_rum
+diff --git a/sql/rum.sql b/sql/rum.sql
+index 8414bb9..898e6ab 100644
+--- a/sql/rum.sql
+++ b/sql/rum.sql
+@@ -1,5 +1,3 @@
+-CREATE EXTENSION rum;
+-
+ CREATE TABLE test_rum( t text, a tsvector );
+
+ CREATE TRIGGER tsvectorupdate
--- a/docker-compose/ext-src/rum-src/test-upgrade.sh
+++ b/docker-compose/ext-src/rum-src/test-upgrade.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+patch -p1 <test-upgrade.patch
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --inputdir=./ --bindir='/usr/local/pgsql/bin' --use-existing --dbname=contrib_regression rum rum_validate rum_hash ruminv timestamp orderby orderby_hash altorder altorder_hash limits int2 int4 int8 float4 float8 money oid time timetz date interval macaddr inet cidr text varchar char bytea bit varbit numeric rum_weight expr array
--- a/docker-compose/run-tests.sh
+++ b/docker-compose/run-tests.sh
@@ -4,14 +4,17 @@ set -x
 cd /ext-src || exit 2
 FAILED=
 LIST=$( (echo -e "${SKIP//","/"\n"}"; ls -d -- *-src) | sort | uniq -u)
-for d in ${LIST}
-do
-       [ -d "${d}" ] || continue
-       if ! psql -w -c "select 1" >/dev/null; then
-          FAILED="${d} ${FAILED}"
-          break
-       fi
+for d in ${LIST}; do
+    [ -d "${d}" ] || continue
+    if ! psql -w -c "select 1" >/dev/null; then
+      FAILED="${d} ${FAILED}"
+      break
+    fi
+    if [ -f "${d}/neon-test.sh" ]; then
+       "${d}/neon-test.sh" || FAILED="${d} ${FAILED}"
+    else
       USE_PGXS=1 make -C "${d}" installcheck || FAILED="${d} ${FAILED}"
+    fi
 done
 [ -z "${FAILED}" ] && exit 0
 echo "${FAILED}"
--- a/docker-compose/test_extensions_upgrade.sh
+++ b/docker-compose/test_extensions_upgrade.sh
@@ -0,0 +1,93 @@
+#!/bin/bash
+set -eux -o pipefail
+cd "$(dirname "${0}")"
+# Takes a variable name as argument. The result is stored in that variable.
+generate_id() {
+    local -n resvar=$1
+    printf -v resvar '%08x%08x%08x%08x' $SRANDOM $SRANDOM $SRANDOM $SRANDOM
+}
+if [ -z ${OLDTAG+x} ] || [ -z ${NEWTAG+x} ] || [ -z "${OLDTAG}" ] || [ -z "${NEWTAG}" ]; then
+  echo OLDTAG and NEWTAG must be defined
+  exit 1
+fi
+export PG_VERSION=${PG_VERSION:-16}
+function wait_for_ready {
+  TIME=0
+  while ! docker compose logs compute_is_ready | grep -q "accepting connections" && [ ${TIME} -le 300 ] ; do
+    ((TIME += 1 ))
+    sleep 1
+  done
+  if [ ${TIME} -gt 300 ]; then
+    echo "Timed out waiting for the compute to become ready."
+    exit 2
+  fi
+}
+function create_extensions() {
+  for ext in ${1}; do
+    docker compose exec neon-test-extensions psql -X -v ON_ERROR_STOP=1 -d contrib_regression -c "CREATE EXTENSION IF NOT EXISTS ${ext}"
+  done
+}
+EXTENSIONS='[
+{"extname": "plv8", "extdir": "plv8-src"},
+{"extname": "vector", "extdir": "pgvector-src"},
+{"extname": "unit", "extdir": "postgresql-unit-src"},
+{"extname": "hypopg", "extdir": "hypopg-src"},
+{"extname": "rum", "extdir": "rum-src"},
+{"extname": "ip4r", "extdir": "ip4r-src"},
+{"extname": "prefix", "extdir": "prefix-src"},
+{"extname": "hll", "extdir": "hll-src"},
+{"extname": "pg_cron", "extdir": "pg_cron-src"},
+{"extname": "pg_uuidv7", "extdir": "pg_uuidv7-src"},
+{"extname": "roaringbitmap", "extdir": "pg_roaringbitmap-src"},
+{"extname": "semver", "extdir": "pg_semver-src"},
+{"extname": "pg_ivm", "extdir": "pg_ivm-src"}
+]'
+EXTNAMES=$(echo ${EXTENSIONS} | jq -r '.[].extname' | paste -sd ' ' -)
+TAG=${NEWTAG} docker compose --profile test-extensions up --quiet-pull --build -d
+wait_for_ready
+docker compose exec neon-test-extensions psql -c "DROP DATABASE IF EXISTS contrib_regression"
+docker compose exec neon-test-extensions psql -c "CREATE DATABASE contrib_regression"
+create_extensions "${EXTNAMES}"
+query="select json_object_agg(extname,extversion) from pg_extension where extname in ('${EXTNAMES// /\',\'}')"
+new_vers=$(docker compose exec neon-test-extensions psql -Aqt -d contrib_regression -c "$query")
+docker compose --profile test-extensions down
+TAG=${OLDTAG} docker compose --profile test-extensions up --quiet-pull --build -d --force-recreate
+wait_for_ready
+docker compose cp  ext-src neon-test-extensions:/
+docker compose exec neon-test-extensions psql -c "DROP DATABASE IF EXISTS contrib_regression"
+docker compose exec neon-test-extensions psql -c "CREATE DATABASE contrib_regression"
+create_extensions "${EXTNAMES}"
+query="select pge.extname from pg_extension pge join (select key as extname, value as extversion from json_each_text('${new_vers}')) x on pge.extname=x.extname and pge.extversion <> x.extversion"
+exts=$(docker compose exec neon-test-extensions psql -Aqt -d contrib_regression -c "$query")
+if [ -z "${exts}" ]; then
+  echo "No extensions were upgraded"
+else
+  tenant_id=$(docker compose exec neon-test-extensions psql -Aqt -c "SHOW neon.tenant_id")
+  timeline_id=$(docker compose exec neon-test-extensions psql -Aqt -c "SHOW neon.timeline_id")
+  for ext in ${exts}; do
+    echo Testing ${ext}...
+    EXTDIR=$(echo ${EXTENSIONS} | jq -r '.[] | select(.extname=="'${ext}'") | .extdir')
+    generate_id new_timeline_id
+    PARAMS=(
+        -sbf
+        -X POST
+        -H "Content-Type: application/json"
+        -d "{\"new_timeline_id\": \"${new_timeline_id}\", \"pg_version\": ${PG_VERSION}, \"ancestor_timeline_id\": \"${timeline_id}\"}"
+        "http://127.0.0.1:9898/v1/tenant/${tenant_id}/timeline/"
+    )
+    result=$(curl "${PARAMS[@]}")
+    echo "${result}" | jq .
+    TENANT_ID=${tenant_id} TIMELINE_ID=${new_timeline_id} TAG=${OLDTAG} docker compose down compute compute_is_ready
+    COMPUTE_TAG=${NEWTAG} TAG=${OLDTAG} TENANT_ID=${tenant_id} TIMELINE_ID=${new_timeline_id} docker compose up --quiet-pull -d --build compute compute_is_ready
+    wait_for_ready
+    TID=$(docker compose exec neon-test-extensions psql -Aqt -c "SHOW neon.timeline_id")
+    if [ "${TID}" != "${new_timeline_id}" ]; then
+      echo Timeline mismatch
+      exit 1
+    fi
+    docker compose exec neon-test-extensions psql -d contrib_regression -c "\dx ${ext}"
+    docker compose exec neon-test-extensions sh -c /ext-src/${EXTDIR}/test-upgrade.sh
+    docker compose exec neon-test-extensions psql -d contrib_regression -c "alter extension ${ext} update"
+    docker compose exec neon-test-extensions psql -d contrib_regression -c "\dx ${ext}"
+  done
+fi
--- a/libs/compute_api/src/responses.rs
+++ b/libs/compute_api/src/responses.rs
@@ -15,11 +15,6 @@ pub struct GenericAPIError {
    pub error: String,
 }

-#[derive(Debug, Clone, Serialize)]
-pub struct InfoResponse {
-    pub num_cpus: usize,
-}
-
 #[derive(Debug, Clone, Serialize)]
 pub struct ExtensionInstallResponse {
    pub extension: PgIdent,
--- a/libs/pageserver_api/src/config.rs
+++ b/libs/pageserver_api/src/config.rs
@@ -256,6 +256,11 @@ pub struct TenantConfigToml {
    pub compaction_period: Duration,
    /// Level0 delta layer threshold for compaction.
    pub compaction_threshold: usize,
+    /// Controls the amount of L0 included in a single compaction iteration.
+    /// The unit is `checkpoint_distance`, i.e., a size.
+    /// We add L0s to the set of layers to compact until their cumulative
+    /// size exceeds `compaction_upper_limit * checkpoint_distance`.
+    pub compaction_upper_limit: usize,
    pub compaction_algorithm: crate::models::CompactionAlgorithmSettings,
    /// Level0 delta layer threshold at which to delay layer flushes for compaction backpressure,
    /// such that they take 2x as long, and start waiting for layer flushes during ephemeral layer
@@ -523,6 +528,12 @@ pub mod tenant_conf_defaults {

    pub const DEFAULT_COMPACTION_PERIOD: &str = "20 s";
    pub const DEFAULT_COMPACTION_THRESHOLD: usize = 10;
+
+    // This value needs to be tuned to avoid OOM. We have 3/4 of the total CPU threads to do background work, that's 16*3/4=9 on
+    // most of our pageservers. Compacting ~50 layers requires about 2GB of memory (could be reduced later by optimizing L0 hole
+    // calculation to avoid loading all keys into memory). So with this config, we can get a maximum peak compaction usage of 18GB.
+    pub const DEFAULT_COMPACTION_UPPER_LIMIT: usize = 50;
+
    pub const DEFAULT_COMPACTION_ALGORITHM: crate::models::CompactionAlgorithm =
        crate::models::CompactionAlgorithm::Legacy;

@@ -563,6 +574,7 @@ impl Default for TenantConfigToml {
            compaction_period: humantime::parse_duration(DEFAULT_COMPACTION_PERIOD)
                .expect("cannot parse default compaction period"),
            compaction_threshold: DEFAULT_COMPACTION_THRESHOLD,
+            compaction_upper_limit: DEFAULT_COMPACTION_UPPER_LIMIT,
            compaction_algorithm: crate::models::CompactionAlgorithmSettings {
                kind: DEFAULT_COMPACTION_ALGORITHM,
            },
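
The doc comment on `compaction_upper_limit` describes a byte budget: L0 layers are added until their cumulative size exceeds `compaction_upper_limit * checkpoint_distance`. A minimal sketch of that selection rule follows, assuming a hypothetical `L0Layer` type and `select_l0s` helper that are not part of this change; the real loop in `compaction.rs` also tracks LSN ranges and may differ in its exact boundary condition.

```rust
/// Hypothetical stand-in for an L0 delta layer; only its size matters here.
struct L0Layer {
    file_size: u64,
}

/// Byte budget for one compaction iteration, in units of `checkpoint_distance`.
fn delta_size_limit(compaction_upper_limit: usize, checkpoint_distance: u64) -> u64 {
    compaction_upper_limit as u64 * checkpoint_distance
}

/// Returns how many leading layers get selected. Layers are added until the
/// cumulative size exceeds `limit`, so the layer that crosses the limit is
/// still included (an assumption taken from the wording of the doc comment).
fn select_l0s(layers: &[L0Layer], limit: u64) -> usize {
    let mut total = 0u64;
    let mut count = 0;
    for layer in layers {
        total += layer.file_size;
        count += 1;
        if total > limit {
            break;
        }
    }
    count
}

fn main() {
    // Five layers of 10 bytes each, with a budget of 2 * 10 = 20 bytes.
    let layers: Vec<L0Layer> = (0..5).map(|_| L0Layer { file_size: 10 }).collect();
    let limit = delta_size_limit(2, 10);
    println!("selected {} of {} layers", select_l0s(&layers, limit), layers.len());
}
```

With the default of 50 and a `checkpoint_distance` of, say, 256 MiB (an illustrative figure, not taken from this diff), the budget would be 50 * 256 MiB of delta data per iteration.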
--- a/libs/pageserver_api/src/models.rs
+++ b/libs/pageserver_api/src/models.rs
@@ -458,6 +458,8 @@ pub struct TenantConfigPatch {
    pub compaction_period: FieldPatch<String>,
    #[serde(skip_serializing_if = "FieldPatch::is_noop")]
    pub compaction_threshold: FieldPatch<usize>,
+    #[serde(skip_serializing_if = "FieldPatch::is_noop")]
+    pub compaction_upper_limit: FieldPatch<usize>,
    // defer parsing compaction_algorithm, like eviction_policy
    #[serde(skip_serializing_if = "FieldPatch::is_noop")]
    pub compaction_algorithm: FieldPatch<CompactionAlgorithmSettings>,
@@ -522,6 +524,7 @@ pub struct TenantConfig {
    pub compaction_target_size: Option<u64>,
    pub compaction_period: Option<String>,
    pub compaction_threshold: Option<usize>,
+    pub compaction_upper_limit: Option<usize>,
    // defer parsing compaction_algorithm, like eviction_policy
    pub compaction_algorithm: Option<CompactionAlgorithmSettings>,
    pub l0_flush_delay_threshold: Option<usize>,
@@ -559,6 +562,7 @@ impl TenantConfig {
            mut compaction_target_size,
            mut compaction_period,
            mut compaction_threshold,
+            mut compaction_upper_limit,
            mut compaction_algorithm,
            mut l0_flush_delay_threshold,
            mut l0_flush_stall_threshold,
@@ -594,6 +598,9 @@ impl TenantConfig {
            .apply(&mut compaction_target_size);
        patch.compaction_period.apply(&mut compaction_period);
        patch.compaction_threshold.apply(&mut compaction_threshold);
+        patch
+            .compaction_upper_limit
+            .apply(&mut compaction_upper_limit);
        patch.compaction_algorithm.apply(&mut compaction_algorithm);
        patch
            .l0_flush_delay_threshold
@@ -653,6 +660,7 @@ impl TenantConfig {
            compaction_target_size,
            compaction_period,
            compaction_threshold,
+            compaction_upper_limit,
            compaction_algorithm,
            l0_flush_delay_threshold,
            l0_flush_stall_threshold,
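
The new field is threaded through the same patch-then-fallback path as `compaction_threshold`: a `FieldPatch` is applied to the per-tenant `Option`, and the getter falls back to the global default when the option is empty. The sketch below illustrates that flow. The diff only shows `FieldPatch::is_noop` and `.apply(..)`, so the three variants used here are an assumption, and `effective_upper_limit` is a hypothetical helper standing in for `get_compaction_upper_limit`.

```rust
/// Hypothetical three-state patch; variant names are assumed, not taken from the diff.
#[derive(Debug, Clone, PartialEq)]
enum FieldPatch<T> {
    Upsert(T),
    Remove,
    Noop,
}

impl<T> FieldPatch<T> {
    fn is_noop(&self) -> bool {
        matches!(self, FieldPatch::Noop)
    }

    /// Applies the patch to a per-tenant override.
    fn apply(self, target: &mut Option<T>) {
        match self {
            FieldPatch::Upsert(value) => *target = Some(value),
            FieldPatch::Remove => *target = None,
            FieldPatch::Noop => {}
        }
    }
}

/// Per-tenant override wins; otherwise the global default is used.
fn effective_upper_limit(tenant_override: Option<usize>, global_default: usize) -> usize {
    tenant_override.unwrap_or(global_default)
}

fn main() {
    const DEFAULT_COMPACTION_UPPER_LIMIT: usize = 50;
    let mut compaction_upper_limit: Option<usize> = None;

    FieldPatch::Upsert(20).apply(&mut compaction_upper_limit);
    println!("{}", effective_upper_limit(compaction_upper_limit, DEFAULT_COMPACTION_UPPER_LIMIT));

    FieldPatch::Remove.apply(&mut compaction_upper_limit);
    println!("{}", effective_upper_limit(compaction_upper_limit, DEFAULT_COMPACTION_UPPER_LIMIT));
}
```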
--- a/libs/pq_proto/src/lib.rs
+++ b/libs/pq_proto/src/lib.rs
@@ -182,6 +182,13 @@ pub struct CancelKeyData {
    pub cancel_key: i32,
 }

+pub fn id_to_cancel_key(id: u64) -> CancelKeyData {
+    CancelKeyData {
+        backend_pid: (id >> 32) as i32,
+        cancel_key: (id & 0xffffffff) as i32,
+    }
+}
+
 impl fmt::Display for CancelKeyData {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let hi = (self.backend_pid as u64) << 32;
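
`id_to_cancel_key` splits a 64-bit id into two 32-bit halves: the high half becomes `backend_pid`, the low half `cancel_key`. The sketch below repeats that function on a local copy of the struct and adds a hypothetical inverse, `cancel_key_to_id`, which is not part of this change, to show that the split is lossless. The inverse goes through `u32` before widening so that negative `i32` values are not sign-extended.

```rust
/// Local copy of the struct for illustration; field names follow the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CancelKeyData {
    pub backend_pid: i32,
    pub cancel_key: i32,
}

/// Same split as in the diff: high 32 bits and low 32 bits.
pub fn id_to_cancel_key(id: u64) -> CancelKeyData {
    CancelKeyData {
        backend_pid: (id >> 32) as i32,
        cancel_key: (id & 0xffffffff) as i32,
    }
}

/// Hypothetical inverse. Casting via `u32` keeps the raw bit pattern and
/// avoids sign extension when a half is negative as an `i32`.
pub fn cancel_key_to_id(key: CancelKeyData) -> u64 {
    let hi = (key.backend_pid as u32 as u64) << 32;
    let lo = key.cancel_key as u32 as u64;
    hi | lo
}

fn main() {
    // Both halves have their top bit set, so both fields are negative as i32.
    let id = 0x8000_0001_ffff_fffe_u64;
    let key = id_to_cancel_key(id);
    println!("{key:?}");
    assert_eq!(cancel_key_to_id(key), id);
}
```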
--- a/libs/proxy/tokio-postgres2/Cargo.toml
+++ b/libs/proxy/tokio-postgres2/Cargo.toml
@@ -19,3 +19,4 @@ postgres-protocol2 = { path = "../postgres-protocol2" }
 postgres-types2 = { path = "../postgres-types2" }
 tokio = { workspace = true, features = ["io-util", "time", "net"] }
 tokio-util = { workspace = true, features = ["codec"] }
+serde = { workspace = true, features = ["derive"] }
--- a/libs/proxy/tokio-postgres2/src/cancel_token.rs
+++ b/libs/proxy/tokio-postgres2/src/cancel_token.rs
@@ -3,12 +3,13 @@ use crate::tls::TlsConnect;

 use crate::{cancel_query, client::SocketConfig, tls::MakeTlsConnect};
 use crate::{cancel_query_raw, Error};
+use serde::{Deserialize, Serialize};
 use tokio::io::{AsyncRead, AsyncWrite};
 use tokio::net::TcpStream;

 /// The capability to request cancellation of in-progress queries on a
 /// connection.
-#[derive(Clone)]
+#[derive(Clone, Serialize, Deserialize)]
 pub struct CancelToken {
    pub socket_config: Option<SocketConfig>,
    pub ssl_mode: SslMode,
--- a/libs/proxy/tokio-postgres2/src/client.rs
+++ b/libs/proxy/tokio-postgres2/src/client.rs
@@ -18,6 +18,7 @@ use fallible_iterator::FallibleIterator;
 use futures_util::{future, ready, TryStreamExt};
 use parking_lot::Mutex;
 use postgres_protocol2::message::{backend::Message, frontend};
+use serde::{Deserialize, Serialize};
 use std::collections::HashMap;
 use std::fmt;
 use std::sync::Arc;
@@ -137,7 +138,7 @@ impl InnerClient {
    }
 }

-#[derive(Clone)]
+#[derive(Clone, Serialize, Deserialize)]
 pub struct SocketConfig {
    pub host: Host,
    pub port: u16,
--- a/libs/proxy/tokio-postgres2/src/config.rs
+++ b/libs/proxy/tokio-postgres2/src/config.rs
@@ -7,6 +7,7 @@ use crate::tls::MakeTlsConnect;
 use crate::tls::TlsConnect;
 use crate::{Client, Connection, Error};
 use postgres_protocol2::message::frontend::StartupMessageParams;
+use serde::{Deserialize, Serialize};
 use std::fmt;
 use std::str;
 use std::time::Duration;
@@ -16,7 +17,7 @@ pub use postgres_protocol2::authentication::sasl::ScramKeys;
 use tokio::net::TcpStream;

 /// TLS configuration.
-#[derive(Debug, Copy, Clone, PartialEq, Eq)]
+#[derive(Debug, Copy, Clone, PartialEq, Eq, Serialize, Deserialize)]
 #[non_exhaustive]
 pub enum SslMode {
    /// Do not use TLS.
@@ -50,7 +51,7 @@ pub enum ReplicationMode {
 }

 /// A host specification.
-#[derive(Debug, Clone, PartialEq, Eq)]
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
 pub enum Host {
    /// A TCP hostname.
    Tcp(String),
--- a/pageserver/Cargo.toml
+++ b/pageserver/Cargo.toml
@@ -36,7 +36,7 @@ itertools.workspace = true
 md5.workspace = true
 nix.workspace = true
 # hack to get the number of worker threads tokio uses
-num_cpus = { version = "1.15" }
+num_cpus.workspace = true
 num-traits.workspace = true
 once_cell.workspace = true
 pin-project-lite.workspace = true
--- a/pageserver/src/http/openapi_spec.yml
+++ b/pageserver/src/http/openapi_spec.yml
@@ -984,6 +984,8 @@ components:
          type: string
        compaction_threshold:
          type: string
+        compaction_upper_limit:
+          type: string
        image_creation_threshold:
          type: integer
        walreceiver_connect_timeout:
--- a/pageserver/src/tenant.rs
+++ b/pageserver/src/tenant.rs
@@ -3816,6 +3816,13 @@ impl Tenant {
            .unwrap_or(self.conf.default_tenant_conf.compaction_threshold)
    }

+    pub fn get_compaction_upper_limit(&self) -> usize {
+        let tenant_conf = self.tenant_conf.load().tenant_conf.clone();
+        tenant_conf
+            .compaction_upper_limit
+            .unwrap_or(self.conf.default_tenant_conf.compaction_upper_limit)
+    }
+
    pub fn get_gc_horizon(&self) -> u64 {
        let tenant_conf = self.tenant_conf.load().tenant_conf.clone();
        tenant_conf
@@ -5469,6 +5476,7 @@ pub(crate) mod harness {
                compaction_target_size: Some(tenant_conf.compaction_target_size),
                compaction_period: Some(tenant_conf.compaction_period),
                compaction_threshold: Some(tenant_conf.compaction_threshold),
+                compaction_upper_limit: Some(tenant_conf.compaction_upper_limit),
                compaction_algorithm: Some(tenant_conf.compaction_algorithm),
                l0_flush_delay_threshold: tenant_conf.l0_flush_delay_threshold,
                l0_flush_stall_threshold: tenant_conf.l0_flush_stall_threshold,
--- a/pageserver/src/tenant/config.rs
+++ b/pageserver/src/tenant/config.rs
@@ -277,6 +277,10 @@ pub struct TenantConfOpt {
    #[serde(default)]
    pub compaction_threshold: Option<usize>,

+    #[serde(skip_serializing_if = "Option::is_none")]
+    #[serde(default)]
+    pub compaction_upper_limit: Option<usize>,
+
    #[serde(skip_serializing_if = "Option::is_none")]
    #[serde(default)]
    pub compaction_algorithm: Option<CompactionAlgorithmSettings>,
@@ -401,6 +405,9 @@ impl TenantConfOpt {
            compaction_threshold: self
                .compaction_threshold
                .unwrap_or(global_conf.compaction_threshold),
+            compaction_upper_limit: self
+                .compaction_upper_limit
+                .unwrap_or(global_conf.compaction_upper_limit),
            compaction_algorithm: self
                .compaction_algorithm
                .as_ref()
@@ -478,6 +485,7 @@ impl TenantConfOpt {
            mut compaction_target_size,
            mut compaction_period,
            mut compaction_threshold,
+            mut compaction_upper_limit,
            mut compaction_algorithm,
            mut l0_flush_delay_threshold,
            mut l0_flush_stall_threshold,
@@ -519,6 +527,9 @@ impl TenantConfOpt {
            .map(|v| humantime::parse_duration(&v))?
            .apply(&mut compaction_period);
        patch.compaction_threshold.apply(&mut compaction_threshold);
+        patch
+            .compaction_upper_limit
+            .apply(&mut compaction_upper_limit);
        patch.compaction_algorithm.apply(&mut compaction_algorithm);
        patch
            .l0_flush_delay_threshold
@@ -596,6 +607,7 @@ impl TenantConfOpt {
            compaction_target_size,
            compaction_period,
            compaction_threshold,
+            compaction_upper_limit,
            compaction_algorithm,
            l0_flush_delay_threshold,
            l0_flush_stall_threshold,
@@ -657,6 +669,7 @@ impl From<TenantConfOpt> for models::TenantConfig {
            compaction_target_size: value.compaction_target_size,
            compaction_period: value.compaction_period.map(humantime),
            compaction_threshold: value.compaction_threshold,
+            compaction_upper_limit: value.compaction_upper_limit,
            l0_flush_delay_threshold: value.l0_flush_delay_threshold,
            l0_flush_stall_threshold: value.l0_flush_stall_threshold,
            l0_flush_wait_upload: value.l0_flush_wait_upload,
--- a/pageserver/src/tenant/storage_layer.rs
+++ b/pageserver/src/tenant/storage_layer.rs
@@ -33,6 +33,7 @@ use utils::sync::gate::GateGuard;

 use utils::lsn::Lsn;

+pub use batch_split_writer::{BatchLayerWriter, SplitDeltaLayerWriter, SplitImageLayerWriter};
 pub use delta_layer::{DeltaLayer, DeltaLayerWriter, ValueRef};
 pub use image_layer::{ImageLayer, ImageLayerWriter};
 pub use inmemory_layer::InMemoryLayer;
--- a/pageserver/src/tenant/storage_layer/batch_split_writer.rs
+++ b/pageserver/src/tenant/storage_layer/batch_split_writer.rs
@@ -87,6 +87,23 @@ impl BatchLayerWriter {
        ));
    }

+    pub(crate) async fn finish(
+        self,
+        tline: &Arc<Timeline>,
+        ctx: &RequestContext,
+    ) -> anyhow::Result<Vec<ResidentLayer>> {
+        let res = self
+            .finish_with_discard_fn(tline, ctx, |_| async { false })
+            .await?;
+        let mut output = Vec::new();
+        for r in res {
+            if let BatchWriterResult::Produced(layer) = r {
+                output.push(layer);
+            }
+        }
+        Ok(output)
+    }
+
    pub(crate) async fn finish_with_discard_fn<D, F>(
        self,
        tline: &Arc<Timeline>,
--- a/pageserver/src/tenant/timeline.rs
+++ b/pageserver/src/tenant/timeline.rs
@@ -70,6 +70,7 @@ use std::sync::{Arc, Mutex, OnceLock, RwLock, Weak};
 use std::time::{Duration, Instant, SystemTime};

 use crate::l0_flush::{self, L0FlushGlobalState};
+use crate::tenant::storage_layer::ImageLayerName;
 use crate::{
    aux_file::AuxFileSizeEstimator,
    page_service::TenantManagerTypes,
@@ -78,7 +79,7 @@ use crate::{
        layer_map::{LayerMap, SearchResult},
        metadata::TimelineMetadata,
        storage_layer::{
-            inmemory_layer::IndexEntry, IoConcurrency, PersistentLayerDesc,
+            inmemory_layer::IndexEntry, BatchLayerWriter, IoConcurrency, PersistentLayerDesc,
            ValueReconstructSituation,
        },
    },
@@ -933,7 +934,7 @@ pub(crate) enum ShutdownMode {
 }

 struct ImageLayerCreationOutcome {
-    image: Option<ResidentLayer>,
+    unfinished_image_layer: Option<ImageLayerWriter>,
    next_start_key: Key,
 }

@@ -2180,6 +2181,14 @@ impl Timeline {
            .unwrap_or(self.conf.default_tenant_conf.compaction_threshold)
    }

+    fn get_compaction_upper_limit(&self) -> usize {
+        let tenant_conf = self.tenant_conf.load();
+        tenant_conf
+            .tenant_conf
+            .compaction_upper_limit
+            .unwrap_or(self.conf.default_tenant_conf.compaction_upper_limit)
+    }
+
    fn get_l0_flush_delay_threshold(&self) -> Option<usize> {
        // Disable L0 flushes by default. This and compaction needs further tuning.
        const DEFAULT_L0_FLUSH_DELAY_FACTOR: usize = 0; // TODO: default to e.g. 3
@@ -3459,6 +3468,13 @@ impl Timeline {
        let mut completed_keyspace = KeySpace::default();
        let mut image_covered_keyspace = KeySpaceRandomAccum::new();

+        // Prevent GC from progressing while visiting the current timeline.
+        // If we are GC-ing because a new image layer was added while traversing
+        // the timeline, then it will remove layers that are required for fulfilling
+        // the current get request (read-path cannot "look back" and notice the new
+        // image layer).
+        let _gc_cutoff_holder = timeline.get_latest_gc_cutoff_lsn();
+
        loop {
            if cancel.is_cancelled() {
                return Err(GetVectoredError::Cancelled);
@@ -4405,11 +4421,15 @@ impl Timeline {
        if wrote_keys {
            // Normal path: we have written some data into the new image layer for this
            // partition, so flush it to disk.
-            let (desc, path) = image_layer_writer.finish(ctx).await?;
-            let image_layer = Layer::finish_creating(self.conf, self, desc, &path)?;
-            info!("created image layer for rel {}", image_layer.local_path());
+            info!(
+                "produced image layer for rel {}",
+                ImageLayerName {
+                    key_range: img_range.clone(),
+                    lsn
+                },
+            );
            Ok(ImageLayerCreationOutcome {
-                image: Some(image_layer),
+                unfinished_image_layer: Some(image_layer_writer),
                next_start_key: img_range.end,
            })
        } else {
@@ -4419,7 +4439,7 @@ impl Timeline {
            // layer we write will cover the key range that we just scanned.
            tracing::debug!("no data in range {}-{}", img_range.start, img_range.end);
            Ok(ImageLayerCreationOutcome {
-                image: None,
+                unfinished_image_layer: None,
                next_start_key: start,
            })
        }
@@ -4468,7 +4488,7 @@ impl Timeline {

        if !trigger_generation && mode == ImageLayerCreationMode::Try {
            return Ok(ImageLayerCreationOutcome {
-                image: None,
+                unfinished_image_layer: None,
                next_start_key: img_range.end,
            });
        }
@@ -4494,14 +4514,15 @@ impl Timeline {
        if wrote_any_image {
            // Normal path: we have written some data into the new image layer for this
            // partition, so flush it to disk.
-            let (desc, path) = image_layer_writer.finish(ctx).await?;
-            let image_layer = Layer::finish_creating(self.conf, self, desc, &path)?;
            info!(
                "created image layer for metadata {}",
-                image_layer.local_path()
+                ImageLayerName {
+                    key_range: img_range.clone(),
+                    lsn
+                }
            );
            Ok(ImageLayerCreationOutcome {
-                image: Some(image_layer),
+                unfinished_image_layer: Some(image_layer_writer),
                next_start_key: img_range.end,
            })
        } else {
@@ -4511,7 +4532,7 @@ impl Timeline {
            // layer we write will cover the key range that we just scanned.
            tracing::debug!("no data in range {}-{}", img_range.start, img_range.end);
            Ok(ImageLayerCreationOutcome {
-                image: None,
+                unfinished_image_layer: None,
                next_start_key: start,
            })
        }
@@ -4578,7 +4599,6 @@ impl Timeline {
        ctx: &RequestContext,
    ) -> Result<Vec<ResidentLayer>, CreateImageLayersError> {
        let timer = self.metrics.create_images_time_histo.start_timer();
-        let mut image_layers = Vec::new();

        // We need to avoid holes between generated image layers.
        // Otherwise LayerMap::image_layer_exists will return false if key range of some layer is covered by more than one
@@ -4593,6 +4613,8 @@ impl Timeline {

        let check_for_image_layers = self.should_check_if_image_layers_required(lsn);

+        let mut batch_image_writer = BatchLayerWriter::new(self.conf).await?;
+
        for partition in partitioning.parts.iter() {
            if self.cancel.is_cancelled() {
                return Err(CreateImageLayersError::Cancelled);
@@ -4665,45 +4687,45 @@ impl Timeline {
                    .map_err(|_| CreateImageLayersError::Cancelled)?,
            );

-            if !compact_metadata {
-                let ImageLayerCreationOutcome {
-                    image,
-                    next_start_key,
-                } = self
-                    .create_image_layer_for_rel_blocks(
-                        partition,
-                        image_layer_writer,
-                        lsn,
-                        ctx,
-                        img_range,
-                        start,
-                        io_concurrency,
-                    )
-                    .await?;
-
-                start = next_start_key;
-                image_layers.extend(image);
+            let ImageLayerCreationOutcome {
+                unfinished_image_layer,
+                next_start_key,
+            } = if !compact_metadata {
+                self.create_image_layer_for_rel_blocks(
+                    partition,
+                    image_layer_writer,
+                    lsn,
+                    ctx,
+                    img_range.clone(),
+                    start,
+                    io_concurrency,
+                )
+                .await?
            } else {
-                let ImageLayerCreationOutcome {
-                    image,
-                    next_start_key,
-                } = self
-                    .create_image_layer_for_metadata_keys(
-                        partition,
-                        image_layer_writer,
-                        lsn,
-                        ctx,
-                        img_range,
-                        mode,
-                        start,
-                        io_concurrency,
-                    )
-                    .await?;
-                start = next_start_key;
-                image_layers.extend(image);
+                self.create_image_layer_for_metadata_keys(
+                    partition,
+                    image_layer_writer,
+                    lsn,
+                    ctx,
+                    img_range.clone(),
+                    mode,
+                    start,
+                    io_concurrency,
+                )
+                .await?
+            };
+            start = next_start_key;
+            if let Some(unfinished_image_layer) = unfinished_image_layer {
+                batch_image_writer.add_unfinished_image_writer(
+                    unfinished_image_layer,
+                    img_range,
+                    lsn,
+                );
            }
        }

+        let image_layers = batch_image_writer.finish(self, ctx).await?;
+
        let mut guard = self.layers.write().await;

        // FIXME: we could add the images to be uploaded *before* returning from here, but right
--- a/pageserver/src/tenant/timeline/compaction.rs
+++ b/pageserver/src/tenant/timeline/compaction.rs
@@ -47,9 +47,7 @@ use crate::tenant::timeline::{ImageLayerCreationOutcome, IoConcurrency};
 use crate::tenant::timeline::{Layer, ResidentLayer};
 use crate::tenant::{gc_block, DeltaLayer, MaybeOffloaded};
 use crate::virtual_file::{MaybeFatalIo, VirtualFile};
-use pageserver_api::config::tenant_conf_defaults::{
-    DEFAULT_CHECKPOINT_DISTANCE, DEFAULT_COMPACTION_THRESHOLD,
-};
+use pageserver_api::config::tenant_conf_defaults::DEFAULT_CHECKPOINT_DISTANCE;

 use pageserver_api::key::Key;
 use pageserver_api::keyspace::KeySpace;
@@ -1114,17 +1112,10 @@ impl Timeline {
        // Accumulate the size of layers in `deltas_to_compact`
        let mut deltas_to_compact_bytes = 0;

-        // Under normal circumstances, we will accumulate up to compaction_interval L0s of size
+        // Under normal circumstances, we will accumulate up to compaction_upper_limit L0s of size
        // checkpoint_distance each.  To avoid edge cases using extra system resources, bound our
        // work in this function to only operate on this much delta data at once.
-        //
-        // Take the max of the configured value & the default, so that tests that configure tiny values
-        // can still use a sensible amount of memory, but if a deployed system configures bigger values we
-        // still let them compact a full stack of L0s in one go.
-        let delta_size_limit = std::cmp::max(
-            self.get_compaction_threshold(),
-            DEFAULT_COMPACTION_THRESHOLD,
-        ) as u64
+        let delta_size_limit = self.get_compaction_upper_limit() as u64
            * std::cmp::max(self.get_checkpoint_distance(), DEFAULT_CHECKPOINT_DISTANCE);

        let mut fully_compacted = true;
@@ -3197,7 +3188,7 @@ impl TimelineAdaptor {
        // TODO set proper (stateful) start. The create_image_layer_for_rel_blocks function mostly
        let start = Key::MIN;
        let ImageLayerCreationOutcome {
-            image,
+            unfinished_image_layer,
            next_start_key: _,
        } = self
            .timeline
@@ -3212,7 +3203,10 @@ impl TimelineAdaptor {
            )
            .await?;

-        if let Some(image_layer) = image {
+        if let Some(image_layer_writer) = unfinished_image_layer {
+            let (desc, path) = image_layer_writer.finish(ctx).await?;
+            let image_layer =
+                Layer::finish_creating(self.timeline.conf, &self.timeline, desc, &path)?;
            self.new_images.push(image_layer);
        }

--- a/pgxn/neon/file_cache.c
+++ b/pgxn/neon/file_cache.c
@@ -480,7 +480,7 @@ lfc_cache_contains(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber blkno)
 	if (LFC_ENABLED())
 	{
 		entry = hash_search_with_hash_value(lfc_hash, &tag, hash, HASH_FIND, NULL);
-		found = entry != NULL && (entry->bitmap[chunk_offs >> 5] & (1 << (chunk_offs & 31))) != 0;
+		found = entry != NULL && (entry->bitmap[chunk_offs >> 5] & ((uint32)1 << (chunk_offs & 31))) != 0;
 	}
 	LWLockRelease(lfc_lock);
 	return found;
@@ -527,7 +527,7 @@ lfc_cache_containsv(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber blkno,
 				for (; chunk_offs < BLOCKS_PER_CHUNK && i < nblocks; chunk_offs++, i++)
 				{
 					if ((entry->bitmap[chunk_offs >> 5] & 
-						(1 << (chunk_offs & 31))) != 0)
+						((uint32)1 << (chunk_offs & 31))) != 0)
 					{
 						BITMAP_SET(bitmap, i);
 						found++;
@@ -620,7 +620,7 @@ lfc_evict(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber blkno)
 	}

 	/* remove the page from the cache */
-	entry->bitmap[chunk_offs >> 5] &= ~(1 << (chunk_offs & (32 - 1)));
+	entry->bitmap[chunk_offs >> 5] &= ~((uint32)1 << (chunk_offs & (32 - 1)));

 	if (entry->access_count == 0)
 	{
@@ -774,7 +774,7 @@ lfc_readv_select(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber blkno,
 			 * If the page is valid, we consider it "read".
 			 * All other pages will be fetched separately by the next cache
 			 */
-			if (entry->bitmap[(chunk_offs + i) / 32] & (1 << ((chunk_offs + i) % 32)))
+			if (entry->bitmap[(chunk_offs + i) / 32] & ((uint32)1 << ((chunk_offs + i) % 32)))
 			{
 				BITMAP_SET(mask, buf_offset + i);
 				iteration_hits++;
@@ -1034,7 +1034,7 @@ lfc_writev(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber blkno,
 				{
 					lfc_ctl->used_pages += 1 - ((entry->bitmap[(chunk_offs + i) >> 5] >> ((chunk_offs + i) & 31)) & 1);
 					entry->bitmap[(chunk_offs + i) >> 5] |=
-						(1 << ((chunk_offs + i) & 31));
+						((uint32)1 << ((chunk_offs + i) & 31));
 				}
 			}

@@ -1282,7 +1282,7 @@ local_cache_pages(PG_FUNCTION_ARGS)
 			{
 				for (int i = 0; i < BLOCKS_PER_CHUNK; i++)
 				{
-					if (entry->bitmap[i >> 5] & (1 << (i & 31)))
+					if (entry->bitmap[i >> 5] & ((uint32)1 << (i & 31)))
 					{
 						fctx->record[n].pageoffs = entry->offset * BLOCKS_PER_CHUNK + i;
 						fctx->record[n].relfilenode = NInfoGetRelNumber(BufTagGetNRelFileInfo(entry->key));
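
The `(uint32)1` casts matter because in C the literal `1` is a signed `int`, and shifting it left by 31 is undefined behavior when `int` is 32 bits wide. The indexing itself is unchanged: the word index is `chunk_offs >> 5` and the bit index is `chunk_offs & 31`. A minimal sketch of the same bitmap operations with an explicitly unsigned mask follows; `ChunkBitmap` is a hypothetical type, and the `BLOCKS_PER_CHUNK` value of 128 is an assumption for illustration.

```rust
/// Assumed chunk size for illustration; must be a multiple of 32.
const BLOCKS_PER_CHUNK: usize = 128;

/// Hypothetical per-chunk bitmap, one bit per block, packed into 32-bit words.
struct ChunkBitmap {
    words: [u32; BLOCKS_PER_CHUNK / 32],
}

impl ChunkBitmap {
    fn new() -> Self {
        Self { words: [0; BLOCKS_PER_CHUNK / 32] }
    }

    /// Unsigned single-bit mask, the counterpart of `(uint32)1 << (offs & 31)`.
    fn mask(offs: usize) -> u32 {
        1u32 << (offs & 31)
    }

    fn set(&mut self, offs: usize) {
        self.words[offs >> 5] |= Self::mask(offs);
    }

    fn clear(&mut self, offs: usize) {
        self.words[offs >> 5] &= !Self::mask(offs);
    }

    fn contains(&self, offs: usize) -> bool {
        (self.words[offs >> 5] & Self::mask(offs)) != 0
    }
}

fn main() {
    let mut bitmap = ChunkBitmap::new();
    bitmap.set(31); // the bit position that a signed `1 << 31` cannot represent safely in C
    bitmap.set(32); // first bit of the second word
    println!("{:#010x} {:#010x}", bitmap.words[0], bitmap.words[1]);
    bitmap.clear(31);
    println!("contains(31) = {}", bitmap.contains(31));
}
```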
--- a/pgxn/neon/walproposer.c
+++ b/pgxn/neon/walproposer.c
@@ -1024,7 +1024,8 @@ DetermineEpochStartLsn(WalProposer *wp)
 	dth = &wp->safekeeper[wp->donor].voteResponse.termHistory;
 	wp->propTermHistory.n_entries = dth->n_entries + 1;
 	wp->propTermHistory.entries = palloc(sizeof(TermSwitchEntry) * wp->propTermHistory.n_entries);
-	memcpy(wp->propTermHistory.entries, dth->entries, sizeof(TermSwitchEntry) * dth->n_entries);
+	if (dth->n_entries > 0)
+		memcpy(wp->propTermHistory.entries, dth->entries, sizeof(TermSwitchEntry) * dth->n_entries);
 	wp->propTermHistory.entries[wp->propTermHistory.n_entries - 1].term = wp->propTerm;
 	wp->propTermHistory.entries[wp->propTermHistory.n_entries - 1].lsn = wp->propEpochStartLsn;

--- a/proxy/src/auth/backend/mod.rs
+++ b/proxy/src/auth/backend/mod.rs
@@ -12,6 +12,7 @@ pub(crate) use console_redirect::ConsoleRedirectError;
 use ipnet::{Ipv4Net, Ipv6Net};
 use local::LocalBackend;
 use postgres_client::config::AuthKeys;
+use serde::{Deserialize, Serialize};
 use tokio::io::{AsyncRead, AsyncWrite};
 use tracing::{debug, info, warn};

@@ -133,7 +134,7 @@ pub(crate) struct ComputeUserInfoNoEndpoint {
    pub(crate) options: NeonOptions,
 }

-#[derive(Debug, Clone, Default)]
+#[derive(Debug, Clone, Default, Serialize, Deserialize)]
 pub(crate) struct ComputeUserInfo {
    pub(crate) endpoint: EndpointId,
    pub(crate) user: RoleName,
--- a/proxy/src/bin/local_proxy.rs
+++ b/proxy/src/bin/local_proxy.rs
@@ -7,12 +7,11 @@ use std::time::Duration;
 use anyhow::{bail, ensure, Context};
 use camino::{Utf8Path, Utf8PathBuf};
 use compute_api::spec::LocalProxySpec;
-use dashmap::DashMap;
 use futures::future::Either;
 use proxy::auth::backend::jwt::JwkCache;
 use proxy::auth::backend::local::{LocalBackend, JWKS_ROLE_MAP};
 use proxy::auth::{self};
-use proxy::cancellation::CancellationHandlerMain;
+use proxy::cancellation::CancellationHandler;
 use proxy::config::{
    self, AuthenticationConfig, ComputeConfig, HttpConfig, ProxyConfig, RetryConfig,
 };
@@ -211,12 +210,7 @@ async fn main() -> anyhow::Result<()> {
        auth_backend,
        http_listener,
        shutdown.clone(),
-        Arc::new(CancellationHandlerMain::new(
-            &config.connect_to_compute,
-            Arc::new(DashMap::new()),
-            None,
-            proxy::metrics::CancellationSource::Local,
-        )),
+        Arc::new(CancellationHandler::new(&config.connect_to_compute, None)),
        endpoint_rate_limiter,
    );

--- a/proxy/src/bin/proxy.rs
+++ b/proxy/src/bin/proxy.rs
@@ -7,7 +7,7 @@ use anyhow::bail;
 use futures::future::Either;
 use proxy::auth::backend::jwt::JwkCache;
 use proxy::auth::backend::{AuthRateLimiter, ConsoleRedirectBackend, MaybeOwned};
-use proxy::cancellation::{CancelMap, CancellationHandler};
+use proxy::cancellation::{handle_cancel_messages, CancellationHandler};
 use proxy::config::{
    self, remote_storage_from_toml, AuthenticationConfig, CacheOptions, ComputeConfig, HttpConfig,
    ProjectInfoCacheOptions, ProxyConfig, ProxyProtocolV2,
@@ -18,8 +18,8 @@ use proxy::metrics::Metrics;
 use proxy::rate_limiter::{
    EndpointRateLimiter, LeakyBucketConfig, RateBucketInfo, WakeComputeRateLimiter,
 };
-use proxy::redis::cancellation_publisher::RedisPublisherClient;
 use proxy::redis::connection_with_credentials_provider::ConnectionWithCredentialsProvider;
+use proxy::redis::kv_ops::RedisKVClient;
 use proxy::redis::{elasticache, notifications};
 use proxy::scram::threadpool::ThreadPool;
 use proxy::serverless::cancel_set::CancelSet;
@@ -28,7 +28,6 @@ use proxy::tls::client_config::compute_client_config_with_root_certs;
 use proxy::{auth, control_plane, http, serverless, usage_metrics};
 use remote_storage::RemoteStorageConfig;
 use tokio::net::TcpListener;
-use tokio::sync::Mutex;
 use tokio::task::JoinSet;
 use tokio_util::sync::CancellationToken;
 use tracing::{info, warn, Instrument};
@@ -158,8 +157,11 @@ struct ProxyCliArgs {
    #[clap(long, default_value_t = 64)]
    auth_rate_limit_ip_subnet: u8,
    /// Redis rate limiter max number of requests per second.
-    #[clap(long, default_values_t = RateBucketInfo::DEFAULT_SET)]
+    #[clap(long, default_values_t = RateBucketInfo::DEFAULT_REDIS_SET)]
    redis_rps_limit: Vec<RateBucketInfo>,
+    /// Cancellation channel size (max queue size for redis kv client)
+    #[clap(long, default_value = "1024")]
+    cancellation_ch_size: usize,
    /// cache for `allowed_ips` (use `size=0` to disable)
    #[clap(long, default_value = config::CacheOptions::CACHE_DEFAULT_OPTIONS)]
    allowed_ips_cache: String,
@@ -382,27 +384,19 @@ async fn main() -> anyhow::Result<()> {

    let cancellation_token = CancellationToken::new();

-    let cancel_map = CancelMap::default();
-
    let redis_rps_limit = Vec::leak(args.redis_rps_limit.clone());
    RateBucketInfo::validate(redis_rps_limit)?;

-    let redis_publisher = match &regional_redis_client {
-        Some(redis_publisher) => Some(Arc::new(Mutex::new(RedisPublisherClient::new(
-            redis_publisher.clone(),
-            args.region.clone(),
-            redis_rps_limit,
-        )?))),
-        None => None,
-    };
+    let redis_kv_client = regional_redis_client
+        .as_ref()
+        .map(|redis_publisher| RedisKVClient::new(redis_publisher.clone(), redis_rps_limit));

-    let cancellation_handler = Arc::new(CancellationHandler::<
-        Option<Arc<Mutex<RedisPublisherClient>>>,
-    >::new(
+    // channel size should be higher than redis client limit to avoid blocking
+    let cancel_ch_size = args.cancellation_ch_size;
+    let (tx_cancel, rx_cancel) = tokio::sync::mpsc::channel(cancel_ch_size);
+    let cancellation_handler = Arc::new(CancellationHandler::new(
        &config.connect_to_compute,
-        cancel_map.clone(),
-        redis_publisher,
-        proxy::metrics::CancellationSource::FromClient,
+        Some(tx_cancel),
    ));

    // bit of a hack - find the min rps and max rps supported and turn it into
@@ -495,25 +489,29 @@ async fn main() -> anyhow::Result<()> {
                    let cache = api.caches.project_info.clone();
                    if let Some(client) = client1 {
                        maintenance_tasks.spawn(notifications::task_main(
-                            config,
                            client,
                            cache.clone(),
-                            cancel_map.clone(),
                            args.region.clone(),
                        ));
                    }
                    if let Some(client) = client2 {
                        maintenance_tasks.spawn(notifications::task_main(
-                            config,
                            client,
                            cache.clone(),
-                            cancel_map.clone(),
                            args.region.clone(),
                        ));
                    }
                    maintenance_tasks.spawn(async move { cache.clone().gc_worker().await });
                }
            }
+
+            if let Some(mut redis_kv_client) = redis_kv_client {
+                maintenance_tasks.spawn(async move {
+                    redis_kv_client.try_connect().await?;
+                    handle_cancel_messages(&mut redis_kv_client, rx_cancel).await
+                });
+            }
+
            if let Some(regional_redis_client) = regional_redis_client {
                let cache = api.caches.endpoints_cache.clone();
                let con = regional_redis_client;
--- a/proxy/src/cancellation.rs
+++ b/proxy/src/cancellation.rs
@@ -1,48 +1,124 @@
 use std::net::{IpAddr, SocketAddr};
 use std::sync::Arc;

-use dashmap::DashMap;
 use ipnet::{IpNet, Ipv4Net, Ipv6Net};
 use postgres_client::tls::MakeTlsConnect;
 use postgres_client::CancelToken;
 use pq_proto::CancelKeyData;
+use serde::{Deserialize, Serialize};
 use thiserror::Error;
 use tokio::net::TcpStream;
-use tokio::sync::Mutex;
+use tokio::sync::mpsc;
 use tracing::{debug, info};
-use uuid::Uuid;

 use crate::auth::backend::{BackendIpAllowlist, ComputeUserInfo};
-use crate::auth::{check_peer_addr_is_in_list, AuthError, IpPattern};
+use crate::auth::{check_peer_addr_is_in_list, AuthError};
 use crate::config::ComputeConfig;
 use crate::context::RequestContext;
 use crate::error::ReportableError;
 use crate::ext::LockExt;
-use crate::metrics::{CancellationRequest, CancellationSource, Metrics};
+use crate::metrics::CancelChannelSizeGuard;
+use crate::metrics::{CancellationRequest, Metrics, RedisMsgKind};
 use crate::rate_limiter::LeakyBucketRateLimiter;
-use crate::redis::cancellation_publisher::{
-    CancellationPublisher, CancellationPublisherMut, RedisPublisherClient,
-};
+use crate::redis::keys::KeyPrefix;
+use crate::redis::kv_ops::RedisKVClient;
 use crate::tls::postgres_rustls::MakeRustlsConnect;
-
-pub type CancelMap = Arc<DashMap<CancelKeyData, Option<CancelClosure>>>;
-pub type CancellationHandlerMain = CancellationHandler<Option<Arc<Mutex<RedisPublisherClient>>>>;
-pub(crate) type CancellationHandlerMainInternal = Option<Arc<Mutex<RedisPublisherClient>>>;
+use std::convert::Infallible;
+use tokio::sync::oneshot;

 type IpSubnetKey = IpNet;

+const CANCEL_KEY_TTL: i64 = 1_209_600; // cancellation key TTL: 2 weeks, in seconds
+const REDIS_SEND_TIMEOUT: std::time::Duration = std::time::Duration::from_millis(10);
+
+// Message types for sending through mpsc channel
+pub enum CancelKeyOp {
+    StoreCancelKey {
+        key: String,
+        field: String,
+        value: String,
+        resp_tx: Option<oneshot::Sender<anyhow::Result<()>>>,
+        _guard: CancelChannelSizeGuard<'static>,
+        expire: i64, // TTL for key
+    },
+    GetCancelData {
+        key: String,
+        resp_tx: oneshot::Sender<anyhow::Result<Vec<(String, String)>>>,
+        _guard: CancelChannelSizeGuard<'static>,
+    },
+    RemoveCancelKey {
+        key: String,
+        field: String,
+        resp_tx: Option<oneshot::Sender<anyhow::Result<()>>>,
+        _guard: CancelChannelSizeGuard<'static>,
+    },
+}
+
+// Running as a separate task to accept messages through the rx channel
+// In case of problems with RTT: switch to recv_many() + redis pipeline
+pub async fn handle_cancel_messages(
+    client: &mut RedisKVClient,
+    mut rx: mpsc::Receiver<CancelKeyOp>,
+) -> anyhow::Result<Infallible> {
+    loop {
+        if let Some(msg) = rx.recv().await {
+            match msg {
+                CancelKeyOp::StoreCancelKey {
+                    key,
+                    field,
+                    value,
+                    resp_tx,
+                    _guard,
+                    expire: _,
+                } => {
+                    if let Some(resp_tx) = resp_tx {
+                        resp_tx
+                            .send(client.hset(key, field, value).await)
+                            .inspect_err(|e| {
+                                tracing::debug!("failed to send StoreCancelKey response: {:?}", e);
+                            })
+                            .ok();
+                    } else {
+                        drop(client.hset(key, field, value).await);
+                    }
+                }
+                CancelKeyOp::GetCancelData {
+                    key,
+                    resp_tx,
+                    _guard,
+                } => {
+                    drop(resp_tx.send(client.hget_all(key).await));
+                }
+                CancelKeyOp::RemoveCancelKey {
+                    key,
+                    field,
+                    resp_tx,
+                    _guard,
+                } => {
+                    if let Some(resp_tx) = resp_tx {
+                        resp_tx
+                            .send(client.hdel(key, field).await)
+                            .inspect_err(|e| {
+                                tracing::debug!("failed to send RemoveCancelKey response: {:?}", e);
+                            })
+                            .ok();
+                    } else {
+                        drop(client.hdel(key, field).await);
+                    }
+                }
+            }
+        }
+    }
+}
+
 /// Enables serving `CancelRequest`s.
 ///
 /// If `CancellationPublisher` is available, cancel request will be used to publish the cancellation key to other proxy instances.
-pub struct CancellationHandler<P> {
+pub struct CancellationHandler {
    compute_config: &'static ComputeConfig,
-    map: CancelMap,
-    client: P,
-    /// This field used for the monitoring purposes.
-    /// Represents the source of the cancellation request.
-    from: CancellationSource,
    // rate limiter of cancellation requests
    limiter: Arc<std::sync::Mutex<LeakyBucketRateLimiter<IpSubnetKey>>>,
+    tx: Option<mpsc::Sender<CancelKeyOp>>, // send messages to the redis KV client task
 }

 #[derive(Debug, Error)]
@@ -61,6 +137,12 @@ pub(crate) enum CancelError {

    #[error("Authentication backend error")]
    AuthError(#[from] AuthError),
+
+    #[error("key not found")]
+    NotFound,
+
+    #[error("proxy service error")]
+    InternalError,
 }

 impl ReportableError for CancelError {
@@ -73,274 +155,191 @@ impl ReportableError for CancelError {
            CancelError::Postgres(_) => crate::error::ErrorKind::Compute,
            CancelError::RateLimit => crate::error::ErrorKind::RateLimit,
            CancelError::IpNotAllowed => crate::error::ErrorKind::User,
+            CancelError::NotFound => crate::error::ErrorKind::User,
            CancelError::AuthError(_) => crate::error::ErrorKind::ControlPlane,
+            CancelError::InternalError => crate::error::ErrorKind::Service,
        }
    }
 }

-impl<P: CancellationPublisher> CancellationHandler<P> {
-    /// Run async action within an ephemeral session identified by [`CancelKeyData`].
-    pub(crate) fn get_session(self: Arc<Self>) -> Session<P> {
+impl CancellationHandler {
+    pub fn new(
+        compute_config: &'static ComputeConfig,
+        tx: Option<mpsc::Sender<CancelKeyOp>>,
+    ) -> Self {
+        Self {
+            compute_config,
+            tx,
+            limiter: Arc::new(std::sync::Mutex::new(
+                LeakyBucketRateLimiter::<IpSubnetKey>::new_with_shards(
+                    LeakyBucketRateLimiter::<IpSubnetKey>::DEFAULT,
+                    64,
+                ),
+            )),
+        }
+    }
+
+    pub(crate) fn get_key(self: &Arc<Self>) -> Session {
        // we intentionally generate a random "backend pid" and "secret key" here.
        // we use the corresponding u64 as an identifier for the
        // actual endpoint+pid+secret for postgres/pgbouncer.
        //
        // if we forwarded the backend_pid from postgres to the client, there would be a lot
        // of overlap between our computes as most pids are small (~100).
-        let key = loop {
-            let key = rand::random();

-            // Random key collisions are unlikely to happen here, but they're still possible,
-            // which is why we have to take care not to rewrite an existing key.
-            match self.map.entry(key) {
-                dashmap::mapref::entry::Entry::Occupied(_) => continue,
-                dashmap::mapref::entry::Entry::Vacant(e) => {
-                    e.insert(None);
-                }
-            }
-            break key;
-        };
+        let key: CancelKeyData = rand::random();
+
+        let prefix_key: KeyPrefix = KeyPrefix::Cancel(key);
+        let redis_key = prefix_key.build_redis_key();

        debug!("registered new query cancellation key {key}");
        Session {
            key,
-            cancellation_handler: self,
+            redis_key,
+            cancellation_handler: Arc::clone(self),
        }
    }

-    /// Cancelling only in notification, will be removed
-    pub(crate) async fn cancel_session(
+    async fn get_cancel_key(
        &self,
        key: CancelKeyData,
-        session_id: Uuid,
-        peer_addr: IpAddr,
-        check_allowed: bool,
-    ) -> Result<(), CancelError> {
-        // TODO: check for unspecified address is only for backward compatibility, should be removed
-        if !peer_addr.is_unspecified() {
-            let subnet_key = match peer_addr {
-                IpAddr::V4(ip) => IpNet::V4(Ipv4Net::new_assert(ip, 24).trunc()), // use defaut mask here
-                IpAddr::V6(ip) => IpNet::V6(Ipv6Net::new_assert(ip, 64).trunc()),
-            };
-            if !self.limiter.lock_propagate_poison().check(subnet_key, 1) {
-                // log only the subnet part of the IP address to know which subnet is rate limited
-                tracing::warn!("Rate limit exceeded. Skipping cancellation message, {subnet_key}");
-                Metrics::get()
-                    .proxy
-                    .cancellation_requests_total
-                    .inc(CancellationRequest {
-                        source: self.from,
-                        kind: crate::metrics::CancellationOutcome::RateLimitExceeded,
-                    });
-                return Err(CancelError::RateLimit);
-            }
-        }
+    ) -> Result<Option<CancelClosure>, CancelError> {
+        let prefix_key: KeyPrefix = KeyPrefix::Cancel(key);
+        let redis_key = prefix_key.build_redis_key();

-        // NB: we should immediately release the lock after cloning the token.
-        let cancel_state = self.map.get(&key).and_then(|x| x.clone());
-        let Some(cancel_closure) = cancel_state else {
-            tracing::warn!("query cancellation key not found: {key}");
-            Metrics::get()
+        let (resp_tx, resp_rx) = tokio::sync::oneshot::channel();
+        let op = CancelKeyOp::GetCancelData {
+            key: redis_key,
+            resp_tx,
+            _guard: Metrics::get()
                .proxy
-                .cancellation_requests_total
-                .inc(CancellationRequest {
-                    source: self.from,
-                    kind: crate::metrics::CancellationOutcome::NotFound,
-                });
-
-            if session_id == Uuid::nil() {
-                // was already published, do not publish it again
-                return Ok(());
-            }
-
-            match self.client.try_publish(key, session_id, peer_addr).await {
-                Ok(()) => {} // do nothing
-                Err(e) => {
-                    // log it here since cancel_session could be spawned in a task
-                    tracing::error!("failed to publish cancellation key: {key}, error: {e}");
-                    return Err(CancelError::IO(std::io::Error::new(
-                        std::io::ErrorKind::Other,
-                        e.to_string(),
-                    )));
-                }
-            }
-            return Ok(());
+                .cancel_channel_size
+                .guard(RedisMsgKind::HGetAll),
        };

-        if check_allowed
-            && !check_peer_addr_is_in_list(&peer_addr, cancel_closure.ip_allowlist.as_slice())
-        {
-            // log it here since cancel_session could be spawned in a task
-            tracing::warn!("IP is not allowed to cancel the query: {key}");
-            return Err(CancelError::IpNotAllowed);
-        }
+        let Some(tx) = &self.tx else {
+            tracing::warn!("cancellation handler is not available");
+            return Err(CancelError::InternalError);
+        };

-        Metrics::get()
-            .proxy
-            .cancellation_requests_total
-            .inc(CancellationRequest {
-                source: self.from,
-                kind: crate::metrics::CancellationOutcome::Found,
-            });
-        info!(
-            "cancelling query per user's request using key {key}, hostname {}, address: {}",
-            cancel_closure.hostname, cancel_closure.socket_addr
-        );
-        cancel_closure.try_cancel_query(self.compute_config).await
+        tx.send_timeout(op, REDIS_SEND_TIMEOUT)
+            .await
+            .map_err(|e| {
+                tracing::warn!("failed to send GetCancelData for {key}: {e}");
+            })
+            .map_err(|()| CancelError::InternalError)?;
+
+        let result = resp_rx.await.map_err(|e| {
+            tracing::warn!("failed to receive GetCancelData response: {e}");
+            CancelError::InternalError
+        })?;
+
+        let cancel_state_str: Option<String> = match result {
+            Ok(mut state) => {
+                if state.len() == 1 {
+                    Some(state.remove(0).1)
+                } else {
+                    tracing::warn!("unexpected number of entries in cancel state: {state:?}");
+                    return Err(CancelError::InternalError);
+                }
+            }
+            Err(e) => {
+                tracing::warn!("failed to receive cancel state from redis: {e}");
+                return Err(CancelError::InternalError);
+            }
+        };
+
+        let cancel_state: Option<CancelClosure> = match cancel_state_str {
+            Some(state) => {
+                let cancel_closure: CancelClosure = serde_json::from_str(&state).map_err(|e| {
+                    tracing::warn!("failed to deserialize cancel state: {e}");
+                    CancelError::InternalError
+                })?;
+                Some(cancel_closure)
+            }
+            None => None,
+        };
+        Ok(cancel_state)
    }
-
    /// Try to cancel a running query for the corresponding connection.
    /// If the cancellation key is not found, it will be published to Redis.
    /// check_allowed - if true, check if the IP is allowed to cancel the query.
    /// Will fetch IP allowlist internally.
    ///
    /// return Result primarily for tests
-    pub(crate) async fn cancel_session_auth<T: BackendIpAllowlist>(
+    pub(crate) async fn cancel_session<T: BackendIpAllowlist>(
        &self,
        key: CancelKeyData,
        ctx: RequestContext,
        check_allowed: bool,
        auth_backend: &T,
    ) -> Result<(), CancelError> {
-        // TODO: check for unspecified address is only for backward compatibility, should be removed
-        if !ctx.peer_addr().is_unspecified() {
-            let subnet_key = match ctx.peer_addr() {
-                IpAddr::V4(ip) => IpNet::V4(Ipv4Net::new_assert(ip, 24).trunc()), // use defaut mask here
-                IpAddr::V6(ip) => IpNet::V6(Ipv6Net::new_assert(ip, 64).trunc()),
-            };
-            if !self.limiter.lock_propagate_poison().check(subnet_key, 1) {
-                // log only the subnet part of the IP address to know which subnet is rate limited
-                tracing::warn!("Rate limit exceeded. Skipping cancellation message, {subnet_key}");
-                Metrics::get()
-                    .proxy
-                    .cancellation_requests_total
-                    .inc(CancellationRequest {
-                        source: self.from,
-                        kind: crate::metrics::CancellationOutcome::RateLimitExceeded,
-                    });
-                return Err(CancelError::RateLimit);
-            }
+        let subnet_key = match ctx.peer_addr() {
+            IpAddr::V4(ip) => IpNet::V4(Ipv4Net::new_assert(ip, 24).trunc()), // use default mask here
+            IpAddr::V6(ip) => IpNet::V6(Ipv6Net::new_assert(ip, 64).trunc()),
+        };
+        if !self.limiter.lock_propagate_poison().check(subnet_key, 1) {
+            // log only the subnet part of the IP address to know which subnet is rate limited
+            tracing::warn!("Rate limit exceeded. Skipping cancellation message, {subnet_key}");
+            Metrics::get()
+                .proxy
+                .cancellation_requests_total
+                .inc(CancellationRequest {
+                    kind: crate::metrics::CancellationOutcome::RateLimitExceeded,
+                });
+            return Err(CancelError::RateLimit);
        }

-        // NB: we should immediately release the lock after cloning the token.
-        let cancel_state = self.map.get(&key).and_then(|x| x.clone());
+        let cancel_state = self.get_cancel_key(key).await.map_err(|e| {
+            tracing::warn!("failed to receive RedisOp response: {e}");
+            CancelError::InternalError
+        })?;
+
        let Some(cancel_closure) = cancel_state else {
            tracing::warn!("query cancellation key not found: {key}");
            Metrics::get()
                .proxy
                .cancellation_requests_total
                .inc(CancellationRequest {
-                    source: self.from,
                    kind: crate::metrics::CancellationOutcome::NotFound,
                });
-
-            if ctx.session_id() == Uuid::nil() {
-                // was already published, do not publish it again
-                return Ok(());
-            }
-
-            match self
-                .client
-                .try_publish(key, ctx.session_id(), ctx.peer_addr())
-                .await
-            {
-                Ok(()) => {} // do nothing
-                Err(e) => {
-                    // log it here since cancel_session could be spawned in a task
-                    tracing::error!("failed to publish cancellation key: {key}, error: {e}");
-                    return Err(CancelError::IO(std::io::Error::new(
-                        std::io::ErrorKind::Other,
-                        e.to_string(),
-                    )));
-                }
-            }
-            return Ok(());
+            return Err(CancelError::NotFound);
        };

-        let ip_allowlist = auth_backend
-            .get_allowed_ips(&ctx, &cancel_closure.user_info)
-            .await
-            .map_err(CancelError::AuthError)?;
+        if check_allowed {
+            let ip_allowlist = auth_backend
+                .get_allowed_ips(&ctx, &cancel_closure.user_info)
+                .await
+                .map_err(CancelError::AuthError)?;

-        if check_allowed && !check_peer_addr_is_in_list(&ctx.peer_addr(), &ip_allowlist) {
-            // log it here since cancel_session could be spawned in a task
-            tracing::warn!("IP is not allowed to cancel the query: {key}");
-            return Err(CancelError::IpNotAllowed);
+            if !check_peer_addr_is_in_list(&ctx.peer_addr(), &ip_allowlist) {
+                // log it here since cancel_session could be spawned in a task
+                tracing::warn!(
+                    "IP is not allowed to cancel the query: {key}, address: {}",
+                    ctx.peer_addr()
+                );
+                return Err(CancelError::IpNotAllowed);
+            }
        }

        Metrics::get()
            .proxy
            .cancellation_requests_total
            .inc(CancellationRequest {
-                source: self.from,
                kind: crate::metrics::CancellationOutcome::Found,
            });
        info!("cancelling query per user's request using key {key}");
        cancel_closure.try_cancel_query(self.compute_config).await
    }
-
-    #[cfg(test)]
-    fn contains(&self, session: &Session<P>) -> bool {
-        self.map.contains_key(&session.key)
-    }
-
-    #[cfg(test)]
-    fn is_empty(&self) -> bool {
-        self.map.is_empty()
-    }
-}
-
-impl CancellationHandler<()> {
-    pub fn new(
-        compute_config: &'static ComputeConfig,
-        map: CancelMap,
-        from: CancellationSource,
-    ) -> Self {
-        Self {
-            compute_config,
-            map,
-            client: (),
-            from,
-            limiter: Arc::new(std::sync::Mutex::new(
-                LeakyBucketRateLimiter::<IpSubnetKey>::new_with_shards(
-                    LeakyBucketRateLimiter::<IpSubnetKey>::DEFAULT,
-                    64,
-                ),
-            )),
-        }
-    }
-}
-
-impl<P: CancellationPublisherMut> CancellationHandler<Option<Arc<Mutex<P>>>> {
-    pub fn new(
-        compute_config: &'static ComputeConfig,
-        map: CancelMap,
-        client: Option<Arc<Mutex<P>>>,
-        from: CancellationSource,
-    ) -> Self {
-        Self {
-            compute_config,
-            map,
-            client,
-            from,
-            limiter: Arc::new(std::sync::Mutex::new(
-                LeakyBucketRateLimiter::<IpSubnetKey>::new_with_shards(
-                    LeakyBucketRateLimiter::<IpSubnetKey>::DEFAULT,
-                    64,
-                ),
-            )),
-        }
-    }
 }

 /// This should've been a [`std::future::Future`], but
 /// it's impossible to name a type of an unboxed future
 /// (we'd need something like `#![feature(type_alias_impl_trait)]`).
-#[derive(Clone)]
+#[derive(Clone, Serialize, Deserialize)]
 pub struct CancelClosure {
    socket_addr: SocketAddr,
    cancel_token: CancelToken,
-    ip_allowlist: Vec<IpPattern>,
    hostname: String, // for pg_sni router
    user_info: ComputeUserInfo,
 }
@@ -349,14 +348,12 @@ impl CancelClosure {
    pub(crate) fn new(
        socket_addr: SocketAddr,
        cancel_token: CancelToken,
-        ip_allowlist: Vec<IpPattern>,
        hostname: String,
        user_info: ComputeUserInfo,
    ) -> Self {
        Self {
            socket_addr,
            cancel_token,
-            ip_allowlist,
            hostname,
            user_info,
        }
@@ -385,99 +382,75 @@ impl CancelClosure {
        debug!("query was cancelled");
        Ok(())
    }
-
-    /// Obsolete (will be removed after moving CancelMap to Redis), only for notifications
-    pub(crate) fn set_ip_allowlist(&mut self, ip_allowlist: Vec<IpPattern>) {
-        self.ip_allowlist = ip_allowlist;
-    }
 }

 /// Helper for registering query cancellation tokens.
-pub(crate) struct Session<P> {
+pub(crate) struct Session {
    /// The user-facing key identifying this session.
    key: CancelKeyData,
-    /// The [`CancelMap`] this session belongs to.
-    cancellation_handler: Arc<CancellationHandler<P>>,
+    redis_key: String,
+    cancellation_handler: Arc<CancellationHandler>,
 }

-impl<P> Session<P> {
-    /// Store the cancel token for the given session.
-    /// This enables query cancellation in `crate::proxy::prepare_client_connection`.
-    pub(crate) fn enable_query_cancellation(&self, cancel_closure: CancelClosure) -> CancelKeyData {
-        debug!("enabling query cancellation for this session");
-        self.cancellation_handler
-            .map
-            .insert(self.key, Some(cancel_closure));
-
-        self.key
+impl Session {
+    pub(crate) fn key(&self) -> &CancelKeyData {
+        &self.key
    }
-}

-impl<P> Drop for Session<P> {
-    fn drop(&mut self) {
-        self.cancellation_handler.map.remove(&self.key);
-        debug!("dropped query cancellation key {}", &self.key);
-    }
-}
-
-#[cfg(test)]
-#[expect(clippy::unwrap_used)]
-mod tests {
-    use std::time::Duration;
-
-    use super::*;
-    use crate::config::RetryConfig;
-    use crate::tls::client_config::compute_client_config_with_certs;
-
-    fn config() -> ComputeConfig {
-        let retry = RetryConfig {
-            base_delay: Duration::from_secs(1),
-            max_retries: 5,
-            backoff_factor: 2.0,
+    // Send the store key op to the cancellation handler
+    pub(crate) async fn write_cancel_key(
+        &self,
+        cancel_closure: CancelClosure,
+    ) -> Result<(), CancelError> {
+        let Some(tx) = &self.cancellation_handler.tx else {
+            tracing::warn!("cancellation handler is not available");
+            return Err(CancelError::InternalError);
        };

-        ComputeConfig {
-            retry,
-            tls: Arc::new(compute_client_config_with_certs(std::iter::empty())),
-            timeout: Duration::from_secs(2),
-        }
-    }
+        let closure_json = serde_json::to_string(&cancel_closure).map_err(|e| {
+            tracing::warn!("failed to serialize cancel closure: {e}");
+            CancelError::InternalError
+        })?;

-    #[tokio::test]
-    async fn check_session_drop() -> anyhow::Result<()> {
-        let cancellation_handler = Arc::new(CancellationHandler::<()>::new(
-            Box::leak(Box::new(config())),
-            CancelMap::default(),
-            CancellationSource::FromRedis,
-        ));
-
-        let session = cancellation_handler.clone().get_session();
-        assert!(cancellation_handler.contains(&session));
-        drop(session);
-        // Check that the session has been dropped.
-        assert!(cancellation_handler.is_empty());
+        let op = CancelKeyOp::StoreCancelKey {
+            key: self.redis_key.clone(),
+            field: "data".to_string(),
+            value: closure_json,
+            resp_tx: None,
+            _guard: Metrics::get()
+                .proxy
+                .cancel_channel_size
+                .guard(RedisMsgKind::HSet),
+            expire: CANCEL_KEY_TTL,
+        };

+        let _ = tx.send_timeout(op, REDIS_SEND_TIMEOUT).await.map_err(|e| {
+            let key = self.key;
+            tracing::warn!("failed to send StoreCancelKey for {key}: {e}");
+        });
        Ok(())
    }

-    #[tokio::test]
-    async fn cancel_session_noop_regression() {
-        let handler = CancellationHandler::<()>::new(
-            Box::leak(Box::new(config())),
-            CancelMap::default(),
-            CancellationSource::Local,
-        );
-        handler
-            .cancel_session(
-                CancelKeyData {
-                    backend_pid: 0,
-                    cancel_key: 0,
-                },
-                Uuid::new_v4(),
-                "127.0.0.1".parse().unwrap(),
-                true,
-            )
-            .await
-            .unwrap();
+    pub(crate) async fn remove_cancel_key(&self) -> Result<(), CancelError> {
+        let Some(tx) = &self.cancellation_handler.tx else {
+            tracing::warn!("cancellation handler is not available");
+            return Err(CancelError::InternalError);
+        };
+
+        let op = CancelKeyOp::RemoveCancelKey {
+            key: self.redis_key.clone(),
+            field: "data".to_string(),
+            resp_tx: None,
+            _guard: Metrics::get()
+                .proxy
+                .cancel_channel_size
+                .guard(RedisMsgKind::HSet),
+        };
+
+        let _ = tx.send_timeout(op, REDIS_SEND_TIMEOUT).await.map_err(|e| {
+            let key = self.key;
+            tracing::warn!("failed to send RemoveCancelKey for {key}: {e}");
+        });
+        Ok(())
    }
 }
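
The cancellation.rs changes above route every key operation through a bounded channel to a single task that owns the Redis client, with an optional reply channel per message. The following is a minimal standard-library sketch of that message-passing shape, not the proxy's implementation: `KeyOp`, `handle_ops`, `get`, and `demo` are hypothetical names, a `HashMap` stands in for Redis, and threads with `std::sync::mpsc` stand in for tokio tasks and channels.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for the diff's `CancelKeyOp`.
enum KeyOp {
    // Fire-and-forget when `resp_tx` is `None`.
    Store {
        key: String,
        value: String,
        resp_tx: Option<mpsc::Sender<Result<(), String>>>,
    },
    // Always carries a reply channel, since the caller needs the data.
    Get {
        key: String,
        resp_tx: mpsc::Sender<Option<String>>,
    },
    Remove {
        key: String,
    },
}

// The worker owns the store exclusively, so no locking is needed.
// In this sketch the loop ends once every sender has been dropped.
fn handle_ops(rx: mpsc::Receiver<KeyOp>) {
    let mut store: HashMap<String, String> = HashMap::new();
    while let Ok(op) = rx.recv() {
        match op {
            KeyOp::Store { key, value, resp_tx } => {
                store.insert(key, value);
                if let Some(resp_tx) = resp_tx {
                    // The requester may have gone away; ignore send errors.
                    let _ = resp_tx.send(Ok(()));
                }
            }
            KeyOp::Get { key, resp_tx } => {
                let _ = resp_tx.send(store.get(&key).cloned());
            }
            KeyOp::Remove { key } => {
                store.remove(&key);
            }
        }
    }
}

// Send a `Get` and block until the worker replies.
fn get(tx: &mpsc::SyncSender<KeyOp>, key: &str) -> Option<String> {
    let (resp_tx, resp_rx) = mpsc::channel();
    tx.send(KeyOp::Get { key: key.to_string(), resp_tx }).ok()?;
    resp_rx.recv().ok()?
}

fn demo() -> (Option<String>, Option<String>) {
    // Bounded queue, analogous in spirit to `cancellation_ch_size`.
    let (tx, rx) = mpsc::sync_channel::<KeyOp>(16);
    let worker = thread::spawn(move || handle_ops(rx));

    tx.send(KeyOp::Store {
        key: "cancel:1".to_string(),
        value: "{}".to_string(),
        resp_tx: None,
    })
    .unwrap();
    // Messages from one sender are handled in order, so the store is visible here.
    let before = get(&tx, "cancel:1");

    tx.send(KeyOp::Remove { key: "cancel:1".to_string() }).unwrap();
    let after = get(&tx, "cancel:1");

    drop(tx); // close the channel so the worker exits
    worker.join().unwrap();
    (before, after)
}

fn main() {
    let (before, after) = demo();
    println!("before: {before:?}, after: {after:?}");
}
```

Keeping the client behind one receiver serializes access without a mutex. The trade-off is that callers can be held up when the queue fills, which the diff bounds with `send_timeout`.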
--- a/proxy/src/compute.rs
+++ b/proxy/src/compute.rs
@@ -296,7 +296,6 @@ impl ConnCfg {
                process_id,
                secret_key,
            },
-            vec![], // TODO: deprecated, will be removed
            host.to_string(),
            user_info,
        );
--- a/proxy/src/console_redirect_proxy.rs
+++ b/proxy/src/console_redirect_proxy.rs
@@ -6,7 +6,7 @@ use tokio_util::sync::CancellationToken;
 use tracing::{debug, error, info, Instrument};

 use crate::auth::backend::ConsoleRedirectBackend;
-use crate::cancellation::{CancellationHandlerMain, CancellationHandlerMainInternal};
+use crate::cancellation::CancellationHandler;
 use crate::config::{ProxyConfig, ProxyProtocolV2};
 use crate::context::RequestContext;
 use crate::error::ReportableError;
@@ -24,7 +24,7 @@ pub async fn task_main(
    backend: &'static ConsoleRedirectBackend,
    listener: tokio::net::TcpListener,
    cancellation_token: CancellationToken,
-    cancellation_handler: Arc<CancellationHandlerMain>,
+    cancellation_handler: Arc<CancellationHandler>,
 ) -> anyhow::Result<()> {
    scopeguard::defer! {
        info!("proxy has shut down");
@@ -140,15 +140,16 @@ pub async fn task_main(
    Ok(())
 }

+#[allow(clippy::too_many_arguments)]
 pub(crate) async fn handle_client<S: AsyncRead + AsyncWrite + Unpin>(
    config: &'static ProxyConfig,
    backend: &'static ConsoleRedirectBackend,
    ctx: &RequestContext,
-    cancellation_handler: Arc<CancellationHandlerMain>,
+    cancellation_handler: Arc<CancellationHandler>,
    stream: S,
    conn_gauge: NumClientConnectionsGuard<'static>,
    cancellations: tokio_util::task::task_tracker::TaskTracker,
-) -> Result<Option<ProxyPassthrough<CancellationHandlerMainInternal, S>>, ClientRequestError> {
+) -> Result<Option<ProxyPassthrough<S>>, ClientRequestError> {
    debug!(
        protocol = %ctx.protocol(),
        "handling interactive connection from client"
@@ -171,13 +172,13 @@ pub(crate) async fn handle_client<S: AsyncRead + AsyncWrite + Unpin>(
        HandshakeData::Cancel(cancel_key_data) => {
            // spawn a task to cancel the session, but don't wait for it
            cancellations.spawn({
-                let cancellation_handler_clone = Arc::clone(&cancellation_handler);
+                let cancellation_handler_clone  = Arc::clone(&cancellation_handler);
                let ctx = ctx.clone();
                let cancel_span = tracing::span!(parent: None, tracing::Level::INFO, "cancel_session", session_id = ?ctx.session_id());
                cancel_span.follows_from(tracing::Span::current());
                async move {
                    cancellation_handler_clone
-                        .cancel_session_auth(
+                        .cancel_session(
                            cancel_key_data,
                            ctx,
                            config.authentication_config.ip_allowlist_check_enabled,
@@ -195,7 +196,7 @@ pub(crate) async fn handle_client<S: AsyncRead + AsyncWrite + Unpin>(

    ctx.set_db_options(params.clone());

-    let (node_info, user_info, ip_allowlist) = match backend
+    let (node_info, user_info, _ip_allowlist) = match backend
        .authenticate(ctx, &config.authentication_config, &mut stream)
        .await
    {
@@ -220,10 +221,14 @@ pub(crate) async fn handle_client<S: AsyncRead + AsyncWrite + Unpin>(
    .or_else(|e| stream.throw_error(e))
    .await?;

-    node.cancel_closure
-        .set_ip_allowlist(ip_allowlist.unwrap_or_default());
-    let session = cancellation_handler.get_session();
-    prepare_client_connection(&node, &session, &mut stream).await?;
+    let cancellation_handler_clone = Arc::clone(&cancellation_handler);
+    let session = cancellation_handler_clone.get_key();
+
+    session
+        .write_cancel_key(node.cancel_closure.clone())
+        .await?;
+
+    prepare_client_connection(&node, *session.key(), &mut stream).await?;

    // Before proxy passing, forward to compute whatever data is left in the
    // PqStream input buffer. Normally there is none, but our serverless npm
@@ -237,8 +242,8 @@ pub(crate) async fn handle_client<S: AsyncRead + AsyncWrite + Unpin>(
        aux: node.aux.clone(),
        compute: node,
        session_id: ctx.session_id(),
+        cancel: session,
        _req: request_gauge,
        _conn: conn_gauge,
-        _cancel: session,
    }))
 }
--- a/proxy/src/metrics.rs
+++ b/proxy/src/metrics.rs
@@ -56,6 +56,8 @@ pub struct ProxyMetrics {
    pub connection_requests: CounterPairVec<NumConnectionRequestsGauge>,
    #[metric(flatten)]
    pub http_endpoint_pools: HttpEndpointPools,
+    #[metric(flatten)]
+    pub cancel_channel_size: CounterPairVec<CancelChannelSizeGauge>,

    /// Time it took for proxy to establish a connection to the compute endpoint.
    // largest bucket = 2^16 * 0.5ms = 32s
@@ -294,6 +296,16 @@ impl CounterPairAssoc for NumConnectionRequestsGauge {
 pub type NumConnectionRequestsGuard<'a> =
    metrics::MeasuredCounterPairGuard<'a, NumConnectionRequestsGauge>;

+pub struct CancelChannelSizeGauge;
+impl CounterPairAssoc for CancelChannelSizeGauge {
+    const INC_NAME: &'static MetricName = MetricName::from_str("opened_msgs_cancel_channel_total");
+    const DEC_NAME: &'static MetricName = MetricName::from_str("closed_msgs_cancel_channel_total");
+    const INC_HELP: &'static str = "Number of opened messages in the cancellation channel.";
+    const DEC_HELP: &'static str = "Number of closed messages in the cancellation channel.";
+    type LabelGroupSet = StaticLabelSet<RedisMsgKind>;
+}
+pub type CancelChannelSizeGuard<'a> = metrics::MeasuredCounterPairGuard<'a, CancelChannelSizeGauge>;
+
 #[derive(LabelGroup)]
 #[label(set = ComputeConnectionLatencySet)]
 pub struct ComputeConnectionLatencyGroup {
@@ -340,13 +352,6 @@ pub struct RedisErrors<'a> {
    pub channel: &'a str,
 }

-#[derive(FixedCardinalityLabel, Copy, Clone)]
-pub enum CancellationSource {
-    FromClient,
-    FromRedis,
-    Local,
-}
-
 #[derive(FixedCardinalityLabel, Copy, Clone)]
 pub enum CancellationOutcome {
    NotFound,
@@ -357,7 +362,6 @@ pub enum CancellationOutcome {
 #[derive(LabelGroup)]
 #[label(set = CancellationRequestSet)]
 pub struct CancellationRequest {
-    pub source: CancellationSource,
    pub kind: CancellationOutcome,
 }

@@ -369,6 +373,16 @@ pub enum Waiting {
    RetryTimeout,
 }

+#[derive(FixedCardinalityLabel, Copy, Clone)]
+#[label(singleton = "kind")]
+pub enum RedisMsgKind {
+    HSet,
+    HSetMultiple,
+    HGet,
+    HGetAll,
+    HDel,
+}
+
 #[derive(Default)]
 struct Accumulated {
    cplane: time::Duration,
--- a/proxy/src/proxy/mod.rs
+++ b/proxy/src/proxy/mod.rs
@@ -13,8 +13,9 @@ pub use copy_bidirectional::{copy_bidirectional_client_compute, ErrorSource};
 use futures::{FutureExt, TryFutureExt};
 use itertools::Itertools;
 use once_cell::sync::OnceCell;
-use pq_proto::{BeMessage as Be, StartupMessageParams};
+use pq_proto::{BeMessage as Be, CancelKeyData, StartupMessageParams};
 use regex::Regex;
+use serde::{Deserialize, Serialize};
 use smol_str::{format_smolstr, SmolStr};
 use thiserror::Error;
 use tokio::io::{AsyncRead, AsyncWrite, AsyncWriteExt};
@@ -23,7 +24,7 @@ use tracing::{debug, error, info, warn, Instrument};

 use self::connect_compute::{connect_to_compute, TcpMechanism};
 use self::passthrough::ProxyPassthrough;
-use crate::cancellation::{self, CancellationHandlerMain, CancellationHandlerMainInternal};
+use crate::cancellation::{self, CancellationHandler};
 use crate::config::{ProxyConfig, ProxyProtocolV2, TlsConfig};
 use crate::context::RequestContext;
 use crate::error::ReportableError;
@@ -57,7 +58,7 @@ pub async fn task_main(
    auth_backend: &'static auth::Backend<'static, ()>,
    listener: tokio::net::TcpListener,
    cancellation_token: CancellationToken,
-    cancellation_handler: Arc<CancellationHandlerMain>,
+    cancellation_handler: Arc<CancellationHandler>,
    endpoint_rate_limiter: Arc<EndpointRateLimiter>,
 ) -> anyhow::Result<()> {
    scopeguard::defer! {
@@ -243,13 +244,13 @@ pub(crate) async fn handle_client<S: AsyncRead + AsyncWrite + Unpin>(
    config: &'static ProxyConfig,
    auth_backend: &'static auth::Backend<'static, ()>,
    ctx: &RequestContext,
-    cancellation_handler: Arc<CancellationHandlerMain>,
+    cancellation_handler: Arc<CancellationHandler>,
    stream: S,
    mode: ClientMode,
    endpoint_rate_limiter: Arc<EndpointRateLimiter>,
    conn_gauge: NumClientConnectionsGuard<'static>,
    cancellations: tokio_util::task::task_tracker::TaskTracker,
-) -> Result<Option<ProxyPassthrough<CancellationHandlerMainInternal, S>>, ClientRequestError> {
+) -> Result<Option<ProxyPassthrough<S>>, ClientRequestError> {
    debug!(
        protocol = %ctx.protocol(),
        "handling interactive connection from client"
@@ -278,7 +279,7 @@ pub(crate) async fn handle_client<S: AsyncRead + AsyncWrite + Unpin>(
                cancel_span.follows_from(tracing::Span::current());
                async move {
                    cancellation_handler_clone
-                        .cancel_session_auth(
+                        .cancel_session(
                            cancel_key_data,
                            ctx,
                            config.authentication_config.ip_allowlist_check_enabled,
@@ -312,7 +313,7 @@ pub(crate) async fn handle_client<S: AsyncRead + AsyncWrite + Unpin>(
    };

    let user = user_info.get_user().to_owned();
-    let (user_info, ip_allowlist) = match user_info
+    let (user_info, _ip_allowlist) = match user_info
        .authenticate(
            ctx,
            &mut stream,
@@ -356,10 +357,14 @@ pub(crate) async fn handle_client<S: AsyncRead + AsyncWrite + Unpin>(
    .or_else(|e| stream.throw_error(e))
    .await?;

-    node.cancel_closure
-        .set_ip_allowlist(ip_allowlist.unwrap_or_default());
-    let session = cancellation_handler.get_session();
-    prepare_client_connection(&node, &session, &mut stream).await?;
+    let cancellation_handler_clone = Arc::clone(&cancellation_handler);
+    let session = cancellation_handler_clone.get_key();
+
+    session
+        .write_cancel_key(node.cancel_closure.clone())
+        .await?;
+
+    prepare_client_connection(&node, *session.key(), &mut stream).await?;

    // Before proxy passing, forward to compute whatever data is left in the
    // PqStream input buffer. Normally there is none, but our serverless npm
@@ -373,23 +378,19 @@ pub(crate) async fn handle_client<S: AsyncRead + AsyncWrite + Unpin>(
        aux: node.aux.clone(),
        compute: node,
        session_id: ctx.session_id(),
+        cancel: session,
        _req: request_gauge,
        _conn: conn_gauge,
-        _cancel: session,
    }))
 }

 /// Finish client connection initialization: confirm auth success, send params, etc.
 #[tracing::instrument(skip_all)]
-pub(crate) async fn prepare_client_connection<P>(
+pub(crate) async fn prepare_client_connection(
    node: &compute::PostgresConnection,
-    session: &cancellation::Session<P>,
+    cancel_key_data: CancelKeyData,
    stream: &mut PqStream<impl AsyncRead + AsyncWrite + Unpin>,
 ) -> Result<(), std::io::Error> {
-    // Register compute's query cancellation token and produce a new, unique one.
-    // The new token (cancel_key_data) will be sent to the client.
-    let cancel_key_data = session.enable_query_cancellation(node.cancel_closure.clone());
-
    // Forward all deferred notices to the client.
    for notice in &node.delayed_notice {
        stream.write_message_noflush(&Be::Raw(b'N', notice.as_bytes()))?;
@@ -411,7 +412,7 @@ pub(crate) async fn prepare_client_connection<P>(
    Ok(())
 }

-#[derive(Debug, Clone, PartialEq, Eq, Default)]
+#[derive(Debug, Clone, PartialEq, Eq, Default, Serialize, Deserialize)]
 pub(crate) struct NeonOptions(Vec<(SmolStr, SmolStr)>);

 impl NeonOptions {
--- a/proxy/src/proxy/passthrough.rs
+++ b/proxy/src/proxy/passthrough.rs
@@ -56,18 +56,18 @@ pub(crate) async fn proxy_pass(
    Ok(())
 }

-pub(crate) struct ProxyPassthrough<P, S> {
+pub(crate) struct ProxyPassthrough<S> {
    pub(crate) client: Stream<S>,
    pub(crate) compute: PostgresConnection,
    pub(crate) aux: MetricsAuxInfo,
    pub(crate) session_id: uuid::Uuid,
+    pub(crate) cancel: cancellation::Session,

    pub(crate) _req: NumConnectionRequestsGuard<'static>,
    pub(crate) _conn: NumClientConnectionsGuard<'static>,
-    pub(crate) _cancel: cancellation::Session<P>,
 }

-impl<P, S: AsyncRead + AsyncWrite + Unpin> ProxyPassthrough<P, S> {
+impl<S: AsyncRead + AsyncWrite + Unpin> ProxyPassthrough<S> {
    pub(crate) async fn proxy_pass(
        self,
        compute_config: &ComputeConfig,
@@ -81,6 +81,9 @@ impl<P, S: AsyncRead + AsyncWrite + Unpin> ProxyPassthrough<P, S> {
        {
            tracing::warn!(session_id = ?self.session_id, ?err, "could not cancel the query in the database");
        }
+
+        drop(self.cancel.remove_cancel_key().await); // we don't need a result. If the queue is full, we just log the error
+
        res
    }
 }
--- a/proxy/src/rate_limiter/limiter.rs
+++ b/proxy/src/rate_limiter/limiter.rs
@@ -138,6 +138,12 @@ impl RateBucketInfo {
        Self::new(200, Duration::from_secs(600)),
    ];

+    // Every session gets a cancel key, so this limit is effectively a global proxy limit.
+    pub const DEFAULT_REDIS_SET: [Self; 2] = [
+        Self::new(100_000, Duration::from_secs(1)),
+        Self::new(50_000, Duration::from_secs(10)),
+    ];
+
    /// All of these are per endpoint-maskedip pair.
    /// Context: 4096 rounds of pbkdf2 take about 1ms of cpu time to execute (1 milli-cpu-second or 1mcpus).
    ///
--- a/proxy/src/redis/cancellation_publisher.rs
+++ b/proxy/src/redis/cancellation_publisher.rs
@@ -2,12 +2,10 @@ use core::net::IpAddr;
 use std::sync::Arc;

 use pq_proto::CancelKeyData;
-use redis::AsyncCommands;
 use tokio::sync::Mutex;
 use uuid::Uuid;

 use super::connection_with_credentials_provider::ConnectionWithCredentialsProvider;
-use super::notifications::{CancelSession, Notification, PROXY_CHANNEL_NAME};
 use crate::rate_limiter::{GlobalRateLimiter, RateBucketInfo};

 pub trait CancellationPublisherMut: Send + Sync + 'static {
@@ -83,9 +81,10 @@ impl<P: CancellationPublisherMut> CancellationPublisher for Arc<Mutex<P>> {
 }

 pub struct RedisPublisherClient {
+    #[allow(dead_code)]
    client: ConnectionWithCredentialsProvider,
-    region_id: String,
-    limiter: GlobalRateLimiter,
+    _region_id: String,
+    _limiter: GlobalRateLimiter,
 }

 impl RedisPublisherClient {
@@ -96,26 +95,12 @@ impl RedisPublisherClient {
    ) -> anyhow::Result<Self> {
        Ok(Self {
            client,
-            region_id,
-            limiter: GlobalRateLimiter::new(info.into()),
+            _region_id: region_id,
+            _limiter: GlobalRateLimiter::new(info.into()),
        })
    }

-    async fn publish(
-        &mut self,
-        cancel_key_data: CancelKeyData,
-        session_id: Uuid,
-        peer_addr: IpAddr,
-    ) -> anyhow::Result<()> {
-        let payload = serde_json::to_string(&Notification::Cancel(CancelSession {
-            region_id: Some(self.region_id.clone()),
-            cancel_key_data,
-            session_id,
-            peer_addr: Some(peer_addr),
-        }))?;
-        let _: () = self.client.publish(PROXY_CHANNEL_NAME, payload).await?;
-        Ok(())
-    }
+    #[allow(dead_code)]
    pub(crate) async fn try_connect(&mut self) -> anyhow::Result<()> {
        match self.client.connect().await {
            Ok(()) => {}
@@ -126,49 +111,4 @@ impl RedisPublisherClient {
        }
        Ok(())
    }
-    async fn try_publish_internal(
-        &mut self,
-        cancel_key_data: CancelKeyData,
-        session_id: Uuid,
-        peer_addr: IpAddr,
-    ) -> anyhow::Result<()> {
-        // TODO: review redundant error duplication logs.
-        if !self.limiter.check() {
-            tracing::info!("Rate limit exceeded. Skipping cancellation message");
-            return Err(anyhow::anyhow!("Rate limit exceeded"));
-        }
-        match self.publish(cancel_key_data, session_id, peer_addr).await {
-            Ok(()) => return Ok(()),
-            Err(e) => {
-                tracing::error!("failed to publish a message: {e}");
-            }
-        }
-        tracing::info!("Publisher is disconnected. Reconnectiong...");
-        self.try_connect().await?;
-        self.publish(cancel_key_data, session_id, peer_addr).await
-    }
-}
-
-impl CancellationPublisherMut for RedisPublisherClient {
-    async fn try_publish(
-        &mut self,
-        cancel_key_data: CancelKeyData,
-        session_id: Uuid,
-        peer_addr: IpAddr,
-    ) -> anyhow::Result<()> {
-        tracing::info!("publishing cancellation key to Redis");
-        match self
-            .try_publish_internal(cancel_key_data, session_id, peer_addr)
-            .await
-        {
-            Ok(()) => {
-                tracing::debug!("cancellation key successfuly published to Redis");
-                Ok(())
-            }
-            Err(e) => {
-                tracing::error!("failed to publish a message: {e}");
-                Err(e)
-            }
-        }
-    }
 }
--- a/proxy/src/redis/connection_with_credentials_provider.rs
+++ b/proxy/src/redis/connection_with_credentials_provider.rs
@@ -29,6 +29,7 @@ impl Clone for Credentials {
 /// Provides PubSub connection without credentials refresh.
 pub struct ConnectionWithCredentialsProvider {
    credentials: Credentials,
+    // TODO: with more load on the connection, we should consider using a connection pool
    con: Option<MultiplexedConnection>,
    refresh_token_task: Option<JoinHandle<()>>,
    mutex: tokio::sync::Mutex<()>,
--- a/proxy/src/redis/keys.rs
+++ b/proxy/src/redis/keys.rs
@@ -0,0 +1,88 @@
+use anyhow::Ok;
+use pq_proto::{id_to_cancel_key, CancelKeyData};
+use serde::{Deserialize, Serialize};
+use std::io::ErrorKind;
+
+pub mod keyspace {
+    pub const CANCEL_PREFIX: &str = "cancel";
+}
+
+#[derive(Clone, Debug, Serialize, Deserialize, Eq, PartialEq)]
+pub(crate) enum KeyPrefix {
+    #[serde(untagged)]
+    Cancel(CancelKeyData),
+}
+
+impl KeyPrefix {
+    pub(crate) fn build_redis_key(&self) -> String {
+        match self {
+            KeyPrefix::Cancel(key) => {
+                let hi = (key.backend_pid as u64) << 32;
+                let lo = (key.cancel_key as u64) & 0xffff_ffff;
+                let id = hi | lo;
+                let keyspace = keyspace::CANCEL_PREFIX;
+                format!("{keyspace}:{id:x}")
+            }
+        }
+    }
+
+    #[allow(dead_code)]
+    pub(crate) fn as_str(&self) -> &'static str {
+        match self {
+            KeyPrefix::Cancel(_) => keyspace::CANCEL_PREFIX,
+        }
+    }
+}
+
+#[allow(dead_code)]
+pub(crate) fn parse_redis_key(key: &str) -> anyhow::Result<KeyPrefix> {
+    let (prefix, key_str) = key.split_once(':').ok_or_else(|| {
+        anyhow::anyhow!(std::io::Error::new(
+            ErrorKind::InvalidData,
+            "missing prefix"
+        ))
+    })?;
+
+    match prefix {
+        keyspace::CANCEL_PREFIX => {
+            let id = u64::from_str_radix(key_str, 16)?;
+
+            Ok(KeyPrefix::Cancel(id_to_cancel_key(id)))
+        }
+        _ => Err(anyhow::anyhow!(std::io::Error::new(
+            ErrorKind::InvalidData,
+            "unknown prefix"
+        ))),
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_build_redis_key() {
+        let cancel_key: KeyPrefix = KeyPrefix::Cancel(CancelKeyData {
+            backend_pid: 12345,
+            cancel_key: 54321,
+        });
+
+        let redis_key = cancel_key.build_redis_key();
+        assert_eq!(redis_key, "cancel:30390000d431");
+    }
+
+    #[test]
+    fn test_parse_redis_key() {
+        let redis_key = "cancel:30390000d431";
+        let key: KeyPrefix = parse_redis_key(redis_key).expect("Failed to parse key");
+
+        let ref_key = CancelKeyData {
+            backend_pid: 12345,
+            cancel_key: 54321,
+        };
+
+        assert_eq!(key.as_str(), KeyPrefix::Cancel(ref_key).as_str());
+        let KeyPrefix::Cancel(cancel_key) = key;
+        assert_eq!(ref_key, cancel_key);
+    }
+}
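
The key layout used by `build_redis_key` and `parse_redis_key` above can be sketched in isolation. This is a minimal, dependency-free illustration, not the crate's code: `CancelKey` is a hypothetical stand-in for `pq_proto::CancelKeyData` (the `i32` field types are an assumption), and the decoding step stands in for `id_to_cancel_key`, whose exact behavior is assumed to be the inverse of the packing.

```rust
/// Hypothetical stand-in for `pq_proto::CancelKeyData`; field types are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CancelKey {
    pub backend_pid: i32,
    pub cancel_key: i32,
}

const CANCEL_PREFIX: &str = "cancel";

/// Pack both 32-bit halves into one u64 and render it as lowercase hex.
pub fn build_redis_key(key: CancelKey) -> String {
    // The shift discards the sign-extension bits of `backend_pid`.
    let hi = (key.backend_pid as u64) << 32;
    // The mask discards the sign-extension bits of `cancel_key`.
    let lo = (key.cancel_key as u64) & 0xffff_ffff;
    let id = hi | lo;
    let prefix = CANCEL_PREFIX;
    format!("{prefix}:{id:x}")
}

/// Inverse of `build_redis_key`: split on the first ':' and unpack the hex id.
pub fn parse_redis_key(s: &str) -> Result<CancelKey, String> {
    let (prefix, hex) = s
        .split_once(':')
        .ok_or_else(|| "missing prefix".to_string())?;
    if prefix != CANCEL_PREFIX {
        return Err(format!("unknown prefix: {prefix}"));
    }
    let id = u64::from_str_radix(hex, 16).map_err(|e| e.to_string())?;
    Ok(CancelKey {
        backend_pid: (id >> 32) as i32,
        cancel_key: (id & 0xffff_ffff) as i32,
    })
}
```

With these definitions, a key with `backend_pid = 12345` and `cancel_key = 54321` maps to `cancel:30390000d431`, matching the value asserted in the tests above, and negative values survive the round trip because each half is truncated to 32 bits before being combined.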
--- a/proxy/src/redis/kv_ops.rs
+++ b/proxy/src/redis/kv_ops.rs
@@ -0,0 +1,185 @@
+use redis::{AsyncCommands, ToRedisArgs};
+
+use super::connection_with_credentials_provider::ConnectionWithCredentialsProvider;
+
+use crate::rate_limiter::{GlobalRateLimiter, RateBucketInfo};
+
+pub struct RedisKVClient {
+    client: ConnectionWithCredentialsProvider,
+    limiter: GlobalRateLimiter,
+}
+
+impl RedisKVClient {
+    pub fn new(client: ConnectionWithCredentialsProvider, info: &'static [RateBucketInfo]) -> Self {
+        Self {
+            client,
+            limiter: GlobalRateLimiter::new(info.into()),
+        }
+    }
+
+    pub async fn try_connect(&mut self) -> anyhow::Result<()> {
+        match self.client.connect().await {
+            Ok(()) => {}
+            Err(e) => {
+                tracing::error!("failed to connect to redis: {e}");
+                return Err(e);
+            }
+        }
+        Ok(())
+    }
+
+    pub(crate) async fn hset<K, F, V>(&mut self, key: K, field: F, value: V) -> anyhow::Result<()>
+    where
+        K: ToRedisArgs + Send + Sync,
+        F: ToRedisArgs + Send + Sync,
+        V: ToRedisArgs + Send + Sync,
+    {
+        if !self.limiter.check() {
+            tracing::info!("Rate limit exceeded. Skipping hset");
+            return Err(anyhow::anyhow!("Rate limit exceeded"));
+        }
+
+        match self.client.hset(&key, &field, &value).await {
+            Ok(()) => return Ok(()),
+            Err(e) => {
+                tracing::error!("failed to set a key-value pair: {e}");
+            }
+        }
+
+        tracing::info!("Redis client is disconnected. Reconnecting...");
+        self.try_connect().await?;
+        self.client
+            .hset(key, field, value)
+            .await
+            .map_err(anyhow::Error::new)
+    }
+
+    #[allow(dead_code)]
+    pub(crate) async fn hset_multiple<K, V>(
+        &mut self,
+        key: &str,
+        items: &[(K, V)],
+    ) -> anyhow::Result<()>
+    where
+        K: ToRedisArgs + Send + Sync,
+        V: ToRedisArgs + Send + Sync,
+    {
+        if !self.limiter.check() {
+            tracing::info!("Rate limit exceeded. Skipping hset_multiple");
+            return Err(anyhow::anyhow!("Rate limit exceeded"));
+        }
+
+        match self.client.hset_multiple(key, items).await {
+            Ok(()) => return Ok(()),
+            Err(e) => {
+                tracing::error!("failed to set a key-value pair: {e}");
+            }
+        }
+
+        tracing::info!("Redis client is disconnected. Reconnecting...");
+        self.try_connect().await?;
+        self.client
+            .hset_multiple(key, items)
+            .await
+            .map_err(anyhow::Error::new)
+    }
+
+    #[allow(dead_code)]
+    pub(crate) async fn expire<K>(&mut self, key: K, seconds: i64) -> anyhow::Result<()>
+    where
+        K: ToRedisArgs + Send + Sync,
+    {
+        if !self.limiter.check() {
+            tracing::info!("Rate limit exceeded. Skipping expire");
+            return Err(anyhow::anyhow!("Rate limit exceeded"));
+        }
+
+        match self.client.expire(&key, seconds).await {
+            Ok(()) => return Ok(()),
+            Err(e) => {
+                tracing::error!("failed to set expiration on a key: {e}");
+            }
+        }
+
+        tracing::info!("Redis client is disconnected. Reconnecting...");
+        self.try_connect().await?;
+        self.client
+            .expire(key, seconds)
+            .await
+            .map_err(anyhow::Error::new)
+    }
+
+    #[allow(dead_code)]
+    pub(crate) async fn hget<K, F, V>(&mut self, key: K, field: F) -> anyhow::Result<V>
+    where
+        K: ToRedisArgs + Send + Sync,
+        F: ToRedisArgs + Send + Sync,
+        V: redis::FromRedisValue,
+    {
+        if !self.limiter.check() {
+            tracing::info!("Rate limit exceeded. Skipping hget");
+            return Err(anyhow::anyhow!("Rate limit exceeded"));
+        }
+
+        match self.client.hget(&key, &field).await {
+            Ok(value) => return Ok(value),
+            Err(e) => {
+                tracing::error!("failed to get a value: {e}");
+            }
+        }
+
+        tracing::info!("Redis client is disconnected. Reconnecting...");
+        self.try_connect().await?;
+        self.client
+            .hget(key, field)
+            .await
+            .map_err(anyhow::Error::new)
+    }
+
+    pub(crate) async fn hget_all<K, V>(&mut self, key: K) -> anyhow::Result<V>
+    where
+        K: ToRedisArgs + Send + Sync,
+        V: redis::FromRedisValue,
+    {
+        if !self.limiter.check() {
+            tracing::info!("Rate limit exceeded. Skipping hgetall");
+            return Err(anyhow::anyhow!("Rate limit exceeded"));
+        }
+
+        match self.client.hgetall(&key).await {
+            Ok(value) => return Ok(value),
+            Err(e) => {
+                tracing::error!("failed to get a value: {e}");
+            }
+        }
+
+        tracing::info!("Redis client is disconnected. Reconnecting...");
+        self.try_connect().await?;
+        self.client.hgetall(key).await.map_err(anyhow::Error::new)
+    }
+
+    pub(crate) async fn hdel<K, F>(&mut self, key: K, field: F) -> anyhow::Result<()>
+    where
+        K: ToRedisArgs + Send + Sync,
+        F: ToRedisArgs + Send + Sync,
+    {
+        if !self.limiter.check() {
+            tracing::info!("Rate limit exceeded. Skipping hdel");
+            return Err(anyhow::anyhow!("Rate limit exceeded"));
+        }
+
+        match self.client.hdel(&key, &field).await {
+            Ok(()) => return Ok(()),
+            Err(e) => {
+                tracing::error!("failed to delete a key-value pair: {e}");
+            }
+        }
+
+        tracing::info!("Redis client is disconnected. Reconnecting...");
+        self.try_connect().await?;
+        self.client
+            .hdel(key, field)
+            .await
+            .map_err(anyhow::Error::new)
+    }
+}
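
Every `RedisKVClient` method above follows the same shape: check the rate limiter, attempt the command, and on failure reconnect and retry exactly once. The sketch below factors that shape into one helper so the control flow is easier to see. It is synchronous and uses only the standard library; `Reconnect`, `FlakyConn`, and `with_one_retry` are hypothetical names introduced for illustration and are not part of the proxy crate, and the `allowed` flag stands in for `self.limiter.check()`.

```rust
/// Hypothetical connection abstraction used only for this sketch.
pub trait Reconnect {
    fn reconnect(&mut self) -> Result<(), String>;
}

/// Fake connection that starts disconnected and counts reconnect attempts.
pub struct FlakyConn {
    pub connected: bool,
    pub reconnects: u32,
}

impl Reconnect for FlakyConn {
    fn reconnect(&mut self) -> Result<(), String> {
        self.connected = true;
        self.reconnects += 1;
        Ok(())
    }
}

/// Run `op`; if it fails, reconnect and run it one more time.
/// `allowed` models the result of the rate limiter check.
pub fn with_one_retry<C, T>(
    conn: &mut C,
    allowed: bool,
    mut op: impl FnMut(&mut C) -> Result<T, String>,
) -> Result<T, String>
where
    C: Reconnect,
{
    if !allowed {
        // Mirrors the early return taken when the limiter rejects the call.
        return Err("Rate limit exceeded".to_string());
    }
    if let Ok(value) = op(&mut *conn) {
        return Ok(value);
    }
    // First attempt failed: assume the connection dropped, reconnect, retry once.
    conn.reconnect()?;
    op(&mut *conn)
}
```

One consequence of this shape, which holds for the methods in the diff as well, is that any first-attempt error triggers a reconnect, not only connection errors, and the second attempt's error is returned to the caller unchanged.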
--- a/proxy/src/redis/mod.rs
+++ b/proxy/src/redis/mod.rs
@@ -1,4 +1,6 @@
 pub mod cancellation_publisher;
 pub mod connection_with_credentials_provider;
 pub mod elasticache;
+pub mod keys;
+pub mod kv_ops;
 pub mod notifications;
--- a/proxy/src/redis/notifications.rs
+++ b/proxy/src/redis/notifications.rs
@@ -6,18 +6,14 @@ use pq_proto::CancelKeyData;
 use redis::aio::PubSub;
 use serde::{Deserialize, Serialize};
 use tokio_util::sync::CancellationToken;
-use tracing::Instrument;
 use uuid::Uuid;

 use super::connection_with_credentials_provider::ConnectionWithCredentialsProvider;
 use crate::cache::project_info::ProjectInfoCache;
-use crate::cancellation::{CancelMap, CancellationHandler};
-use crate::config::ProxyConfig;
 use crate::intern::{ProjectIdInt, RoleNameInt};
 use crate::metrics::{Metrics, RedisErrors, RedisEventsCount};

 const CPLANE_CHANNEL_NAME: &str = "neondb-proxy-ws-updates";
-pub(crate) const PROXY_CHANNEL_NAME: &str = "neondb-proxy-to-proxy-updates";
 const RECONNECT_TIMEOUT: std::time::Duration = std::time::Duration::from_secs(20);
 const INVALIDATION_LAG: std::time::Duration = std::time::Duration::from_secs(20);

@@ -25,8 +21,6 @@ async fn try_connect(client: &ConnectionWithCredentialsProvider) -> anyhow::Resu
    let mut conn = client.get_async_pubsub().await?;
    tracing::info!("subscribing to a channel `{CPLANE_CHANNEL_NAME}`");
    conn.subscribe(CPLANE_CHANNEL_NAME).await?;
-    tracing::info!("subscribing to a channel `{PROXY_CHANNEL_NAME}`");
-    conn.subscribe(PROXY_CHANNEL_NAME).await?;
    Ok(conn)
 }

@@ -71,8 +65,6 @@ pub(crate) enum Notification {
        deserialize_with = "deserialize_json_string"
    )]
    PasswordUpdate { password_update: PasswordUpdate },
-    #[serde(rename = "/cancel_session")]
-    Cancel(CancelSession),

    #[serde(
        other,
@@ -138,7 +130,6 @@ where

 struct MessageHandler<C: ProjectInfoCache + Send + Sync + 'static> {
    cache: Arc<C>,
-    cancellation_handler: Arc<CancellationHandler<()>>,
    region_id: String,
 }

@@ -146,23 +137,14 @@ impl<C: ProjectInfoCache + Send + Sync + 'static> Clone for MessageHandler<C> {
    fn clone(&self) -> Self {
        Self {
            cache: self.cache.clone(),
-            cancellation_handler: self.cancellation_handler.clone(),
            region_id: self.region_id.clone(),
        }
    }
 }

 impl<C: ProjectInfoCache + Send + Sync + 'static> MessageHandler<C> {
-    pub(crate) fn new(
-        cache: Arc<C>,
-        cancellation_handler: Arc<CancellationHandler<()>>,
-        region_id: String,
-    ) -> Self {
-        Self {
-            cache,
-            cancellation_handler,
-            region_id,
-        }
+    pub(crate) fn new(cache: Arc<C>, region_id: String) -> Self {
+        Self { cache, region_id }
    }

    pub(crate) async fn increment_active_listeners(&self) {
@@ -207,46 +189,6 @@ impl<C: ProjectInfoCache + Send + Sync + 'static> MessageHandler<C> {

        tracing::debug!(?msg, "received a message");
        match msg {
-            Notification::Cancel(cancel_session) => {
-                tracing::Span::current().record(
-                    "session_id",
-                    tracing::field::display(cancel_session.session_id),
-                );
-                Metrics::get()
-                    .proxy
-                    .redis_events_count
-                    .inc(RedisEventsCount::CancelSession);
-                if let Some(cancel_region) = cancel_session.region_id {
-                    // If the message is not for this region, ignore it.
-                    if cancel_region != self.region_id {
-                        return Ok(());
-                    }
-                }
-
-                // TODO: Remove unspecified peer_addr after the complete migration to the new format
-                let peer_addr = cancel_session
-                    .peer_addr
-                    .unwrap_or(std::net::IpAddr::V4(std::net::Ipv4Addr::UNSPECIFIED));
-                let cancel_span = tracing::span!(parent: None, tracing::Level::INFO, "cancel_session", session_id = ?cancel_session.session_id);
-                cancel_span.follows_from(tracing::Span::current());
-                // This instance of cancellation_handler doesn't have a RedisPublisherClient so it can't publish the message.
-                match self
-                    .cancellation_handler
-                    .cancel_session(
-                        cancel_session.cancel_key_data,
-                        uuid::Uuid::nil(),
-                        peer_addr,
-                        cancel_session.peer_addr.is_some(),
-                    )
-                    .instrument(cancel_span)
-                    .await
-                {
-                    Ok(()) => {}
-                    Err(e) => {
-                        tracing::warn!("failed to cancel session: {e}");
-                    }
-                }
-            }
            Notification::AllowedIpsUpdate { .. }
            | Notification::PasswordUpdate { .. }
            | Notification::BlockPublicOrVpcAccessUpdated { .. }
@@ -293,7 +235,6 @@ fn invalidate_cache<C: ProjectInfoCache>(cache: Arc<C>, msg: Notification) {
                password_update.project_id,
                password_update.role_name,
            ),
-        Notification::Cancel(_) => unreachable!("cancel message should be handled separately"),
        Notification::BlockPublicOrVpcAccessUpdated { .. } => {
            // https://github.com/neondatabase/neon/pull/10073
        }
@@ -323,8 +264,8 @@ async fn handle_messages<C: ProjectInfoCache + Send + Sync + 'static>(
            }
            Err(e) => {
                tracing::error!(
-            "failed to connect to redis: {e}, will try to reconnect in {RECONNECT_TIMEOUT:#?}"
-        );
+                    "failed to connect to redis: {e}, will try to reconnect in {RECONNECT_TIMEOUT:#?}"
+                );
                tokio::time::sleep(RECONNECT_TIMEOUT).await;
                continue;
            }
@@ -350,21 +291,14 @@ async fn handle_messages<C: ProjectInfoCache + Send + Sync + 'static>(
 /// Handle console's invalidation messages.
 #[tracing::instrument(name = "redis_notifications", skip_all)]
 pub async fn task_main<C>(
-    config: &'static ProxyConfig,
    redis: ConnectionWithCredentialsProvider,
    cache: Arc<C>,
-    cancel_map: CancelMap,
    region_id: String,
 ) -> anyhow::Result<Infallible>
 where
    C: ProjectInfoCache + Send + Sync + 'static,
 {
-    let cancellation_handler = Arc::new(CancellationHandler::<()>::new(
-        &config.connect_to_compute,
-        cancel_map,
-        crate::metrics::CancellationSource::FromRedis,
-    ));
-    let handler = MessageHandler::new(cache, cancellation_handler, region_id);
+    let handler = MessageHandler::new(cache, region_id);
    // 6h - 1m.
    // There will be 1 minute overlap between two tasks. But at least we can be sure that no message is lost.
    let mut interval = tokio::time::interval(std::time::Duration::from_secs(6 * 60 * 60 - 60));
@@ -442,35 +376,6 @@ mod tests {

        Ok(())
    }
-    #[test]
-    fn parse_cancel_session() -> anyhow::Result<()> {
-        let cancel_key_data = CancelKeyData {
-            backend_pid: 42,
-            cancel_key: 41,
-        };
-        let uuid = uuid::Uuid::new_v4();
-        let msg = Notification::Cancel(CancelSession {
-            cancel_key_data,
-            region_id: None,
-            session_id: uuid,
-            peer_addr: None,
-        });
-        let text = serde_json::to_string(&msg)?;
-        let result: Notification = serde_json::from_str(&text)?;
-        assert_eq!(msg, result);
-
-        let msg = Notification::Cancel(CancelSession {
-            cancel_key_data,
-            region_id: Some("region".to_string()),
-            session_id: uuid,
-            peer_addr: None,
-        });
-        let text = serde_json::to_string(&msg)?;
-        let result: Notification = serde_json::from_str(&text)?;
-        assert_eq!(msg, result,);
-
-        Ok(())
-    }

    #[test]
    fn parse_unknown_topic() -> anyhow::Result<()> {
--- a/proxy/src/serverless/mod.rs
+++ b/proxy/src/serverless/mod.rs
@@ -43,7 +43,7 @@ use tokio_util::task::TaskTracker;
 use tracing::{info, warn, Instrument};
 use utils::http::error::ApiError;

-use crate::cancellation::CancellationHandlerMain;
+use crate::cancellation::CancellationHandler;
 use crate::config::{ProxyConfig, ProxyProtocolV2};
 use crate::context::RequestContext;
 use crate::ext::TaskExt;
@@ -61,7 +61,7 @@ pub async fn task_main(
    auth_backend: &'static crate::auth::Backend<'static, ()>,
    ws_listener: TcpListener,
    cancellation_token: CancellationToken,
-    cancellation_handler: Arc<CancellationHandlerMain>,
+    cancellation_handler: Arc<CancellationHandler>,
    endpoint_rate_limiter: Arc<EndpointRateLimiter>,
 ) -> anyhow::Result<()> {
    scopeguard::defer! {
@@ -318,7 +318,7 @@ async fn connection_handler(
    backend: Arc<PoolingBackend>,
    connections: TaskTracker,
    cancellations: TaskTracker,
-    cancellation_handler: Arc<CancellationHandlerMain>,
+    cancellation_handler: Arc<CancellationHandler>,
    endpoint_rate_limiter: Arc<EndpointRateLimiter>,
    cancellation_token: CancellationToken,
    conn: AsyncRW,
@@ -412,7 +412,7 @@ async fn request_handler(
    config: &'static ProxyConfig,
    backend: Arc<PoolingBackend>,
    ws_connections: TaskTracker,
-    cancellation_handler: Arc<CancellationHandlerMain>,
+    cancellation_handler: Arc<CancellationHandler>,
    session_id: uuid::Uuid,
    conn_info: ConnectionInfo,
    // used to cancel in-flight HTTP requests. not used to cancel websockets
--- a/proxy/src/serverless/websocket.rs
+++ b/proxy/src/serverless/websocket.rs
@@ -12,7 +12,7 @@ use pin_project_lite::pin_project;
 use tokio::io::{self, AsyncBufRead, AsyncRead, AsyncWrite, ReadBuf};
 use tracing::warn;

-use crate::cancellation::CancellationHandlerMain;
+use crate::cancellation::CancellationHandler;
 use crate::config::ProxyConfig;
 use crate::context::RequestContext;
 use crate::error::{io_error, ReportableError};
@@ -129,7 +129,7 @@ pub(crate) async fn serve_websocket(
    auth_backend: &'static crate::auth::Backend<'static, ()>,
    ctx: RequestContext,
    websocket: OnUpgrade,
-    cancellation_handler: Arc<CancellationHandlerMain>,
+    cancellation_handler: Arc<CancellationHandler>,
    endpoint_rate_limiter: Arc<EndpointRateLimiter>,
    hostname: Option<String>,
    cancellations: tokio_util::task::task_tracker::TaskTracker,
--- a/test_runner/fixtures/endpoint/http.py
+++ b/test_runner/fixtures/endpoint/http.py
@@ -28,11 +28,6 @@ class EndpointHttpClient(requests.Session):
        res.raise_for_status()
        return res.text

-    def installed_extensions(self):
-        res = self.get(f"http://localhost:{self.port}/installed_extensions")
-        res.raise_for_status()
-        return res.json()
-
    def extensions(self, extension: str, version: str, database: str):
        body = {
            "extension": extension,
--- a/test_runner/performance/test_compaction.py
+++ b/test_runner/performance/test_compaction.py
@@ -75,6 +75,7 @@ def test_compaction_l0_memory(neon_compare: NeonCompare):
            # Initially disable compaction so that we will build up a stack of L0s
            "compaction_period": "0s",
            "gc_period": "0s",
+            "compaction_upper_limit": 12,
        }
    )
    neon_compare.tenant = tenant_id
@@ -91,6 +92,7 @@ def test_compaction_l0_memory(neon_compare: NeonCompare):
    tenant_conf = pageserver_http.tenant_config(tenant_id)
    assert tenant_conf.effective_config["checkpoint_distance"] == 256 * 1024 * 1024
    assert tenant_conf.effective_config["compaction_threshold"] == 10
+    assert tenant_conf.effective_config["compaction_upper_limit"] == 12

    # Aim to write about 20 L0s, so that we will hit the limit on how many
    # to compact at once
--- a/test_runner/performance/test_layer_map.py
+++ b/test_runner/performance/test_layer_map.py
@@ -31,6 +31,7 @@ def test_layer_map(neon_env_builder: NeonEnvBuilder, zenbenchmark):

    endpoint = env.endpoints.create_start("main", tenant_id=tenant)
    cur = endpoint.connect().cursor()
+    cur.execute("set log_statement = 'all'")
    cur.execute("create table t(x integer)")
    for _ in range(n_iters):
        cur.execute(f"insert into t values (generate_series(1,{n_records}))")
--- a/test_runner/regress/test_attach_tenant_config.py
+++ b/test_runner/regress/test_attach_tenant_config.py
@@ -139,6 +139,7 @@ def test_fully_custom_config(positive_env: NeonEnv):
    fully_custom_config = {
        "compaction_period": "1h",
        "compaction_threshold": 13,
+        "compaction_upper_limit": 100,
        "l0_flush_delay_threshold": 25,
        "l0_flush_stall_threshold": 42,
        "l0_flush_wait_upload": True,
--- a/test_runner/regress/test_compute_metrics.py
+++ b/test_runner/regress/test_compute_metrics.py
@@ -5,16 +5,22 @@ import os
 import shutil
 import sys
 from enum import StrEnum
+from logging import debug
 from pathlib import Path
 from typing import TYPE_CHECKING, cast

 import pytest
 import requests
 import yaml
+from fixtures.endpoint.http import EndpointHttpClient
 from fixtures.log_helper import log
+from fixtures.metrics import parse_metrics
 from fixtures.paths import BASE_DIR, COMPUTE_CONFIG_DIR
+from fixtures.utils import wait_until
+from prometheus_client.samples import Sample

 if TYPE_CHECKING:
+    from collections.abc import Callable
    from types import TracebackType
    from typing import Self, TypedDict

@@ -467,3 +473,88 @@ def test_perf_counters(neon_simple_env: NeonEnv):
    cur.execute("CREATE EXTENSION neon VERSION '1.5'")
    cur.execute("SELECT * FROM neon_perf_counters")
    cur.execute("SELECT * FROM neon_backend_perf_counters")
+
+
+def collect_metric(
+    client: EndpointHttpClient,
+    name: str,
+    filter: dict[str, str],
+    predicate: Callable[[list[Sample]], bool],
+) -> Callable[[], list[Sample]]:
+    """
+    Call this function as the first argument to wait_until().
+    """
+
+    def __collect_metric() -> list[Sample]:
+        resp = client.metrics()
+        debug("Metrics: %s", resp)
+        m = parse_metrics(resp)
+        samples = m.query_all(name, filter)
+        debug("Samples: %s", samples)
+        assert predicate(samples), "predicate failed"
+        return samples
+
+    return __collect_metric
+
+
+def test_compute_installed_extensions_metric(neon_simple_env: NeonEnv):
+    """
+    Test that the compute_installed_extensions properly reports accurate
+    results. Important to note that currently this metric is only gathered on
+    compute start.
+    """
+    env = neon_simple_env
+
+    endpoint = env.endpoints.create_start("main")
+
+    client = endpoint.http_client()
+
+    def __has_plpgsql(samples: list[Sample]) -> bool:
+        """
+        Check that plpgsql is installed in the template1 and postgres databases
+        """
+        return len(samples) == 1 and samples[0].value == 2
+
+    wait_until(
+        collect_metric(
+            client,
+            "compute_installed_extensions",
+            {"extension_name": "plpgsql", "version": "1.0", "owned_by_superuser": "1"},
+            __has_plpgsql,
+        ),
+        name="compute_installed_extensions",
+    )
+
+    # Install the neon extension, so we can check for it on the restart
+    endpoint.safe_psql("CREATE EXTENSION neon VERSION '1.0'")
+
+    # The metric is only gathered on compute start, so restart to check if the
+    # neon extension will now be there.
+    endpoint.stop()
+    endpoint.start()
+
+    client = endpoint.http_client()
+
+    def __has_neon(samples: list[Sample]) -> bool:
+        return len(samples) == 1 and samples[0].value == 1
+
+    wait_until(
+        collect_metric(
+            client,
+            "compute_installed_extensions",
+            {"extension_name": "neon", "version": "1.0", "owned_by_superuser": "1"},
+            __has_neon,
+        ),
+        name="compute_installed_extensions",
+    )
+
+    # Double check that we also still have plpgsql
+    wait_until(
+        collect_metric(
+            client,
+            "compute_installed_extensions",
+            {"extension_name": "plpgsql", "version": "1.0", "owned_by_superuser": "1"},
+            __has_plpgsql,
+        ),
+        name="compute_installed_extensions",
+    )
--- a/test_runner/regress/test_compute_migrations.py
+++ b/test_runner/regress/test_compute_migrations.py
@@ -5,6 +5,8 @@ from typing import TYPE_CHECKING, cast

 import pytest
 from fixtures.compute_migrations import COMPUTE_MIGRATIONS, NUM_COMPUTE_MIGRATIONS
+from fixtures.metrics import parse_metrics
+from fixtures.utils import wait_until

 if TYPE_CHECKING:
    from fixtures.neon_fixtures import NeonEnv
@@ -23,7 +25,26 @@ def test_compute_migrations_retry(neon_simple_env: NeonEnv, compute_migrations_d
    for i in range(1, NUM_COMPUTE_MIGRATIONS + 1):
        endpoint.start(env={"FAILPOINTS": f"compute-migration=return({i})"})

-        # Make sure that the migrations ran
+        # Check that migration failure is properly recorded in the metrics
+        #
+        # N.B. wait_for_migrations() only waits till the last successful
+        # migration is applied. It doesn't wait till the migration failure due
+        # to the failpoint. This opens a race for checking the metrics. To avoid
+        # this, we first wait until the migration failure metric is seen.
+        def check_migration_failure_metrics():
+            client = endpoint.http_client()
+            raw_metrics = client.metrics()
+            metrics = parse_metrics(raw_metrics)
+            failed_migration = metrics.query_all(
+                "compute_ctl_db_migration_failed_total",
+            )
+            assert len(failed_migration) == 1
+            for sample in failed_migration:
+                assert sample.value == 1
+
+        wait_until(check_migration_failure_metrics)
+
+        # Make sure that all migrations before the failed one are applied
        endpoint.wait_for_migrations(wait_for=i - 1)

        # Confirm that we correctly recorded that in the
--- a/test_runner/regress/test_download_extensions.py
+++ b/test_runner/regress/test_download_extensions.py
@@ -8,6 +8,7 @@ from typing import TYPE_CHECKING

 import pytest
 from fixtures.log_helper import log
+from fixtures.metrics import parse_metrics
 from fixtures.neon_fixtures import (
    NeonEnvBuilder,
 )
@@ -128,6 +129,17 @@ def test_remote_extensions(

    httpserver.check()

+    # Check that we properly recorded downloads in the metrics
+    client = endpoint.http_client()
+    raw_metrics = client.metrics()
+    metrics = parse_metrics(raw_metrics)
+    remote_ext_requests = metrics.query_all(
+        "compute_ctl_remote_ext_requests_total",
+    )
+    assert len(remote_ext_requests) == 1
+    for sample in remote_ext_requests:
+        assert sample.value == 1
+

 # TODO
 # 1. Test downloading remote library.
@@ -137,7 +149,7 @@ def test_remote_extensions(
 #
 # 3.Test that extension is downloaded after endpoint restart,
 # when the library is used in the query.
-# Run the test with mutliple simultaneous connections to an endpoint.
+# Run the test with multiple simultaneous connections to an endpoint.
 # to ensure that the extension is downloaded only once.
 #
 # 4. Test that private extensions are only downloaded when they are present in the spec.
--- a/test_runner/regress/test_installed_extensions.py
+++ b/test_runner/regress/test_installed_extensions.py
@@ -1,154 +0,0 @@
-from __future__ import annotations
-
-import time
-from logging import info
-from typing import TYPE_CHECKING
-
-from fixtures.log_helper import log
-from fixtures.metrics import parse_metrics
-
-if TYPE_CHECKING:
-    from fixtures.neon_fixtures import NeonEnv
-
-
-def test_installed_extensions(neon_simple_env: NeonEnv):
-    """basic test for the endpoint that returns the list of installed extensions"""
-
-    env = neon_simple_env
-
-    env.create_branch("test_installed_extensions")
-
-    endpoint = env.endpoints.create_start("test_installed_extensions")
-
-    endpoint.safe_psql("CREATE DATABASE test_installed_extensions")
-    endpoint.safe_psql("CREATE DATABASE test_installed_extensions_2")
-
-    client = endpoint.http_client()
-    res = client.installed_extensions()
-
-    info("Extensions list: %s", res)
-    info("Extensions: %s", res["extensions"])
-    # 'plpgsql' is a default extension that is always installed.
-    assert any(
-        ext["extname"] == "plpgsql" and ext["version"] == "1.0" for ext in res["extensions"]
-    ), "The 'plpgsql' extension is missing"
-
-    # check that the neon_test_utils extension is not installed
-    assert not any(
-        ext["extname"] == "neon_test_utils" for ext in res["extensions"]
-    ), "The 'neon_test_utils' extension is installed"
-
-    pg_conn = endpoint.connect(dbname="test_installed_extensions")
-    with pg_conn.cursor() as cur:
-        cur.execute("CREATE EXTENSION neon_test_utils")
-        cur.execute(
-            "SELECT default_version FROM pg_available_extensions WHERE name = 'neon_test_utils'"
-        )
-        res = cur.fetchone()
-        neon_test_utils_version = res[0]
-
-    with pg_conn.cursor() as cur:
-        cur.execute("CREATE EXTENSION neon version '1.1'")
-
-    pg_conn_2 = endpoint.connect(dbname="test_installed_extensions_2")
-    with pg_conn_2.cursor() as cur:
-        cur.execute("CREATE EXTENSION neon version '1.2'")
-
-    res = client.installed_extensions()
-
-    info("Extensions list: %s", res)
-    info("Extensions: %s", res["extensions"])
-
-    # check that the neon_test_utils extension is installed only in 1 database
-    # and has the expected version
-    assert any(
-        ext["extname"] == "neon_test_utils"
-        and ext["version"] == neon_test_utils_version
-        and ext["n_databases"] == 1
-        for ext in res["extensions"]
-    )
-
-    # check that the plpgsql extension is installed in all databases
-    # this is a default extension that is always installed
-    assert any(ext["extname"] == "plpgsql" and ext["n_databases"] == 4 for ext in res["extensions"])
-
-    # check that the neon extension is installed and has expected versions
-    for ext in res["extensions"]:
-        if ext["extname"] == "neon":
-            assert ext["version"] in ["1.1", "1.2"]
-            assert ext["n_databases"] == 1
-
-    with pg_conn.cursor() as cur:
-        cur.execute("ALTER EXTENSION neon UPDATE TO '1.3'")
-
-    res = client.installed_extensions()
-
-    info("Extensions list: %s", res)
-    info("Extensions: %s", res["extensions"])
-
-    # check that the neon_test_utils extension is updated
-    for ext in res["extensions"]:
-        if ext["extname"] == "neon":
-            assert ext["version"] in ["1.2", "1.3"]
-            assert ext["n_databases"] == 1
-
-    # check that /metrics endpoint is available
-    # ensure that we see the metric before and after restart
-    res = client.metrics()
-    info("Metrics: %s", res)
-    m = parse_metrics(res)
-    neon_m = m.query_all(
-        "compute_installed_extensions",
-        {"extension_name": "neon", "version": "1.2", "owned_by_superuser": "1"},
-    )
-    assert len(neon_m) == 1
-    for sample in neon_m:
-        assert sample.value == 1
-    neon_m = m.query_all(
-        "compute_installed_extensions",
-        {"extension_name": "neon", "version": "1.3", "owned_by_superuser": "1"},
-    )
-    assert len(neon_m) == 1
-    for sample in neon_m:
-        assert sample.value == 1
-
-    endpoint.stop()
-    endpoint.start()
-
-    timeout = 10
-    while timeout > 0:
-        try:
-            res = client.metrics()
-            timeout = -1
-            if len(parse_metrics(res).query_all("compute_installed_extensions")) < 4:
-                # Assume that not all metrics that are collected yet
-                time.sleep(1)
-                timeout -= 1
-                continue
-        except Exception:
-            log.exception("failed to get metrics, assume they are not collected yet")
-            time.sleep(1)
-            timeout -= 1
-            continue
-
-        assert (
-            len(parse_metrics(res).query_all("compute_installed_extensions")) >= 4
-        ), "Not all metrics are collected"
-
-        info("After restart metrics: %s", res)
-        m = parse_metrics(res)
-        neon_m = m.query_all(
-            "compute_installed_extensions",
-            {"extension_name": "neon", "version": "1.2", "owned_by_superuser": "1"},
-        )
-        assert len(neon_m) == 1
-        for sample in neon_m:
-            assert sample.value == 1
-
-        neon_m = m.query_all(
-            "compute_installed_extensions",
-            {"extension_name": "neon", "version": "1.3", "owned_by_superuser": "1"},
-        )
-        assert len(neon_m) == 1
-        for sample in neon_m:
-            assert sample.value == 1
--- a/vendor/postgres-v17
+++ b/vendor/postgres-v17
--- a/vendor/revisions.json
+++ b/vendor/revisions.json
@@ -1,7 +1,7 @@
 {
  "v17": [
    "17.2",
-    "46f9b96555e084c35dd975da9485996db9e86181"
+    "b654fa88b6fd2ad24a03a14a7cd417ec66e518f9"
  ],
  "v16": [
    "16.6",
Author	SHA1	Message	Date
Tristan Partin	9eaaca4834	Add pg_search to the compute images pg_search is an extension by ParadeDB offering text search capabilities. Link: https://paradedb.com Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-29 15:48:16 -06:00
Tristan Partin	707a926057	Remove unused compute_ctl HTTP routes (#10544) These are not used anywhere within the platform, so let's remove dead code. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-29 19:22:01 +00:00
Alex Chi Z.	5bcefb4ee1	fix(pageserver): compaction perftest wrt upper limit (#10564) ## Problem The config is added in https://github.com/neondatabase/neon/pull/10550 causing behavior change for l0 compaction. close https://github.com/neondatabase/neon/issues/10562 ## Summary of changes Fix the test case to consider the effect of upper_limit. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-29 18:43:39 +00:00
Alexey Kondratov	34322b2424	chore(compute): Simplify new compute_ctl metrics and fix flaky test (#10560) ## Problem 1. `d04d924` added separate metrics for total requests and failures separately, but it doesn't make much sense. We could just have a unified counter with `http_status`. 2. `test_compute_migrations_retry` had a race, i.e., it was waiting for the last successful migration, not an actual failure. This was revealed after adding an assert on failure metric in `d04d924`. ## Summary of changes 1. Switch to unified counters for `compute_ctl` requests. 2. Add a waiting loop into `test_compute_migrations_retry` to eliminate the race. Part of neondatabase/cloud#17590	2025-01-29 18:09:25 +00:00
Vlad Lazar	fdfbc7b358	pageserver: hold GC while reading from a timeline (#10559) ## Problem If we are GC-ing because a new image layer was added while traversing the timeline, then it will remove layers that are required for fulfilling the current get request (read-path cannot "look back" and notice the new image layer). ## Summary of Changes Prevent GC from progressing on the current timeline while it is being visited for a read. Epic: https://github.com/neondatabase/neon/issues/9376	2025-01-29 17:08:25 +00:00
Conrad Ludgate	190c19c034	chore: update rust-postgres on rebase (#10561) I tried a full update of our tokio-postgres fork before. We hit some breaking change. This PR only pulls in ~50% of the changes from upstream: https://github.com/neondatabase/rust-postgres/pull/38.	2025-01-29 17:02:07 +00:00
Mikhail Kot	34e560fe37	download exporters from releases rather than using docker images (#10551) Use releases for postgres-exporter, pgbouncer-exporter, and sql-exporter	2025-01-29 15:52:00 +00:00
Tristan Partin	7922458b98	Use num_cpus from the workspace in pageserver (#10545) Luckily they were the same version, so we didn't spend time compiling two versions, which could have been the case in the future. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-29 15:45:36 +00:00
a-masterov	34d9e2d8e3	Add a test for GrapgQL (#10156) ## Problem We currently don't run the tests shipped with `pg_graphql`. ## Summary of changes The tests for `pg_graphql` are added.	2025-01-29 15:01:56 +00:00
Conrad Ludgate	2f82c21c63	chore: update rust-postgres fork (#10557) I updated the fork to fix some lints. Cargo keeps getting confused by it so let's just update the lockfile here	2025-01-29 12:55:24 +00:00
Ivan Efremov	222cc181e9	impr(proxy): Move the CancelMap to Redis hashes (#10364) ## Problem The approach of having CancelMap as an in-memory structure increases code complexity, as well as putting additional load for Redis streams. ## Summary of changes - Implement a set of KV ops for Redis client; - Remove cancel notifications code; - Send KV ops over the bounded channel to the handling background task for removing and adding the cancel keys. Closes #9660	2025-01-29 11:19:10 +00:00
alexanderlaw	4d2328ebe3	Fix C code to satisfy sanitizers (#10473)	2025-01-29 10:05:43 +00:00
a-masterov	9f81828429	Test extension upgrade compatibility (#10244) ## Problem We have to test the extensions, shipped with Neon for compatibility before the upgrade. ## Summary of changes Added the test for compatibility with the upgraded extensions.	2025-01-29 09:19:11 +00:00
Arseny Sher	9ab13d6e2c	Log statements in test_layer_map (#10554) ## Problem test_layer_map doesn't log statements and it is not clear how long they take. ## Summary of changes Do log them. ref https://github.com/neondatabase/neon/issues/10409	2025-01-29 09:16:00 +00:00
Alex Chi Z.	983e18e63e	feat(pageserver): add compaction_upper_limit config (#10550) ## Problem Follow-up of the incident, we should not use the same bound on lower/upper limit of compaction files. This patch adds an upper bound limit, which is set to 50 for now. ## Summary of changes Add `compaction_upper_limit`. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2025-01-28 23:18:32 +00:00
Alex Chi Z.	b735df6ff0	fix(pageserver): make image layer generation atomic (#10516) ## Problem close https://github.com/neondatabase/neon/issues/8362 ## Summary of changes Use `BatchLayerWriter` to ensure we clean up image layers after failed compaction. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-28 21:29:51 +00:00
Fedor Dikarev	68cf0ba439	run benchmark tests on small-metal runners (#10549) ## Problem Ref: https://github.com/neondatabase/cloud/issues/23314 We suspect some inconsistency in Benchmark tests runs could be due to different type of runners they are landed in. To have that aligned in both terms: failure rates and benchmark results, lets run them for now on `small-metal` servers and see the progress for the tests stability. ## Summary of changes	2025-01-28 21:26:38 +00:00
Alexey Kondratov	d04d924649	feat(compute): Add some basic compute_ctl metrics (#10504) ## Problem There are several parts of `compute_ctl` with a very low visibility of errors: 1. DB migrations that run async in the background after compute start. 2. Requests made to control plane (currently only `GetSpec`). 3. Requests made to the remote extensions server. ## Summary of changes Add new counters to quickly evaluate the amount of errors among the fleet. Part of neondatabase/cloud#17590	2025-01-28 19:24:07 +00:00
JC Grünhage	f5fdaa6dc6	feat(ci): generate basic release notes with links (#10511) ## Problem https://github.com/neondatabase/neon/pull/10448 removed release notes, because if their generation failed, the whole release was failing. People liked them though, and wanted some basic release notes as a fall-back instead of completely removing them. ## Summary of changes Include basic release notes that link to the release PR and to a diff to the previous release.	2025-01-28 19:13:39 +00:00