New communicator, with "integrated" cache accessible from all processes

fix(pageserver): report synthetic size = 1 if all tls offloaded (2) (#11731)
## Problem

https://github.com/neondatabase/neon/pull/11648 did this for resident size instead of synthetic size.

## Summary of changes

Report synthetic_size == 1 if all timelines are offloaded.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2026-05-19 14:10:37 +00:00 · 2025-04-29 11:52:44 +03:00 · 2025-04-28 13:45:45 +00:00 · 2025-04-28 13:24:18 +00:00 · 2025-04-28 12:44:28 +00:00 · 2025-04-28 12:16:29 +00:00
120 changed files with 9834 additions and 820 deletions
--- a/.github/actions/allure-report-generate/action.yml
+++ b/.github/actions/allure-report-generate/action.yml
@@ -7,7 +7,7 @@ inputs:
    type: boolean
    required: false
    default: false
-  aws-oicd-role-arn:
+  aws-oidc-role-arn:
    description: 'OIDC role arn to interract with S3'
    required: true

@@ -88,7 +88,7 @@ runs:
      if: ${{ !cancelled() }}
      with:
        aws-region: eu-central-1
-        role-to-assume: ${{ inputs.aws-oicd-role-arn }}
+        role-to-assume: ${{ inputs.aws-oidc-role-arn }}
        role-duration-seconds: 3600 # 1 hour should be more than enough to upload report

    # Potentially we could have several running build for the same key (for example, for the main branch), so we use improvised lock for this
--- a/.github/actions/allure-report-store/action.yml
+++ b/.github/actions/allure-report-store/action.yml
@@ -8,7 +8,7 @@ inputs:
  unique-key:
    description: 'string to distinguish different results in the same run'
    required: true
-  aws-oicd-role-arn:
+  aws-oidc-role-arn:
    description: 'OIDC role arn to interract with S3'
    required: true

@@ -39,7 +39,7 @@ runs:
      if: ${{ !cancelled() }}
      with:
        aws-region: eu-central-1
-        role-to-assume: ${{ inputs.aws-oicd-role-arn }}
+        role-to-assume: ${{ inputs.aws-oidc-role-arn }}
        role-duration-seconds: 3600 # 1 hour should be more than enough to upload report

    - name: Upload test results
--- a/.github/actions/download/action.yml
+++ b/.github/actions/download/action.yml
@@ -15,7 +15,7 @@ inputs:
  prefix:
    description: "S3 prefix. Default is '${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}'"
    required: false
-  aws-oicd-role-arn:
+  aws-oidc-role-arn:
    description: 'OIDC role arn to interract with S3'
    required: true

@@ -25,7 +25,7 @@ runs:
    - uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-region: eu-central-1
-        role-to-assume: ${{ inputs.aws-oicd-role-arn }}
+        role-to-assume: ${{ inputs.aws-oidc-role-arn }}
        role-duration-seconds: 3600

    - name: Download artifact
--- a/.github/actions/neon-project-create/action.yml
+++ b/.github/actions/neon-project-create/action.yml
@@ -49,6 +49,10 @@ inputs:
    description: 'A JSON object with project settings'
    required: false
    default: '{}'
+  default_endpoint_settings:
+    description: 'A JSON object with the default endpoint settings'
+    required: false
+    default: '{}'

 outputs:
  dsn:
@@ -66,9 +70,9 @@ runs:
      # A shell without `set -x` to not to expose password/dsn in logs
      shell: bash -euo pipefail {0}
      run: |
-        project=$(curl \
+        res=$(curl \
          "https://${API_HOST}/api/v2/projects" \
-          --fail \
+          -w "%{http_code}" \
          --header "Accept: application/json" \
          --header "Content-Type: application/json" \
          --header "Authorization: Bearer ${API_KEY}" \
@@ -83,6 +87,15 @@ runs:
              \"settings\": ${PROJECT_SETTINGS}
            }
          }")
+        
+        code=${res: -3}
+        if [[ ${code} -ge 400 ]]; then
+          echo Request failed with error code ${code}
+          echo ${res::-3}
+          exit 1
+        else
+          project=${res::-3}
+        fi

        # Mask password
        echo "::add-mask::$(echo $project | jq --raw-output '.roles[] | select(.name != "web_access") | .password')"
@@ -126,6 +139,22 @@ runs:
            -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer ${ADMIN_API_KEY}" \
            -d "{\"scheduling\": \"Essential\"}"
        fi
+        # XXX
+        # This is a workaround for the default endpoint settings, which currently do not allow some settings in the public API.
+        # https://github.com/neondatabase/cloud/issues/27108
+        if [[ -n ${DEFAULT_ENDPOINT_SETTINGS} && ${DEFAULT_ENDPOINT_SETTINGS} != "{}" ]] ; then
+          PROJECT_DATA=$(curl -X GET \
+              "https://${API_HOST}/regions/${REGION_ID}/api/v1/admin/projects/${project_id}" \
+              -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer ${ADMIN_API_KEY}" \
+              -d "{\"scheduling\": \"Essential\"}"
+          )
+          NEW_DEFAULT_ENDPOINT_SETTINGS=$(echo ${PROJECT_DATA} | jq -rc ".project.default_endpoint_settings + ${DEFAULT_ENDPOINT_SETTINGS}")
+          curl -X POST --fail \
+                "https://${API_HOST}/regions/${REGION_ID}/api/v1/admin/projects/${project_id}/default_endpoint_settings" \
+                -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer ${ADMIN_API_KEY}" \
+                --data "${NEW_DEFAULT_ENDPOINT_SETTINGS}"
+        fi
+        

      env:
        API_HOST: ${{ inputs.api_host }}
@@ -142,3 +171,4 @@ runs:
        PSQL: ${{ inputs.psql_path }}
        LD_LIBRARY_PATH: ${{ inputs.libpq_lib_path }}
        PROJECT_SETTINGS: ${{ inputs.project_settings }}
+        DEFAULT_ENDPOINT_SETTINGS: ${{ inputs.default_endpoint_settings }}
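
The project-creation hunk above drops `--fail` and instead appends the HTTP status to the response via `-w "%{http_code}"`, then splits the last three characters off the captured string. A minimal sketch of that splitting step, runnable without network access, might look like the following. `split_response` is a hypothetical helper name that does not appear in the PR, and the sketch uses POSIX suffix stripping where the action uses bash's `${res: -3}` and `${res::-3}`; it assumes the input always ends in a three-digit status.

```shell
# Hypothetical helper (not part of the PR): takes "<body><3-digit status>",
# the shape curl produces with -w "%{http_code}", and prints the body.
# Returns non-zero and reports on stderr when the status is 400 or above.
split_response() {
  res=$1
  body=${res%???}            # everything except the last three characters
  code=${res#"$body"}        # the last three characters: the HTTP status
  if [ "$code" -ge 400 ]; then
    printf 'Request failed with error code %s\n' "$code" >&2
    printf '%s\n' "$body" >&2
    return 1
  fi
  printf '%s\n' "$body"
}

# Usage: a successful response yields only the JSON body.
split_response '{"id":"example"}200'
```

Keeping the body on failure is the point of the change: with `--fail`, curl discards the error response, while this approach lets the job print the API's error message before exiting.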
--- a/.github/actions/run-python-test-set/action.yml
+++ b/.github/actions/run-python-test-set/action.yml
@@ -53,7 +53,7 @@ inputs:
    description: 'benchmark durations JSON'
    required: false
    default: '{}'
-  aws-oicd-role-arn:
+  aws-oidc-role-arn:
    description: 'OIDC role arn to interract with S3'
    required: true

@@ -66,7 +66,7 @@ runs:
      with:
        name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build_type }}${{ inputs.sanitizers == 'enabled' && '-sanitized' || '' }}-artifact
        path: /tmp/neon
-        aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
+        aws-oidc-role-arn: ${{ inputs.aws-oidc-role-arn }}

    - name: Download Neon binaries for the previous release
      if: inputs.build_type != 'remote'
@@ -75,7 +75,7 @@ runs:
        name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build_type }}-artifact
        path: /tmp/neon-previous
        prefix: latest
-        aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
+        aws-oidc-role-arn: ${{ inputs.aws-oidc-role-arn }}

    - name: Download compatibility snapshot
      if: inputs.build_type != 'remote'
@@ -87,7 +87,7 @@ runs:
        # The lack of compatibility snapshot (for example, for the new Postgres version)
        # shouldn't fail the whole job. Only relevant test should fail.
        skip-if-does-not-exist: true
-        aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
+        aws-oidc-role-arn: ${{ inputs.aws-oidc-role-arn }}

    - name: Checkout
      if: inputs.needs_postgres_source == 'true'
@@ -228,13 +228,13 @@ runs:
        # The lack of compatibility snapshot shouldn't fail the job
        # (for example if we didn't run the test for non build-and-test workflow)
        skip-if-does-not-exist: true
-        aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
+        aws-oidc-role-arn: ${{ inputs.aws-oidc-role-arn }}

    - uses: aws-actions/configure-aws-credentials@v4
      if: ${{ !cancelled() }}
      with:
        aws-region: eu-central-1
-        role-to-assume: ${{ inputs.aws-oicd-role-arn }}
+        role-to-assume: ${{ inputs.aws-oidc-role-arn }}
        role-duration-seconds: 3600 # 1 hour should be more than enough to upload report

    - name: Upload test results
@@ -243,4 +243,4 @@ runs:
      with:
        report-dir: /tmp/test_output/allure/results
        unique-key: ${{ inputs.build_type }}-${{ inputs.pg_version }}-${{ runner.arch }}
-        aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
+        aws-oidc-role-arn: ${{ inputs.aws-oidc-role-arn }}
--- a/.github/actions/save-coverage-data/action.yml
+++ b/.github/actions/save-coverage-data/action.yml
@@ -14,11 +14,11 @@ runs:
        name: coverage-data-artifact
        path: /tmp/coverage
        skip-if-does-not-exist: true # skip if there's no previous coverage to download
-        aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
+        aws-oidc-role-arn: ${{ inputs.aws-oidc-role-arn }}

    - name: Upload coverage data
      uses: ./.github/actions/upload
      with:
        name: coverage-data-artifact
        path: /tmp/coverage
-        aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
+        aws-oidc-role-arn: ${{ inputs.aws-oidc-role-arn }}
--- a/.github/actions/upload/action.yml
+++ b/.github/actions/upload/action.yml
@@ -14,7 +14,7 @@ inputs:
  prefix:
    description: "S3 prefix. Default is '${GITHUB_SHA}/${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}'"
    required: false
-  aws-oicd-role-arn:
+  aws-oidc-role-arn:
    description: "the OIDC role arn for aws auth"
    required: false
    default: ""
@@ -61,7 +61,7 @@ runs:
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-region: eu-central-1
-        role-to-assume: ${{ inputs.aws-oicd-role-arn }}
+        role-to-assume: ${{ inputs.aws-oidc-role-arn }}
        role-duration-seconds: 3600

    - name: Upload artifact
--- a/.github/workflows/_benchmarking_preparation.yml
+++ b/.github/workflows/_benchmarking_preparation.yml
@@ -81,7 +81,7 @@ jobs:
        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
        path: /tmp/neon/
        prefix: latest
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    # we create a table that has one row for each database that we want to restore with the status whether the restore is done
    - name: Create benchmark_restore_status table if it does not exist
--- a/.github/workflows/_build-and-test-locally.yml
+++ b/.github/workflows/_build-and-test-locally.yml
@@ -28,6 +28,16 @@ on:
        required: false
        default: 'disabled'
        type: string
+      test-selection:
+        description: 'specification of selected test(s) to run'
+        required: false
+        default: ''
+        type: string
+      test-run-count:
+        description: 'number of runs to perform for selected tests'
+        required: false
+        default: 1
+        type: number

 defaults:
  run:
@@ -313,7 +323,7 @@ jobs:
        with:
          name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}${{ inputs.sanitizers == 'enabled' && '-sanitized' || '' }}-artifact
          path: /tmp/neon
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

      - name: Check diesel schema
        if: inputs.build-type == 'release' && inputs.arch == 'x64'
@@ -381,14 +391,15 @@ jobs:
          run_with_real_s3: true
          real_s3_bucket: neon-github-ci-tests
          real_s3_region: eu-central-1
-          rerun_failed: true
+          rerun_failed: ${{ inputs.test-run-count == 1 }}
          pg_version: ${{ matrix.pg_version }}
          sanitizers: ${{ inputs.sanitizers }}
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
          # `--session-timeout` is equal to (timeout-minutes - 10 minutes) * 60 seconds.
          # Attempt to stop tests gracefully to generate test reports
          # until they are forcibly stopped by the stricter `timeout-minutes` limit.
-          extra_params: --session-timeout=${{ inputs.sanitizers != 'enabled' && 3000 || 10200 }}
+          extra_params: --session-timeout=${{ inputs.sanitizers != 'enabled' && 3000 || 10200 }} --count=${{ inputs.test-run-count }}
+                        ${{ inputs.test-selection != '' && format('-k "{0}"', inputs.test-selection) || '' }}
        env:
          TEST_RESULT_CONNSTR: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}
          CHECK_ONDISK_DATA_COMPATIBILITY: nonempty
--- a/.github/workflows/benchmarking.yml
+++ b/.github/workflows/benchmarking.yml
@@ -114,7 +114,7 @@ jobs:
        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
        path: /tmp/neon/
        prefix: latest
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Create Neon Project
      id: create-neon-project
@@ -132,7 +132,7 @@ jobs:
        run_in_parallel: false
        save_perf_report: ${{ env.SAVE_PERF_REPORT }}
        pg_version: ${{ env.PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
        # Set --sparse-ordering option of pytest-order plugin
        # to ensure tests are running in order of appears in the file.
        # It's important for test_perf_pgbench.py::test_pgbench_remote_* tests
@@ -165,7 +165,7 @@ jobs:
      if: ${{ !cancelled() }}
      uses: ./.github/actions/allure-report-generate
      with:
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Post to a Slack channel
      if: ${{ github.event.schedule && failure() }}
@@ -222,8 +222,8 @@ jobs:
        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
        path: /tmp/neon/
        prefix: latest
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
-    
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+
    - name: Verify that cumulative statistics are preserved
      uses: ./.github/actions/run-python-test-set
      with:
@@ -233,7 +233,7 @@ jobs:
        save_perf_report: ${{ env.SAVE_PERF_REPORT }}
        extra_params: -m remote_cluster --timeout 3600
        pg_version: ${{ env.DEFAULT_PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -282,7 +282,7 @@ jobs:
        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
        path: /tmp/neon/
        prefix: latest
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Run Logical Replication benchmarks
      uses: ./.github/actions/run-python-test-set
@@ -293,7 +293,7 @@ jobs:
        save_perf_report: ${{ env.SAVE_PERF_REPORT }}
        extra_params: -m remote_cluster --timeout 5400
        pg_version: ${{ env.DEFAULT_PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -310,7 +310,7 @@ jobs:
        save_perf_report: ${{ env.SAVE_PERF_REPORT }}
        extra_params: -m remote_cluster --timeout 5400
        pg_version: ${{ env.DEFAULT_PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -322,7 +322,7 @@ jobs:
      uses: ./.github/actions/allure-report-generate
      with:
        store-test-results-into-db: true
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}

@@ -505,7 +505,7 @@ jobs:
        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
        path: /tmp/neon/
        prefix: latest
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Create Neon Project
      if: contains(fromJSON('["neonvm-captest-new", "neonvm-captest-new-many-tables", "neonvm-captest-freetier", "neonvm-azure-captest-freetier", "neonvm-azure-captest-new"]'), matrix.platform)
@@ -557,7 +557,7 @@ jobs:
        save_perf_report: ${{ env.SAVE_PERF_REPORT }}
        extra_params: -m remote_cluster --timeout 21600 -k test_perf_many_relations
        pg_version: ${{ env.PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
@@ -573,7 +573,7 @@ jobs:
        save_perf_report: ${{ env.SAVE_PERF_REPORT }}
        extra_params: -m remote_cluster --timeout 21600 -k test_pgbench_remote_init
        pg_version: ${{ env.PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
@@ -588,7 +588,7 @@ jobs:
        save_perf_report: ${{ env.SAVE_PERF_REPORT }}
        extra_params: -m remote_cluster --timeout 21600 -k test_pgbench_remote_simple_update
        pg_version: ${{ env.PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
@@ -603,7 +603,7 @@ jobs:
        save_perf_report: ${{ env.SAVE_PERF_REPORT }}
        extra_params: -m remote_cluster --timeout 21600 -k test_pgbench_remote_select_only
        pg_version: ${{ env.PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
@@ -621,7 +621,7 @@ jobs:
      if: ${{ !cancelled() }}
      uses: ./.github/actions/allure-report-generate
      with:
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Post to a Slack channel
      if: ${{ github.event.schedule && failure() }}
@@ -694,7 +694,7 @@ jobs:
        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
        path: /tmp/neon/
        prefix: latest
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Set up Connection String
      id: set-up-connstr
@@ -726,7 +726,7 @@ jobs:
        save_perf_report: ${{ env.SAVE_PERF_REPORT }}
        extra_params: -m remote_cluster --timeout 21600 -k test_pgvector_indexing
        pg_version: ${{ env.PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -741,7 +741,7 @@ jobs:
        save_perf_report: ${{ env.SAVE_PERF_REPORT }}
        extra_params: -m remote_cluster --timeout 21600
        pg_version: ${{ env.PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
@@ -752,7 +752,7 @@ jobs:
      if: ${{ !cancelled() }}
      uses: ./.github/actions/allure-report-generate
      with:
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Post to a Slack channel
      if: ${{ github.event.schedule && failure() }}
@@ -828,7 +828,7 @@ jobs:
        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
        path: /tmp/neon/
        prefix: latest
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Set up Connection String
      id: set-up-connstr
@@ -871,7 +871,7 @@ jobs:
        save_perf_report: ${{ env.SAVE_PERF_REPORT }}
        extra_params: -m remote_cluster --timeout 43200 -k test_clickbench
        pg_version: ${{ env.PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -885,7 +885,7 @@ jobs:
      if: ${{ !cancelled() }}
      uses: ./.github/actions/allure-report-generate
      with:
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Post to a Slack channel
      if: ${{ github.event.schedule && failure() }}
@@ -954,7 +954,7 @@ jobs:
        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
        path: /tmp/neon/
        prefix: latest
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Get Connstring Secret Name
      run: |
@@ -1003,7 +1003,7 @@ jobs:
        save_perf_report: ${{ env.SAVE_PERF_REPORT }}
        extra_params: -m remote_cluster --timeout 21600 -k test_tpch
        pg_version: ${{ env.PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -1015,7 +1015,7 @@ jobs:
      if: ${{ !cancelled() }}
      uses: ./.github/actions/allure-report-generate
      with:
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Post to a Slack channel
      if: ${{ github.event.schedule && failure() }}
@@ -1078,7 +1078,7 @@ jobs:
        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
        path: /tmp/neon/
        prefix: latest
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Set up Connection String
      id: set-up-connstr
@@ -1121,7 +1121,7 @@ jobs:
        save_perf_report: ${{ env.SAVE_PERF_REPORT }}
        extra_params: -m remote_cluster --timeout 21600 -k test_user_examples
        pg_version: ${{ env.PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -1132,7 +1132,7 @@ jobs:
      if: ${{ !cancelled() }}
      uses: ./.github/actions/allure-report-generate
      with:
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Post to a Slack channel
      if: ${{ github.event.schedule && failure() }}
--- a/.github/workflows/build-macos.yml
+++ b/.github/workflows/build-macos.yml
@@ -34,11 +34,10 @@ permissions:
 jobs:
  build-pgxn:
    if: |
-      (inputs.pg_versions != '[]' || inputs.rebuild_everything) && (
-        contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos')  ||
-        contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
-        github.ref_name == 'main'
-      )
+      inputs.pg_versions != '[]' || inputs.rebuild_everything ||
+      contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos')  ||
+      contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
+      github.ref_name == 'main'
    timeout-minutes: 30
    runs-on: macos-15
    strategy:
@@ -100,13 +99,21 @@ jobs:
        run: |
          make postgres-headers-${{ matrix.postgres-version }} -j$(sysctl -n hw.ncpu)

+      - name: Upload "pg_install/${{ matrix.postgres-version }}" artifact
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: pg_install--${{ matrix.postgres-version }}
+          path: pg_install/${{ matrix.postgres-version }}
+          # The artifact is supposed to be used by the next job in the same workflow,
+          # so there’s no need to store it for too long.
+          retention-days: 1
+
  build-walproposer-lib:
    if: |
-      (inputs.pg_versions != '[]' || inputs.rebuild_everything) && (
-        contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos')  ||
-        contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
-        github.ref_name == 'main'
-      )
+      inputs.pg_versions != '[]' || inputs.rebuild_everything ||
+      contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos')  ||
+      contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
+      github.ref_name == 'main'
    timeout-minutes: 30
    runs-on: macos-15
    needs: [build-pgxn]
@@ -127,12 +134,11 @@ jobs:
        id: pg_rev
        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v17) | tee -a "${GITHUB_OUTPUT}"

-      - name: Cache postgres v17 build
-        id: cache_pg
-        uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
+      - name: Download "pg_install/v17" artifact
+        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
        with:
+          name: pg_install--v17
          path: pg_install/v17
-          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-v17-${{ steps.pg_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

      - name: Cache walproposer-lib
        id: cache_walproposer_lib
@@ -163,13 +169,21 @@ jobs:
        run:
          make walproposer-lib -j$(sysctl -n hw.ncpu)

+      - name: Upload "pg_install/build/walproposer-lib" artifact
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: pg_install--build--walproposer-lib
+          path: pg_install/build/walproposer-lib
+          # The artifact is supposed to be used by the next job in the same workflow,
+          # so there’s no need to store it for too long.
+          retention-days: 1
+
  cargo-build:
    if: |
-      (inputs.pg_versions != '[]' || inputs.rebuild_rust_code || inputs.rebuild_everything) && (
-        contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos')  ||
-        contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
-        github.ref_name == 'main'
-      )
+      inputs.pg_versions != '[]' || inputs.rebuild_rust_code || inputs.rebuild_everything ||
+      contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos') ||
+      contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
+      github.ref_name == 'main'
    timeout-minutes: 30
    runs-on: macos-15
    needs: [build-pgxn, build-walproposer-lib]
@@ -188,45 +202,43 @@ jobs:
        with:
          submodules: true

-      - name: Set pg v14 for caching
-        id: pg_rev_v14
-        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v14) | tee -a "${GITHUB_OUTPUT}"
-      - name: Set pg v15 for caching
-        id: pg_rev_v15
-        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v15) | tee -a "${GITHUB_OUTPUT}"
-      - name: Set pg v16 for caching
-        id: pg_rev_v16
-        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v16) | tee -a "${GITHUB_OUTPUT}"
-      - name: Set pg v17 for caching
-        id: pg_rev_v17
-        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v17) | tee -a "${GITHUB_OUTPUT}"
-
-      - name: Cache postgres v14 build
-        id: cache_pg
-        uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
+      - name: Download "pg_install/v14" artifact
+        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
        with:
+          name: pg_install--v14
          path: pg_install/v14
-          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-v14-${{ steps.pg_rev_v14.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
-      - name: Cache postgres v15 build
-        id: cache_pg_v15
-        uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
-        with:
-          path: pg_install/v15
-          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-v15-${{ steps.pg_rev_v15.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
-      - name: Cache postgres v16 build
-        id: cache_pg_v16
-        uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
-        with:
-          path: pg_install/v16
-          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-v16-${{ steps.pg_rev_v16.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
-      - name: Cache postgres v17 build
-        id: cache_pg_v17
-        uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
-        with:
-          path: pg_install/v17
-          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-v17-${{ steps.pg_rev_v17.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

-      - name: Cache cargo deps (only for v17)
+      - name: Download "pg_install/v15" artifact
+        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
+        with:
+          name: pg_install--v15
+          path: pg_install/v15
+
+      - name: Download "pg_install/v16" artifact
+        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
+        with:
+          name: pg_install--v16
+          path: pg_install/v16
+
+      - name: Download "pg_install/v17" artifact
+        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
+        with:
+          name: pg_install--v17
+          path: pg_install/v17
+
+      - name: Download "pg_install/build/walproposer-lib" artifact
+        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
+        with:
+          name: pg_install--build--walproposer-lib
+          path: pg_install/build/walproposer-lib
+
+      # `actions/download-artifact` doesn't preserve permissions:
+      # https://github.com/actions/download-artifact?tab=readme-ov-file#permission-loss
+      - name: Make pg_install/v*/bin/* executable
+        run: |
+          chmod +x pg_install/v*/bin/*
+
+      - name: Cache cargo deps
        uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
        with:
          path: |
@@ -236,13 +248,6 @@ jobs:
            target
          key: v1-${{ runner.os }}-${{ runner.arch }}-cargo-${{ hashFiles('./Cargo.lock') }}-${{ hashFiles('./rust-toolchain.toml') }}-rust

-      - name: Cache walproposer-lib
-        id: cache_walproposer_lib
-        uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
-        with:
-          path: pg_install/build/walproposer-lib
-          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-walproposer_lib-v17-${{ steps.pg_rev_v17.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
-
      - name: Install build dependencies
        run: |
          brew install flex bison openssl protobuf icu4c
@@ -252,8 +257,8 @@ jobs:
          echo 'LDFLAGS=-L/usr/local/opt/openssl@3/lib' >> $GITHUB_ENV
          echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV

-      - name: Run cargo build (only for v17)
+      - name: Run cargo build
        run: cargo build --all --release -j$(sysctl -n hw.ncpu)

-      - name: Check that no warnings are produced (only for v17)
+      - name: Check that no warnings are produced
        run: ./run_clippy.sh
--- a/.github/workflows/build_and_run_selected_test.yml
+++ b/.github/workflows/build_and_run_selected_test.yml
@@ -0,0 +1,120 @@
+name: Build and Run Selected Test
+
+on:
+  workflow_dispatch:
+    inputs:
+      test-selection:
+        description: 'Specification of selected test(s), as accepted by pytest -k'
+        required: true
+        type: string
+      run-count:
+        description: 'Number of test runs to perform'
+        required: true
+        type: number
+      archs:
+        description: 'Archs to run tests on, e.g.: ["x64", "arm64"]'
+        default: '["x64"]'
+        required: true
+        type: string
+      build-types:
+        description: 'Build types to run tests on, e. g.: ["debug", "release"]'
+        default: '["release"]'
+        required: true
+        type: string
+      pg-versions:
+        description: 'Postgres versions to use for testing, e.g.: [{"pg_version":"v16"}, {"pg_version":"v17"}]'
+        default: '[{"pg_version":"v17"}]'
+        required: true
+        type: string
+
+defaults:
+  run:
+    shell: bash -euxo pipefail {0}
+
+env:
+  RUST_BACKTRACE: 1
+  COPT: '-Werror'
+
+jobs:
+  meta:
+    uses: ./.github/workflows/_meta.yml
+    with:
+      github-event-name: ${{ github.event_name }}
+      github-event-json: ${{ toJSON(github.event) }}
+
+  build-and-test-locally:
+    needs: [ meta ]
+    strategy:
+      fail-fast: false
+      matrix:
+        arch: ${{ fromJson(inputs.archs) }}
+        build-type: ${{ fromJson(inputs.build-types) }}
+    uses: ./.github/workflows/_build-and-test-locally.yml
+    with:
+      arch: ${{ matrix.arch }}
+      build-tools-image: ghcr.io/neondatabase/build-tools:pinned-bookworm
+      build-tag: ${{ needs.meta.outputs.build-tag }}
+      build-type: ${{ matrix.build-type }}
+      test-cfg: ${{ inputs.pg-versions }}
+      test-selection: ${{ inputs.test-selection }}
+      test-run-count: ${{ fromJson(inputs.run-count) }}
+    secrets: inherit
+
+  create-test-report:
+    needs: [ build-and-test-locally ]
+    if: ${{ !cancelled() }}
+    permissions:
+      id-token: write # aws-actions/configure-aws-credentials
+      statuses: write
+      contents: write
+      pull-requests: write
+    outputs:
+      report-url: ${{ steps.create-allure-report.outputs.report-url }}
+
+    runs-on: [ self-hosted, small ]
+    container:
+      image: ghcr.io/neondatabase/build-tools:pinned-bookworm
+      credentials:
+        username: ${{ github.actor }}
+        password: ${{ secrets.GITHUB_TOKEN }}
+      options: --init
+
+    steps:
+      - name: Harden the runner (Audit all outbound calls)
+        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
+        with:
+          egress-policy: audit
+
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+
+      - name: Create Allure report
+        if: ${{ !cancelled() }}
+        id: create-allure-report
+        uses: ./.github/actions/allure-report-generate
+        with:
+          store-test-results-into-db: true
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        env:
+          REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_DEV }}
+
+      - uses: actions/github-script@v7
+        if: ${{ !cancelled() }}
+        with:
+          # Retry script for 5XX server errors: https://github.com/actions/github-script#retries
+          retries: 5
+          script: |
+            const report = {
+              reportUrl:     "${{ steps.create-allure-report.outputs.report-url }}",
+              reportJsonUrl: "${{ steps.create-allure-report.outputs.report-json-url }}",
+            }
+
+            const coverage = {}
+
+            const script = require("./scripts/comment-test-report.js")
+            await script({
+              github,
+              context,
+              fetch,
+              report,
+              coverage,
+            })
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -317,7 +317,7 @@ jobs:
          extra_params: --splits 5 --group ${{ matrix.pytest_split_group }}
          benchmark_durations: ${{ needs.get-benchmarks-durations.outputs.json }}
          pg_version: v16
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
        env:
          VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
          PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -384,7 +384,7 @@ jobs:
        uses: ./.github/actions/allure-report-generate
        with:
          store-test-results-into-db: true
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
        env:
          REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}

@@ -451,14 +451,14 @@ jobs:
        with:
          name: neon-${{ runner.os }}-${{ runner.arch }}-${{ matrix.build_type }}-artifact
          path: /tmp/neon
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

      - name: Get coverage artifact
        uses: ./.github/actions/download
        with:
          name: coverage-data-artifact
          path: /tmp/coverage
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

      - name: Merge coverage data
        run: scripts/coverage "--profraw-prefix=$GITHUB_JOB" --dir=/tmp/coverage merge
--- a/.github/workflows/build_and_test_with_sanitizers.yml
+++ b/.github/workflows/build_and_test_with_sanitizers.yml
@@ -117,7 +117,7 @@ jobs:
        uses: ./.github/actions/allure-report-generate
        with:
          store-test-results-into-db: true
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
        env:
          REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}

--- a/.github/workflows/cloud-extensions.yml
+++ b/.github/workflows/cloud-extensions.yml
@@ -0,0 +1,112 @@
+name: Cloud Extensions Test
+on:
+  schedule:
+    # * is a special character in YAML so you have to quote this string
+    #          ┌───────────── minute (0 - 59)
+    #          │ ┌───────────── hour (0 - 23)
+    #          │ │ ┌───────────── day of the month (1 - 31)
+    #          │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
+    #          │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
+    - cron:  '45 1 * * *' # run once a day, timezone is UTC
+  workflow_dispatch: # adds ability to run this manually
+    inputs:
+      region_id:
+        description: 'Project region id. If not set, the default region will be used'
+        required: false
+        default: 'aws-us-east-2'
+
+defaults:
+  run:
+    shell: bash -euxo pipefail {0}
+
+permissions:
+  id-token: write # aws-actions/configure-aws-credentials
+  statuses: write
+  contents: write
+
+jobs:
+  regress:
+    env:
+      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
+      TEST_OUTPUT: /tmp/test_output
+      BUILD_TYPE: remote
+    strategy:
+      fail-fast: false
+      matrix:
+        pg-version: [16, 17]
+
+    runs-on: [ self-hosted, small ]
+    container:
+      # We use the neon-test-extensions image here as it contains the source code for the extensions.
+      image: ghcr.io/neondatabase/neon-test-extensions-v${{ matrix.pg-version }}:latest
+      credentials:
+        username: ${{ github.actor }}
+        password: ${{ secrets.GITHUB_TOKEN }}
+      options: --init
+
+    steps:
+      - name: Harden the runner (Audit all outbound calls)
+        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
+        with:
+          egress-policy: audit
+
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+
+      - name: Evaluate the settings
+        id: project-settings
+        run: |
+          if [[ $((${{ matrix.pg-version }})) -lt 17 ]]; then
+            ULID=ulid
+          else
+            ULID=pgx_ulid
+          fi
+          LIBS=timescaledb:rag_bge_small_en_v15,rag_jina_reranker_v1_tiny_en:$ULID
+          settings=$(jq -c -n --arg libs $LIBS '{preload_libraries:{use_defaults:false,enabled_libraries:($libs| split(":"))}}')
+          echo settings=$settings >> $GITHUB_OUTPUT
+          
+      - name: Create Neon Project
+        id: create-neon-project
+        uses: ./.github/actions/neon-project-create
+        with:
+          region_id: ${{ inputs.region_id }}
+          postgres_version: ${{ matrix.pg-version }}
+          project_settings: ${{ steps.project-settings.outputs.settings }}
+          # We need these settings to get the expected output results.
+          # We cannot use the environment variables e.g. PGTZ due to
+          # https://github.com/neondatabase/neon/issues/1287
+          default_endpoint_settings: >
+            {
+              "pg_settings": {
+                "DateStyle": "Postgres,MDY",
+                "TimeZone": "America/Los_Angeles",
+                "compute_query_id": "off",
+                "neon.allow_unstable_extensions": "on"
+              }
+            }
+          api_key: ${{ secrets.NEON_STAGING_API_KEY }}
+          admin_api_key: ${{ secrets.NEON_STAGING_ADMIN_API_KEY }}
+
+      - name: Run the regression tests
+        run: /run-tests.sh -r /ext-src
+        env:
+          BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}
+          SKIP: "pg_hint_plan-src,pg_repack-src,pg_cron-src,plpgsql_check-src"
+
+      - name: Delete Neon Project
+        if: ${{ always() }}
+        uses: ./.github/actions/neon-project-delete
+        with:
+          project_id: ${{ steps.create-neon-project.outputs.project_id }}
+          api_key: ${{ secrets.NEON_STAGING_API_KEY }}
+
+      - name: Post to a Slack channel
+        if: ${{ github.event.schedule && failure() }}
+        uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1
+        with:
+          channel-id: ${{ vars.SLACK_ON_CALL_QA_STAGING_STREAM }}
+          slack-message: |
+            Periodic extensions test on staging: ${{ job.status }}
+            <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>
+        env:
+          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
+
--- a/.github/workflows/cloud-regress.yml
+++ b/.github/workflows/cloud-regress.yml
@@ -89,7 +89,7 @@ jobs:
          name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
          path: /tmp/neon/
          prefix: latest
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

      - name: Create a new branch
        id: create-branch
@@ -105,7 +105,7 @@ jobs:
          test_selection: cloud_regress
          pg_version: ${{matrix.pg-version}}
          extra_params: -m remote_cluster
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
        env:
          BENCHMARK_CONNSTR: ${{steps.create-branch.outputs.dsn}}

@@ -122,7 +122,7 @@ jobs:
        if: ${{ !cancelled() }}
        uses: ./.github/actions/allure-report-generate
        with:
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

      - name: Post to a Slack channel
        if: ${{ github.event.schedule && failure() }}
--- a/.github/workflows/ingest_benchmark.yml
+++ b/.github/workflows/ingest_benchmark.yml
@@ -32,7 +32,7 @@ jobs:
      fail-fast: false # allow other variants to continue even if one fails
      matrix:
        include:
-          - target_project: new_empty_project_stripe_size_2048 
+          - target_project: new_empty_project_stripe_size_2048
            stripe_size: 2048 # 16 MiB
            postgres_version: 16
            disable_sharding: false
@@ -98,7 +98,7 @@ jobs:
        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
        path: /tmp/neon/
        prefix: latest
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Create Neon Project
      if: ${{ startsWith(matrix.target_project, 'new_empty_project') }}
@@ -110,10 +110,10 @@ jobs:
        compute_units: '[7, 7]' # we want to test large compute here to avoid compute-side bottleneck
        api_key: ${{ secrets.NEON_STAGING_API_KEY }}
        shard_split_project: ${{ matrix.stripe_size != null && 'true' || 'false' }}
-        admin_api_key: ${{ secrets.NEON_STAGING_ADMIN_API_KEY }} 
+        admin_api_key: ${{ secrets.NEON_STAGING_ADMIN_API_KEY }}
        shard_count: 8
        stripe_size: ${{ matrix.stripe_size }}
-        disable_sharding: ${{ matrix.disable_sharding }} 
+        disable_sharding: ${{ matrix.disable_sharding }}

    - name: Initialize Neon project
      if: ${{ startsWith(matrix.target_project, 'new_empty_project') }}
@@ -171,7 +171,7 @@ jobs:
        extra_params: -s -m remote_cluster --timeout 86400 -k test_ingest_performance_using_pgcopydb
        pg_version: v${{ matrix.postgres_version }}
        save_perf_report: true
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        BENCHMARK_INGEST_SOURCE_CONNSTR: ${{ secrets.BENCHMARK_INGEST_SOURCE_CONNSTR }}
        TARGET_PROJECT_TYPE: ${{ matrix.target_project }}
--- a/.github/workflows/large_oltp_benchmark.yml
+++ b/.github/workflows/large_oltp_benchmark.yml
@@ -33,9 +33,9 @@ jobs:
      fail-fast: false # allow other variants to continue even if one fails
      matrix:
        include:
-          - target: new_branch 
+          - target: new_branch
            custom_scripts: insert_webhooks.sql@200 select_any_webhook_with_skew.sql@300 select_recent_webhook.sql@397 select_prefetch_webhook.sql@3 IUD_one_transaction.sql@100
-          - target: reuse_branch 
+          - target: reuse_branch
            custom_scripts: insert_webhooks.sql@200 select_any_webhook_with_skew.sql@300 select_recent_webhook.sql@397 select_prefetch_webhook.sql@3 IUD_one_transaction.sql@100
      max-parallel: 1 # we want to run each stripe size sequentially to be able to compare the results
    permissions:
@@ -43,7 +43,7 @@ jobs:
      statuses: write
      id-token: write # aws-actions/configure-aws-credentials
    env:
-      TEST_PG_BENCH_DURATIONS_MATRIX: "1h" # todo update to > 1 h 
+      TEST_PG_BENCH_DURATIONS_MATRIX: "1h" # todo update to > 1 h
      TEST_PGBENCH_CUSTOM_SCRIPTS: ${{ matrix.custom_scripts }}
      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
      PG_VERSION: 16 # pre-determined by pre-determined project
@@ -85,7 +85,7 @@ jobs:
        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
        path: /tmp/neon/
        prefix: latest
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Create Neon Branch for large tenant
      if: ${{ matrix.target == 'new_branch' }}
@@ -129,7 +129,7 @@ jobs:
        ${PSQL} "${BENCHMARK_CONNSTR}" -c "SET statement_timeout = 0; DELETE FROM webhook.incoming_webhooks WHERE created_at > '2025-02-27 23:59:59+00';"
        echo "$(date '+%Y-%m-%d %H:%M:%S') - Finished deleting rows in table webhook.incoming_webhooks from prior runs"

-    - name: Benchmark pgbench with custom-scripts 
+    - name: Benchmark pgbench with custom-scripts
      uses: ./.github/actions/run-python-test-set
      with:
        build_type: ${{ env.BUILD_TYPE }}
@@ -138,7 +138,7 @@ jobs:
        save_perf_report: true
        extra_params: -m remote_cluster --timeout 7200 -k test_perf_oltp_large_tenant_pgbench
        pg_version: ${{ env.PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
@@ -153,7 +153,7 @@ jobs:
        save_perf_report: true
        extra_params: -m remote_cluster --timeout 172800 -k test_perf_oltp_large_tenant_maintenance
        pg_version: ${{ env.PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr_without_pooler }}
        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
@@ -179,8 +179,8 @@ jobs:
      if: ${{ !cancelled() }}
      uses: ./.github/actions/allure-report-generate
      with:
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
-  
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+
    - name: Post to a Slack channel
      if: ${{ github.event.schedule && failure() }}
      uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1
--- a/.github/workflows/neon_extra_builds.yml
+++ b/.github/workflows/neon_extra_builds.yml
@@ -69,10 +69,6 @@ jobs:

  check-macos-build:
    needs: [ check-permissions, files-changed ]
-    if: |
-      contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos')  ||
-      contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
-      github.ref_name == 'main'
    uses: ./.github/workflows/build-macos.yml
    with:
      pg_versions: ${{ needs.files-changed.outputs.postgres_changes }}
--- a/.github/workflows/periodic_pagebench.yml
+++ b/.github/workflows/periodic_pagebench.yml
@@ -147,7 +147,7 @@ jobs:
      if: ${{ !cancelled() }}
      uses: ./.github/actions/allure-report-generate
      with:
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Post to a Slack channel
      if: ${{ github.event.schedule && failure() }}
--- a/.github/workflows/pg-clients.yml
+++ b/.github/workflows/pg-clients.yml
@@ -103,7 +103,7 @@ jobs:
          name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
          path: /tmp/neon/
          prefix: latest
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

      - name: Create Neon Project
        id: create-neon-project
@@ -122,7 +122,7 @@ jobs:
          run_in_parallel: false
          extra_params: -m remote_cluster
          pg_version: ${{ env.DEFAULT_PG_VERSION }}
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
        env:
          BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}

@@ -139,7 +139,7 @@ jobs:
        uses: ./.github/actions/allure-report-generate
        with:
          store-test-results-into-db: true
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
        env:
          REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}

@@ -178,7 +178,7 @@ jobs:
        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
        path: /tmp/neon/
        prefix: latest
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Create Neon Project
      id: create-neon-project
@@ -195,7 +195,7 @@ jobs:
        run_in_parallel: false
        extra_params: -m remote_cluster
        pg_version: ${{ env.DEFAULT_PG_VERSION }}
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}

@@ -212,7 +212,7 @@ jobs:
      uses: ./.github/actions/allure-report-generate
      with:
        store-test-results-into-db: true
-        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+        aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
        REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}

--- a/.github/workflows/random-ops-test.yml
+++ b/.github/workflows/random-ops-test.yml
@@ -66,7 +66,7 @@ jobs:
          name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
          path: /tmp/neon/
          prefix: latest
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

      - name: Run tests
        uses: ./.github/actions/run-python-test-set
@@ -76,7 +76,7 @@ jobs:
          run_in_parallel: false
          extra_params: -m remote_cluster
          pg_version: ${{ matrix.pg-version }}
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
        env:
          NEON_API_KEY: ${{ secrets.NEON_STAGING_API_KEY }}
          RANDOM_SEED: ${{ inputs.random_seed }}
@@ -88,6 +88,6 @@ jobs:
        uses: ./.github/actions/allure-report-generate
        with:
          store-test-results-into-db: true
-          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
+          aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
        env:
          REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -253,6 +253,17 @@ version = "1.1.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "a8ab6b55fe97976e46f91ddbed8d147d966475dc29b2032757ba47e02376fbc3"

+[[package]]
+name = "atomic_enum"
+version = "0.3.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "99e1aca718ea7b89985790c94aad72d77533063fe00bc497bb79a7c2dae6a661"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn 2.0.100",
+]
+
 [[package]]
 name = "autocfg"
 version = "1.1.0"
@@ -687,13 +698,40 @@ dependencies = [
 "tracing",
 ]

+[[package]]
+name = "axum"
+version = "0.7.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "edca88bc138befd0323b20752846e6587272d3b03b0343c8ea28a6f819e6e71f"
+dependencies = [
+ "async-trait",
+ "axum-core 0.4.5",
+ "bytes",
+ "futures-util",
+ "http 1.1.0",
+ "http-body 1.0.0",
+ "http-body-util",
+ "itoa",
+ "matchit 0.7.3",
+ "memchr",
+ "mime",
+ "percent-encoding",
+ "pin-project-lite",
+ "rustversion",
+ "serde",
+ "sync_wrapper 1.0.1",
+ "tower 0.5.2",
+ "tower-layer",
+ "tower-service",
+]
+
 [[package]]
 name = "axum"
 version = "0.8.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "6d6fd624c75e18b3b4c6b9caf42b1afe24437daaee904069137d8bab077be8b8"
 dependencies = [
- "axum-core",
+ "axum-core 0.5.0",
 "base64 0.22.1",
 "bytes",
 "form_urlencoded",
@@ -704,7 +742,7 @@ dependencies = [
 "hyper 1.4.1",
 "hyper-util",
 "itoa",
- "matchit",
+ "matchit 0.8.4",
 "memchr",
 "mime",
 "percent-encoding",
@@ -724,6 +762,26 @@ dependencies = [
 "tracing",
 ]

+[[package]]
+name = "axum-core"
+version = "0.4.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "09f2bd6146b97ae3359fa0cc6d6b376d9539582c7b4220f041a33ec24c226199"
+dependencies = [
+ "async-trait",
+ "bytes",
+ "futures-util",
+ "http 1.1.0",
+ "http-body 1.0.0",
+ "http-body-util",
+ "mime",
+ "pin-project-lite",
+ "rustversion",
+ "sync_wrapper 1.0.1",
+ "tower-layer",
+ "tower-service",
+]
+
 [[package]]
 name = "axum-core"
 version = "0.5.0"
@@ -750,8 +808,8 @@ version = "0.10.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "460fc6f625a1f7705c6cf62d0d070794e94668988b1c38111baeec177c715f7b"
 dependencies = [
- "axum",
- "axum-core",
+ "axum 0.8.1",
+ "axum-core 0.5.0",
 "bytes",
 "futures-util",
 "headers",
@@ -1086,6 +1144,25 @@ version = "0.3.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "37b2a672a2cb129a2e41c10b1224bb368f9f37a2b16b612598138befd7b37eb5"

+[[package]]
+name = "cbindgen"
+version = "0.28.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "eadd868a2ce9ca38de7eeafdcec9c7065ef89b42b32f0839278d55f35c54d1ff"
+dependencies = [
+ "clap",
+ "heck 0.4.1",
+ "indexmap 2.9.0",
+ "log",
+ "proc-macro2",
+ "quote",
+ "serde",
+ "serde_json",
+ "syn 2.0.100",
+ "tempfile",
+ "toml",
+]
+
 [[package]]
 name = "cc"
 version = "1.2.16"
@@ -1206,7 +1283,7 @@ version = "4.5.18"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "4ac6a0c7b1a9e9a5186361f67dfa1b88213572f427fb9ab038efb2bd8c582dab"
 dependencies = [
- "heck",
+ "heck 0.5.0",
 "proc-macro2",
 "quote",
 "syn 2.0.100",
@@ -1264,13 +1341,40 @@ dependencies = [
 "unicode-width",
 ]

+[[package]]
+name = "communicator"
+version = "0.1.0"
+dependencies = [
+ "atomic_enum",
+ "bytes",
+ "cbindgen",
+ "http 1.1.0",
+ "libc",
+ "neonart",
+ "nix 0.27.1",
+ "pageserver_client_grpc",
+ "pageserver_data_api",
+ "prost 0.13.3",
+ "thiserror 1.0.69",
+ "tokio",
+ "tokio-epoll-uring",
+ "tokio-pipe",
+ "tonic",
+ "tracing",
+ "tracing-subscriber",
+ "uring-common",
+ "utils",
+ "zerocopy 0.8.24",
+ "zerocopy-derive 0.8.24",
+]
+
 [[package]]
 name = "compute_api"
 version = "0.1.0"
 dependencies = [
 "anyhow",
 "chrono",
- "indexmap 2.0.1",
+ "indexmap 2.9.0",
 "jsonwebtoken",
 "regex",
 "remote_storage",
@@ -1288,7 +1392,7 @@ dependencies = [
 "aws-sdk-kms",
 "aws-sdk-s3",
 "aws-smithy-types",
- "axum",
+ "axum 0.8.1",
 "axum-extra",
 "base64 0.13.1",
 "bytes",
@@ -1301,7 +1405,7 @@ dependencies = [
 "flate2",
 "futures",
 "http 1.1.0",
- "indexmap 2.0.1",
+ "indexmap 2.9.0",
 "jsonwebtoken",
 "metrics",
 "nix 0.27.1",
@@ -1927,7 +2031,7 @@ checksum = "0892a17df262a24294c382f0d5997571006e7a4348b4327557c4ff1cd4a8bccc"
 dependencies = [
 "darling",
 "either",
- "heck",
+ "heck 0.5.0",
 "proc-macro2",
 "quote",
 "syn 2.0.100",
@@ -2041,7 +2145,7 @@ name = "endpoint_storage"
 version = "0.0.1"
 dependencies = [
 "anyhow",
- "axum",
+ "axum 0.8.1",
 "axum-extra",
 "camino",
 "camino-tempfile",
@@ -2588,7 +2692,7 @@ dependencies = [
 "futures-sink",
 "futures-util",
 "http 0.2.9",
- "indexmap 2.0.1",
+ "indexmap 2.9.0",
 "slab",
 "tokio",
 "tokio-util",
@@ -2607,7 +2711,7 @@ dependencies = [
 "futures-sink",
 "futures-util",
 "http 1.1.0",
- "indexmap 2.0.1",
+ "indexmap 2.9.0",
 "slab",
 "tokio",
 "tokio-util",
@@ -2703,6 +2807,12 @@ dependencies = [
 "http 1.1.0",
 ]

+[[package]]
+name = "heck"
+version = "0.4.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "95505c38b4572b2d910cecb0281560f54b440a19336cbbcb27bf6ce6adc6f5a8"
+
 [[package]]
 name = "heck"
 version = "0.5.0"
@@ -3191,12 +3301,12 @@ dependencies = [

 [[package]]
 name = "indexmap"
-version = "2.0.1"
+version = "2.9.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "ad227c3af19d4914570ad36d30409928b75967c298feb9ea1969db3a610bb14e"
+checksum = "cea70ddb795996207ad57735b50c5982d8844f38ba9ee5f1aedcfb708a2aa11e"
 dependencies = [
 "equivalent",
- "hashbrown 0.14.5",
+ "hashbrown 0.15.2",
 "serde",
 ]

@@ -3219,7 +3329,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "232929e1d75fe899576a3d5c7416ad0d88dbfbb3c3d6aa00873a7408a50ddb88"
 dependencies = [
 "ahash",
- "indexmap 2.0.1",
+ "indexmap 2.9.0",
 "is-terminal",
 "itoa",
 "log",
@@ -3242,7 +3352,7 @@ dependencies = [
 "crossbeam-utils",
 "dashmap 6.1.0",
 "env_logger",
- "indexmap 2.0.1",
+ "indexmap 2.9.0",
 "itoa",
 "log",
 "num-format",
@@ -3594,6 +3704,12 @@ dependencies = [
 "regex-automata 0.1.10",
 ]

+[[package]]
+name = "matchit"
+version = "0.7.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0e7465ac9959cc2b1404e8e2367b43684a6d13790fe23056cc8c6c5a6b7bcb94"
+
 [[package]]
 name = "matchit"
 version = "0.8.4"
@@ -3639,7 +3755,7 @@ version = "0.0.22"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "b9e6777fc80a575f9503d908c8b498782a6c3ee88a06cb416dc3941401e43b94"
 dependencies = [
- "heck",
+ "heck 0.5.0",
 "proc-macro2",
 "quote",
 "syn 2.0.100",
@@ -3785,6 +3901,15 @@ version = "0.8.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "e5ce46fe64a9d73be07dcbe690a38ce1b293be448fd8ce1e6c1b8062c9f72c6a"

+[[package]]
+name = "neonart"
+version = "0.1.0"
+dependencies = [
+ "rand 0.8.5",
+ "tracing",
+ "zerocopy 0.8.24",
+]
+
 [[package]]
 name = "never-say-never"
 version = "6.6.666"
@@ -4208,6 +4333,8 @@ dependencies = [
 "humantime-serde",
 "pageserver_api",
 "pageserver_client",
+ "pageserver_client_grpc",
+ "pageserver_data_api",
 "rand 0.8.5",
 "reqwest",
 "serde",
@@ -4284,6 +4411,8 @@ dependencies = [
 "pageserver_api",
 "pageserver_client",
 "pageserver_compaction",
+ "pageserver_data_api",
+ "peekable",
 "pem",
 "pin-project-lite",
 "postgres-protocol",
@@ -4295,6 +4424,7 @@ dependencies = [
 "pprof",
 "pq_proto",
 "procfs",
+ "prost 0.13.3",
 "rand 0.8.5",
 "range-set-blaze",
 "regex",
@@ -4326,6 +4456,7 @@ dependencies = [
 "tokio-tar",
 "tokio-util",
 "toml_edit",
+ "tonic",
 "tracing",
 "tracing-utils",
 "url",
@@ -4390,6 +4521,18 @@ dependencies = [
 "workspace_hack",
 ]

+[[package]]
+name = "pageserver_client_grpc"
+version = "0.1.0"
+dependencies = [
+ "bytes",
+ "http 1.1.0",
+ "pageserver_data_api",
+ "thiserror 1.0.69",
+ "tonic",
+ "tracing",
+]
+
 [[package]]
 name = "pageserver_compaction"
 version = "0.1.0"
@@ -4413,6 +4556,17 @@ dependencies = [
 "workspace_hack",
 ]

+[[package]]
+name = "pageserver_data_api"
+version = "0.1.0"
+dependencies = [
+ "prost 0.13.3",
+ "thiserror 1.0.69",
+ "tonic",
+ "tonic-build",
+ "utils",
+]
+
 [[package]]
 name = "papaya"
 version = "0.2.1"
@@ -4539,6 +4693,15 @@ dependencies = [
 "sha2",
 ]

+[[package]]
+name = "peekable"
+version = "0.3.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "225f9651e475709164f871dc2f5724956be59cb9edb055372ffeeab01ec2d20b"
+dependencies = [
+ "smallvec",
+]
+
 [[package]]
 name = "pem"
 version = "3.0.3"
@@ -5010,7 +5173,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "22505a5c94da8e3b7c2996394d1c933236c4d743e81a410bcca4e6989fc066a4"
 dependencies = [
 "bytes",
- "heck",
+ "heck 0.5.0",
 "itertools 0.12.1",
 "log",
 "multimap",
@@ -5031,7 +5194,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "0c1318b19085f08681016926435853bbf7858f9c082d0999b80550ff5d9abe15"
 dependencies = [
 "bytes",
- "heck",
+ "heck 0.5.0",
 "itertools 0.12.1",
 "log",
 "multimap",
@@ -5134,7 +5297,7 @@ dependencies = [
 "hyper 0.14.30",
 "hyper 1.4.1",
 "hyper-util",
- "indexmap 2.0.1",
+ "indexmap 2.9.0",
 "ipnet",
 "itertools 0.10.5",
 "itoa",
@@ -5645,7 +5808,7 @@ dependencies = [
 "async-trait",
 "getrandom 0.2.11",
 "http 1.1.0",
- "matchit",
+ "matchit 0.8.4",
 "opentelemetry",
 "reqwest",
 "reqwest-middleware",
@@ -6806,7 +6969,7 @@ version = "0.26.4"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "4c6bee85a5a24955dc440386795aa378cd9cf82acd5f764469152d2270e581be"
 dependencies = [
- "heck",
+ "heck 0.5.0",
 "proc-macro2",
 "quote",
 "rustversion",
@@ -7231,6 +7394,16 @@ dependencies = [
 "syn 2.0.100",
 ]

+[[package]]
+name = "tokio-pipe"
+version = "0.2.12"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f213a84bffbd61b8fa0ba8a044b4bbe35d471d0b518867181e82bd5c15542784"
+dependencies = [
+ "libc",
+ "tokio",
+]
+
 [[package]]
 name = "tokio-postgres"
 version = "0.7.10"
@@ -7413,7 +7586,7 @@ version = "0.22.14"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "f21c7aaf97f1bd9ca9d4f9e73b0a6c74bd5afef56f2bc931943a6e1c37e04e38"
 dependencies = [
- "indexmap 2.0.1",
+ "indexmap 2.9.0",
 "serde",
 "serde_spanned",
 "toml_datetime",
@@ -7426,9 +7599,13 @@ version = "0.12.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "877c5b330756d856ffcc4553ab34a5684481ade925ecc54bcd1bf02b1d0d4d52"
 dependencies = [
+ "async-stream",
 "async-trait",
+ "axum 0.7.9",
 "base64 0.22.1",
 "bytes",
+ "flate2",
+ "h2 0.4.4",
 "http 1.1.0",
 "http-body 1.0.0",
 "http-body-util",
@@ -7440,6 +7617,7 @@ dependencies = [
 "prost 0.13.3",
 "rustls-native-certs 0.8.0",
 "rustls-pemfile 2.1.1",
+ "socket2",
 "tokio",
 "tokio-rustls 0.26.0",
 "tokio-stream",
@@ -7939,7 +8117,7 @@ name = "vm_monitor"
 version = "0.1.0"
 dependencies = [
 "anyhow",
- "axum",
+ "axum 0.8.1",
 "cgroups-rs",
 "clap",
 "futures",
@@ -8449,7 +8627,7 @@ dependencies = [
 "hyper 1.4.1",
 "hyper-util",
 "indexmap 1.9.3",
- "indexmap 2.0.1",
+ "indexmap 2.9.0",
 "itertools 0.12.1",
 "lazy_static",
 "libc",
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -8,6 +8,7 @@ members = [
    "pageserver/compaction",
    "pageserver/ctl",
    "pageserver/client",
+    "pageserver/client_grpc",
    "pageserver/pagebench",
    "proxy",
    "safekeeper",
@@ -29,6 +30,7 @@ members = [
    "libs/pq_proto",
    "libs/tenant_size_model",
    "libs/metrics",
+    "libs/neonart",
    "libs/postgres_connection",
    "libs/remote_storage",
    "libs/tracing-utils",
@@ -41,6 +43,7 @@ members = [
    "libs/proxy/postgres-types2",
    "libs/proxy/tokio-postgres2",
    "endpoint_storage",
+    "pgxn/neon/communicator",
 ]

 [workspace.package]
@@ -142,6 +145,7 @@ parquet = { version = "53", default-features = false, features = ["zstd"] }
 parquet_derive = "53"
 pbkdf2 = { version = "0.12.1", features = ["simple", "std"] }
 pem = "3.0.3"
+peekable = "0.3.0"
 pin-project-lite = "0.2"
 pprof = { version = "0.14", features = ["criterion", "flamegraph", "frame-pointer", "prost-codec"] }
 procfs = "0.16"
@@ -187,7 +191,6 @@ thiserror = "1.0"
 tikv-jemallocator = { version = "0.6", features = ["profiling", "stats", "unprefixed_malloc_on_supported_platforms"] }
 tikv-jemalloc-ctl = { version = "0.6", features = ["stats"] }
 tokio = { version = "1.43.1", features = ["macros"] }
-tokio-epoll-uring = { git = "https://github.com/neondatabase/tokio-epoll-uring.git" , branch = "main" }
 tokio-io-timeout = "1.2.0"
 tokio-postgres-rustls = "0.12.0"
 tokio-rustls = { version = "0.26.0", default-features = false, features = ["tls12", "ring"]}
@@ -196,7 +199,7 @@ tokio-tar = "0.3"
 tokio-util = { version = "0.7.10", features = ["io", "rt"] }
 toml = "0.8"
 toml_edit = "0.22"
-tonic = {version = "0.12.3", default-features = false, features = ["channel", "tls", "tls-roots"]}
+tonic = {version = "0.12.3", default-features = false, features = ["channel", "server", "tls", "tls-roots", "gzip"]}
 tower = { version = "0.5.2", default-features = false }
 tower-http = { version = "0.6.2", features = ["auth", "request-id", "trace"] }

@@ -228,6 +231,9 @@ x509-cert = { version = "0.2.5" }
 env_logger = "0.11"
 log = "0.4"

+tokio-epoll-uring = { git = "https://github.com/neondatabase/tokio-epoll-uring.git" , branch = "main" }
+uring-common = { git = "https://github.com/neondatabase/tokio-epoll-uring.git" , branch = "main" }
+
 ## Libraries from neondatabase/ git forks, ideally with changes to be upstreamed
 postgres = { git = "https://github.com/neondatabase/rust-postgres.git", branch = "neon" }
 postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git", branch = "neon" }
@@ -245,9 +251,12 @@ compute_api = { version = "0.1", path = "./libs/compute_api/" }
 consumption_metrics = { version = "0.1", path = "./libs/consumption_metrics/" }
 http-utils = { version = "0.1", path = "./libs/http-utils/" }
 metrics = { version = "0.1", path = "./libs/metrics/" }
+neonart = { version = "0.1", path = "./libs/neonart/" }
 pageserver = { path = "./pageserver" }
 pageserver_api = { version = "0.1", path = "./libs/pageserver_api/" }
 pageserver_client = { path = "./pageserver/client" }
+pageserver_client_grpc = { path = "./pageserver/client_grpc" }
+pageserver_data_api = { path = "./pageserver/data_api" }
 pageserver_compaction = { version = "0.1", path = "./pageserver/compaction/" }
 postgres_backend = { version = "0.1", path = "./libs/postgres_backend/" }
 postgres_connection = { version = "0.1", path = "./libs/postgres_connection/" }
@@ -271,6 +280,7 @@ wal_decoder = { version = "0.1", path = "./libs/wal_decoder" }
 workspace_hack = { version = "0.1", path = "./workspace_hack/" }

 ## Build dependencies
+cbindgen = "0.28.0"
 criterion = "0.5.1"
 rcgen = "0.13"
 rstest = "0.18"
--- a/Makefile
+++ b/Makefile
@@ -18,10 +18,12 @@ ifeq ($(BUILD_TYPE),release)
 	PG_LDFLAGS = $(LDFLAGS)
 	# Unfortunately, `--profile=...` is a nightly feature
 	CARGO_BUILD_FLAGS += --release
+	NEON_CARGO_ARTIFACT_TARGET_DIR = $(ROOT_PROJECT_DIR)/target/release
 else ifeq ($(BUILD_TYPE),debug)
 	PG_CONFIGURE_OPTS = --enable-debug --with-openssl --enable-cassert --enable-depend
 	PG_CFLAGS += -O0 -g3 $(CFLAGS)
 	PG_LDFLAGS = $(LDFLAGS)
+	NEON_CARGO_ARTIFACT_TARGET_DIR = $(ROOT_PROJECT_DIR)/target/debug
 else
 	$(error Bad build type '$(BUILD_TYPE)', see Makefile for options)
 endif
@@ -180,11 +182,16 @@ postgres-check-%: postgres-%

 .PHONY: neon-pg-ext-%
 neon-pg-ext-%: postgres-%
+	+@echo "Compiling communicator $*"
+	$(CARGO_CMD_PREFIX) cargo build -p communicator $(CARGO_BUILD_FLAGS)
+
 	+@echo "Compiling neon $*"
 	mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-$*
 	$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config COPT='$(COPT)' \
+		LIBCOMMUNICATOR_PATH=$(NEON_CARGO_ARTIFACT_TARGET_DIR) \
 		-C $(POSTGRES_INSTALL_DIR)/build/neon-$* \
 		-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile install
+
 	+@echo "Compiling neon_walredo $*"
 	mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-walredo-$*
 	$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config COPT='$(COPT)' \
--- a/compute/compute-node.Dockerfile
+++ b/compute/compute-node.Dockerfile
@@ -1800,8 +1800,8 @@ COPY compute/patches/pg_repack.patch /ext-src
 RUN cd /ext-src/pg_repack-src && patch -p1 </ext-src/pg_repack.patch && rm -f /ext-src/pg_repack.patch

 COPY --chmod=755 docker-compose/run-tests.sh /run-tests.sh
-RUN apt-get update && apt-get install -y libtap-parser-sourcehandler-pgtap-perl\
-   && apt clean && rm -rf /ext-src/*.tar.gz /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y libtap-parser-sourcehandler-pgtap-perl jq \
+   && apt clean && rm -rf /ext-src/*.tar.gz /ext-src/*.patch /var/lib/apt/lists/*
 ENV PATH=/usr/local/pgsql/bin:$PATH
 ENV PGHOST=compute
 ENV PGPORT=55433
--- a/docker-compose/docker_compose_test.sh
+++ b/docker-compose/docker_compose_test.sh
@@ -65,7 +65,7 @@ for pg_version in ${TEST_VERSION_ONLY-14 15 16 17}; do
        docker compose cp "${TMPDIR}/data" compute:/postgres/contrib/file_fdw/data
        rm -rf "${TMPDIR}"
        # Apply patches
-        docker compose exec -i neon-test-extensions bash -c "(cd /postgres && patch -p1)" <"../compute/patches/contrib_pg${pg_version}.patch"
+        docker compose exec -T neon-test-extensions bash -c "(cd /postgres && patch -p1)" <"../compute/patches/contrib_pg${pg_version}.patch"
        # We are running tests now
        rm -f testout.txt testout_contrib.txt
        docker compose exec -e USE_PGXS=1 -e SKIP=timescaledb-src,rdkit-src,postgis-src,pg_jsonschema-src,kq_imcx-src,wal2json_2_5-src,rag_jina_reranker_v1_tiny_en-src,rag_bge_small_en_v15-src \
--- a/docker-compose/ext-src/README.md
+++ b/docker-compose/ext-src/README.md
@@ -0,0 +1,99 @@
+# PostgreSQL Extensions for Testing
+
+This directory contains PostgreSQL extensions used primarily for:
+1. Testing extension upgrades between different Compute versions
+2. Running regression tests with regular users (mostly for cloud instances)
+
+## Directory Structure
+
+Each extension directory follows a standard structure:
+
+- `extension-name-src/` - Directory containing test files for the extension
+  - `test-upgrade.sh` - Script for testing upgrade scenarios
+  - `regular-test.sh` - Script for testing with regular users
+  - Additional test files depending on the extension
+
+## Available Extensions
+
+This directory includes the following extensions:
+
+- `hll-src` - HyperLogLog, a fixed-size data structure for approximating cardinality
+- `hypopg-src` - Extension to create hypothetical indexes
+- `ip4r-src` - IPv4/v6 and subnet data types
+- `pg_cron-src` - Run periodic jobs in PostgreSQL
+- `pg_graphql-src` - GraphQL support for PostgreSQL
+- `pg_hint_plan-src` - Execution plan hints
+- `pg_ivm-src` - Incremental view maintenance
+- `pg_jsonschema-src` - JSON Schema validation
+- `pg_repack-src` - Reorganize tables with minimal locks
+- `pg_roaringbitmap-src` - Roaring bitmap implementation
+- `pg_semver-src` - Semantic version data type
+- `pg_session_jwt-src` - JWT authentication for PostgreSQL
+- `pg_tiktoken-src` - OpenAI Tiktoken tokenizer
+- `pg_uuidv7-src` - UUIDv7 implementation for PostgreSQL
+- `pgjwt-src` - JWT tokens for PostgreSQL
+- `pgrag-src` - Retrieval Augmented Generation for PostgreSQL
+- `pgtap-src` - Unit testing framework for PostgreSQL
+- `pgvector-src` - Vector similarity search
+- `pgx_ulid-src` - ULID data type
+- `plv8-src` - JavaScript language for PostgreSQL stored procedures
+- `postgresql-unit-src` - SI units for PostgreSQL
+- `prefix-src` - Prefix matching for strings
+- `rag_bge_small_en_v15-src` - BGE embedding model for RAG
+- `rag_jina_reranker_v1_tiny_en-src` - Jina reranker model for RAG
+- `rum-src` - RUM access method for text search
+
+## Usage
+
+### Extension Upgrade Testing
+
+The extensions in this directory are used by the `test-upgrade.sh` script to test upgrading extensions between different versions of Neon Compute nodes. The script:
+
+1. Creates a database with extensions installed on an old Compute version
+2. Creates timelines for each extension
+3. Switches to a new Compute version and tests the upgrade process
+4. Verifies extension functionality after upgrade
+
+### Regular User Testing
+
+For testing with regular users (particularly for cloud instances), each extension directory typically contains a `regular-test.sh` script that:
+
+1. Drops the database if it exists
+2. Creates a fresh test database
+3. Installs the extension
+4. Runs regression tests
+
+A note about pg_regress: Since pg_regress attempts to set `lc_messages` for the database by default, which is forbidden for regular users, we create databases manually and use the `--use-existing` option to bypass this limitation.
+
+### CI Workflows
+
+Two main workflows use these extensions:
+
+1. **Cloud Extensions Test** - Tests extensions on Neon cloud projects
+2. **Force Test Upgrading of Extension** - Tests upgrading extensions between different Compute versions
+
+These workflows are integrated into the build-and-test pipeline through shell scripts:
+
+- `docker_compose_test.sh` - Tests extensions in a Docker Compose environment
+
+- `test_extensions_upgrade.sh` - Tests extension upgrades between different Compute versions
+
+## Adding New Extensions
+
+To add a new extension for testing:
+
+1. Create a directory named `extension-name-src` in this directory
+2. Add at minimum:
+   - `regular-test.sh` for testing with regular users
+   - If `regular-test.sh` doesn't exist, the system will look for `neon-test.sh`
+   - If neither exists, it will try to run `make installcheck`
+   - `test-upgrade.sh` is only needed if you want to test upgrade scenarios
+3. Update the list of extensions in the `test_extensions_upgrade.sh` script if needed for upgrade testing
+
+### Patching Extension Sources
+
+If you need to patch the extension sources:
+
+1. Place the patch file in the extension's directory
+2. Apply the patch in the appropriate script (`test-upgrade.sh`, `neon-test.sh`, `regular-test.sh`, or `Makefile`)
+3. The patch will be applied during the testing process
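The lookup order described under "Adding New Extensions" in the README above can be sketched as a small POSIX shell helper. This is only an illustration under assumptions: the function name `pick_test_cmd` and the printed `make -C ... installcheck` form are hypothetical and are not taken from the actual test driver, which may implement the fallback differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the command that
# would test the extension in directory $1, using the README's fallback order.
pick_test_cmd() {
    ext_dir=$1
    if [ -f "${ext_dir}/regular-test.sh" ]; then
        # Preferred: the script written for regular (non-superuser) users.
        echo "${ext_dir}/regular-test.sh"
    elif [ -f "${ext_dir}/neon-test.sh" ]; then
        # First fallback named in the README.
        echo "${ext_dir}/neon-test.sh"
    else
        # Last resort: the extension's own installcheck target.
        echo "make -C ${ext_dir} installcheck"
    fi
}
```

For a directory that contains both scripts, the helper prints the path to `regular-test.sh`, matching the precedence the README gives.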
--- a/docker-compose/ext-src/hll-src/regular-test.sh
+++ b/docker-compose/ext-src/hll-src/regular-test.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin'    --dbname=contrib_regression setup add_agg agg_oob auto_sparse card_op cast_shape copy_binary cumulative_add_cardinality_correction cumulative_add_comprehensive_promotion cumulative_add_sparse_edge cumulative_add_sparse_random cumulative_add_sparse_step cumulative_union_comprehensive cumulative_union_explicit_explicit cumulative_union_explicit_promotion cumulative_union_probabilistic_probabilistic cumulative_union_sparse_full_representation cumulative_union_sparse_promotion cumulative_union_sparse_sparse disable_hashagg equal explicit_thresh hash hash_any meta_func murmur_bigint murmur_bytea nosparse notequal scalar_oob storedproc transaction typmod typmod_insert union_op
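The `regular-test.sh` scripts added in this change share one pattern: recreate `contrib_regression` by hand, then pass `--use-existing` so that pg_regress does not try to create the database itself. A minimal sketch of that pattern with a dry-run switch follows. The function name `regular_test` and the `RUN`, `DB`, and `PG_REGRESS` variables are assumptions made for illustration; the real scripts inline these commands and locate pg_regress via `pg_config --pgxs`.

```shell
#!/bin/sh
# Hypothetical sketch of the shared pattern (not part of the repository).
# Set RUN=echo to print the commands instead of executing them.
regular_test() {
    db=${DB:-contrib_regression}
    pg_regress=${PG_REGRESS:-pg_regress}
    # Recreate the database manually so pg_regress does not have to.
    ${RUN:-} dropdb --if-exists "$db" || return 1
    ${RUN:-} createdb "$db" || return 1
    # --use-existing tells pg_regress to run against the existing database;
    # the remaining arguments are the test names.
    ${RUN:-} "$pg_regress" --use-existing --dbname="$db" "$@" || return 1
}
```

Calling `RUN=echo regular_test setup add_agg` prints the three commands without touching a server, which is handy for checking the argument list before a real run.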
--- a/docker-compose/ext-src/hypopg-src/regular-test.sh
+++ b/docker-compose/ext-src/hypopg-src/regular-test.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --inputdir=./ --bindir='/usr/local/pgsql/bin' --use-existing --inputdir=test --dbname=contrib_regression hypopg hypo_brin hypo_index_part hypo_include hypo_hash hypo_hide_index
--- a/docker-compose/ext-src/ip4r-src/regular-test.sh
+++ b/docker-compose/ext-src/ip4r-src/regular-test.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin'    --dbname=contrib_regression ip4r ip4r-softerr ip4r-v11
--- a/docker-compose/ext-src/pg_cron-src/regular-test.sh
+++ b/docker-compose/ext-src/pg_cron-src/regular-test.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin'    --dbname=contrib_regression pg_cron-test
--- a/docker-compose/ext-src/pg_graphql-src/regular-test.sh
+++ b/docker-compose/ext-src/pg_graphql-src/regular-test.sh
@@ -0,0 +1,23 @@
+#!/bin/bash
+set -ex
+cd "$(dirname "${0}")"
+PGXS="$(dirname "$(pg_config --pgxs)" )"
+REGRESS="${PGXS}/../test/regress/pg_regress"
+TESTDIR="test"
+TESTS=$(ls "${TESTDIR}/sql" | sort )
+TESTS=${TESTS//\.sql/}
+TESTS=${TESTS/empty_mutations/}
+TESTS=${TESTS/function_return_row_is_selectable/}
+TESTS=${TESTS/issue_300/}
+TESTS=${TESTS/permissions_connection_column/}
+TESTS=${TESTS/permissions_functions/}
+TESTS=${TESTS/permissions_node_column/}
+TESTS=${TESTS/permissions_table_level/}
+TESTS=${TESTS/permissions_types/}
+TESTS=${TESTS/row_level_security/}
+TESTS=${TESTS/sqli_connection/}
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+psql -v ON_ERROR_STOP=1 -f test/fixtures.sql -d contrib_regression
+${REGRESS} --use-existing --dbname=contrib_regression --inputdir=${TESTDIR} ${TESTS}
+
--- a/docker-compose/ext-src/pg_hint_plan-src/regular-test.sh
+++ b/docker-compose/ext-src/pg_hint_plan-src/regular-test.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --use-existing  --inputdir=./ --bindir='/usr/local/pgsql/bin'    --encoding=UTF8 --dbname=contrib_regression init base_plan pg_hint_plan ut-init ut-A ut-S ut-J ut-L ut-G ut-R ut-fdw ut-W ut-T ut-fini hints_anywhere plpgsql oldextversions
--- a/docker-compose/ext-src/pg_ivm-src/regular-test.sh
+++ b/docker-compose/ext-src/pg_ivm-src/regular-test.sh
@@ -0,0 +1,9 @@
+#!/bin/sh
+set -ex
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+cd "$(dirname ${0})"
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+patch -p1 <regular.patch
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin' --dbname=contrib_regression pg_ivm create_immv refresh_immv
+patch -R -p1 <regular.patch
--- a/docker-compose/ext-src/pg_ivm-src/regular.patch
+++ b/docker-compose/ext-src/pg_ivm-src/regular.patch
@@ -0,0 +1,309 @@
+diff --git a/expected/pg_ivm.out b/expected/pg_ivm.out
+index e8798ee..4081680 100644
+--- a/expected/pg_ivm.out
+++ b/expected/pg_ivm.out
+@@ -1363,61 +1363,6 @@ SELECT * FROM mv ORDER BY i;
+    |   2 |   4 |                 2 |                 2 |             2
+ (1 row)
+ 
+-ROLLBACK;
+--- IMMV containing user defined type
+-BEGIN;
+-CREATE TYPE mytype;
+-CREATE FUNCTION mytype_in(cstring)
+- RETURNS mytype AS 'int4in'
+- LANGUAGE INTERNAL STRICT IMMUTABLE;
+-NOTICE:  return type mytype is only a shell
+-CREATE FUNCTION mytype_out(mytype)
+- RETURNS cstring AS 'int4out'
+- LANGUAGE INTERNAL STRICT IMMUTABLE;
+-NOTICE:  argument type mytype is only a shell
+-CREATE TYPE mytype (
+- LIKE = int4,
+- INPUT = mytype_in,
+- OUTPUT = mytype_out
+-);
+-CREATE FUNCTION mytype_eq(mytype, mytype)
+- RETURNS bool AS 'int4eq'
+- LANGUAGE INTERNAL STRICT IMMUTABLE;
+-CREATE FUNCTION mytype_lt(mytype, mytype)
+- RETURNS bool AS 'int4lt'
+- LANGUAGE INTERNAL STRICT IMMUTABLE;
+-CREATE FUNCTION mytype_cmp(mytype, mytype)
+- RETURNS integer AS 'btint4cmp'
+- LANGUAGE INTERNAL STRICT IMMUTABLE;
+-CREATE OPERATOR = (
+- leftarg = mytype, rightarg = mytype,
+- procedure = mytype_eq);
+-CREATE OPERATOR < (
+- leftarg = mytype, rightarg = mytype,
+- procedure = mytype_lt);
+-CREATE OPERATOR CLASS mytype_ops
+- DEFAULT FOR TYPE mytype USING btree AS
+- OPERATOR        1       <,
+- OPERATOR        3       = ,
+- FUNCTION		1		mytype_cmp(mytype,mytype);
+-CREATE TABLE t_mytype (x mytype);
+-SELECT create_immv('mv_mytype',
+- 'SELECT * FROM t_mytype');
+-NOTICE:  could not create an index on immv "mv_mytype" automatically
+-DETAIL:  This target list does not have all the primary key columns, or this view does not contain GROUP BY or DISTINCT clause.
+-HINT:  Create an index on the immv for efficient incremental maintenance.
+- create_immv 
+--------------
+-           0
+-(1 row)
+-
+-INSERT INTO t_mytype VALUES ('1'::mytype);
+-SELECT * FROM mv_mytype;
+- x 
+----
+- 1
+-(1 row)
+-
+ ROLLBACK;
+ -- outer join is not supported
+ SELECT create_immv('mv(a,b)',
+@@ -1510,112 +1455,6 @@ SELECT create_immv('mv_ivm_only_values1', 'values(1)');
+ ERROR:  VALUES is not supported on incrementally maintainable materialized view
+ SELECT create_immv('mv_ivm_only_values2',  'SELECT * FROM (values(1)) AS tmp');
+ ERROR:  VALUES is not supported on incrementally maintainable materialized view
+--- views containing base tables with Row Level Security
+-DROP USER IF EXISTS ivm_admin;
+-NOTICE:  role "ivm_admin" does not exist, skipping
+-DROP USER IF EXISTS ivm_user;
+-NOTICE:  role "ivm_user" does not exist, skipping
+-CREATE USER ivm_admin;
+-CREATE USER ivm_user;
+---- create a table with RLS
+-SET SESSION AUTHORIZATION ivm_admin;
+-CREATE TABLE rls_tbl(id int, data text, owner name);
+-INSERT INTO rls_tbl VALUES
+-  (1,'foo','ivm_user'),
+-  (2,'bar','postgres');
+-CREATE TABLE num_tbl(id int, num text);
+-INSERT INTO num_tbl VALUES
+-  (1,'one'),
+-  (2,'two'),
+-  (3,'three'),
+-  (4,'four'),
+-  (5,'five'),
+-  (6,'six');
+---- Users can access only their own rows
+-CREATE POLICY rls_tbl_policy ON rls_tbl FOR SELECT TO PUBLIC USING(owner = current_user);
+-ALTER TABLE rls_tbl ENABLE ROW LEVEL SECURITY;
+-GRANT ALL on rls_tbl TO PUBLIC;
+-GRANT ALL on num_tbl TO PUBLIC;
+---- create a view owned by ivm_user
+-SET SESSION AUTHORIZATION ivm_user;
+-SELECT create_immv('ivm_rls', 'SELECT * FROM rls_tbl');
+-NOTICE:  could not create an index on immv "ivm_rls" automatically
+-DETAIL:  This target list does not have all the primary key columns, or this view does not contain GROUP BY or DISTINCT clause.
+-HINT:  Create an index on the immv for efficient incremental maintenance.
+- create_immv 
+--------------
+-           1
+-(1 row)
+-
+-SELECT id, data, owner FROM ivm_rls ORDER BY 1,2,3;
+- id | data |  owner   
+-----+------+----------
+-  1 | foo  | ivm_user
+-(1 row)
+-
+-RESET SESSION AUTHORIZATION;
+---- inserts rows owned by different users
+-INSERT INTO rls_tbl VALUES
+-  (3,'baz','ivm_user'),
+-  (4,'qux','postgres');
+-SELECT id, data, owner FROM ivm_rls ORDER BY 1,2,3;
+- id | data |  owner   
+-----+------+----------
+-  1 | foo  | ivm_user
+-  3 | baz  | ivm_user
+-(2 rows)
+-
+---- combination of diffent kinds of commands
+-WITH
+- i AS (INSERT INTO rls_tbl VALUES(5,'quux','postgres'), (6,'corge','ivm_user')),
+- u AS (UPDATE rls_tbl SET owner = 'postgres' WHERE id = 1),
+- u2 AS (UPDATE rls_tbl SET owner = 'ivm_user' WHERE id = 2)
+-SELECT;
+---
+-(1 row)
+-
+-SELECT id, data, owner FROM ivm_rls ORDER BY 1,2,3;
+- id | data  |  owner   
+-----+-------+----------
+-  2 | bar   | ivm_user
+-  3 | baz   | ivm_user
+-  6 | corge | ivm_user
+-(3 rows)
+-
+----
+-SET SESSION AUTHORIZATION ivm_user;
+-SELECT create_immv('ivm_rls2', 'SELECT * FROM rls_tbl JOIN num_tbl USING(id)');
+-NOTICE:  could not create an index on immv "ivm_rls2" automatically
+-DETAIL:  This target list does not have all the primary key columns, or this view does not contain GROUP BY or DISTINCT clause.
+-HINT:  Create an index on the immv for efficient incremental maintenance.
+- create_immv 
+--------------
+-           3
+-(1 row)
+-
+-RESET SESSION AUTHORIZATION;
+-WITH
+- x AS (UPDATE rls_tbl SET data = data || '_2' where id in (3,4)),
+- y AS (UPDATE num_tbl SET num = num || '_2' where id in (3,4))
+-SELECT;
+---
+-(1 row)
+-
+-SELECT * FROM ivm_rls2 ORDER BY 1,2,3;
+- id | data  |  owner   |   num   
+-----+-------+----------+---------
+-  2 | bar   | ivm_user | two
+-  3 | baz_2 | ivm_user | three_2
+-  6 | corge | ivm_user | six
+-(3 rows)
+-
+-DROP TABLE rls_tbl CASCADE;
+-NOTICE:  drop cascades to 2 other objects
+-DETAIL:  drop cascades to table ivm_rls
+-drop cascades to table ivm_rls2
+-DROP TABLE num_tbl CASCADE;
+-DROP USER ivm_user;
+-DROP USER ivm_admin;
+ -- automatic index creation
+ BEGIN;
+ CREATE TABLE base_a (i int primary key, j int);
+diff --git a/sql/pg_ivm.sql b/sql/pg_ivm.sql
+index d3c1a01..203213d 100644
+--- a/sql/pg_ivm.sql
+++ b/sql/pg_ivm.sql
+@@ -454,53 +454,6 @@ DELETE FROM base_t WHERE v = 5;
+ SELECT * FROM mv ORDER BY i;
+ ROLLBACK;
+ 
+--- IMMV containing user defined type
+-BEGIN;
+-
+-CREATE TYPE mytype;
+-CREATE FUNCTION mytype_in(cstring)
+- RETURNS mytype AS 'int4in'
+- LANGUAGE INTERNAL STRICT IMMUTABLE;
+-CREATE FUNCTION mytype_out(mytype)
+- RETURNS cstring AS 'int4out'
+- LANGUAGE INTERNAL STRICT IMMUTABLE;
+-CREATE TYPE mytype (
+- LIKE = int4,
+- INPUT = mytype_in,
+- OUTPUT = mytype_out
+-);
+-
+-CREATE FUNCTION mytype_eq(mytype, mytype)
+- RETURNS bool AS 'int4eq'
+- LANGUAGE INTERNAL STRICT IMMUTABLE;
+-CREATE FUNCTION mytype_lt(mytype, mytype)
+- RETURNS bool AS 'int4lt'
+- LANGUAGE INTERNAL STRICT IMMUTABLE;
+-CREATE FUNCTION mytype_cmp(mytype, mytype)
+- RETURNS integer AS 'btint4cmp'
+- LANGUAGE INTERNAL STRICT IMMUTABLE;
+-
+-CREATE OPERATOR = (
+- leftarg = mytype, rightarg = mytype,
+- procedure = mytype_eq);
+-CREATE OPERATOR < (
+- leftarg = mytype, rightarg = mytype,
+- procedure = mytype_lt);
+-
+-CREATE OPERATOR CLASS mytype_ops
+- DEFAULT FOR TYPE mytype USING btree AS
+- OPERATOR        1       <,
+- OPERATOR        3       = ,
+- FUNCTION		1		mytype_cmp(mytype,mytype);
+-
+-CREATE TABLE t_mytype (x mytype);
+-SELECT create_immv('mv_mytype',
+- 'SELECT * FROM t_mytype');
+-INSERT INTO t_mytype VALUES ('1'::mytype);
+-SELECT * FROM mv_mytype;
+-
+-ROLLBACK;
+-
+ -- outer join is not supported
+ SELECT create_immv('mv(a,b)',
+     'SELECT a.i, b.i FROM mv_base_a a LEFT JOIN mv_base_b b ON a.i=b.i');
+@@ -579,71 +532,6 @@ SELECT create_immv('mv_ivm31', 'SELECT sum(i)/sum(j) FROM mv_base_a');
+ SELECT create_immv('mv_ivm_only_values1', 'values(1)');
+ SELECT create_immv('mv_ivm_only_values2',  'SELECT * FROM (values(1)) AS tmp');
+ 
+-
+--- views containing base tables with Row Level Security
+-DROP USER IF EXISTS ivm_admin;
+-DROP USER IF EXISTS ivm_user;
+-CREATE USER ivm_admin;
+-CREATE USER ivm_user;
+-
+---- create a table with RLS
+-SET SESSION AUTHORIZATION ivm_admin;
+-CREATE TABLE rls_tbl(id int, data text, owner name);
+-INSERT INTO rls_tbl VALUES
+-  (1,'foo','ivm_user'),
+-  (2,'bar','postgres');
+-CREATE TABLE num_tbl(id int, num text);
+-INSERT INTO num_tbl VALUES
+-  (1,'one'),
+-  (2,'two'),
+-  (3,'three'),
+-  (4,'four'),
+-  (5,'five'),
+-  (6,'six');
+-
+---- Users can access only their own rows
+-CREATE POLICY rls_tbl_policy ON rls_tbl FOR SELECT TO PUBLIC USING(owner = current_user);
+-ALTER TABLE rls_tbl ENABLE ROW LEVEL SECURITY;
+-GRANT ALL on rls_tbl TO PUBLIC;
+-GRANT ALL on num_tbl TO PUBLIC;
+-
+---- create a view owned by ivm_user
+-SET SESSION AUTHORIZATION ivm_user;
+-SELECT create_immv('ivm_rls', 'SELECT * FROM rls_tbl');
+-SELECT id, data, owner FROM ivm_rls ORDER BY 1,2,3;
+-RESET SESSION AUTHORIZATION;
+-
+---- inserts rows owned by different users
+-INSERT INTO rls_tbl VALUES
+-  (3,'baz','ivm_user'),
+-  (4,'qux','postgres');
+-SELECT id, data, owner FROM ivm_rls ORDER BY 1,2,3;
+-
+---- combination of diffent kinds of commands
+-WITH
+- i AS (INSERT INTO rls_tbl VALUES(5,'quux','postgres'), (6,'corge','ivm_user')),
+- u AS (UPDATE rls_tbl SET owner = 'postgres' WHERE id = 1),
+- u2 AS (UPDATE rls_tbl SET owner = 'ivm_user' WHERE id = 2)
+-SELECT;
+-SELECT id, data, owner FROM ivm_rls ORDER BY 1,2,3;
+-
+----
+-SET SESSION AUTHORIZATION ivm_user;
+-SELECT create_immv('ivm_rls2', 'SELECT * FROM rls_tbl JOIN num_tbl USING(id)');
+-RESET SESSION AUTHORIZATION;
+-
+-WITH
+- x AS (UPDATE rls_tbl SET data = data || '_2' where id in (3,4)),
+- y AS (UPDATE num_tbl SET num = num || '_2' where id in (3,4))
+-SELECT;
+-SELECT * FROM ivm_rls2 ORDER BY 1,2,3;
+-
+-DROP TABLE rls_tbl CASCADE;
+-DROP TABLE num_tbl CASCADE;
+-
+-DROP USER ivm_user;
+-DROP USER ivm_admin;
+-
+ -- automatic index creation
+ BEGIN;
+ CREATE TABLE base_a (i int primary key, j int);
--- a/docker-compose/ext-src/pg_jsonschema-src/Makefile
+++ b/docker-compose/ext-src/pg_jsonschema-src/Makefile
@@ -1,8 +1,13 @@
 EXTENSION = pg_jsonschema
 DATA = pg_jsonschema--1.0.sql
 REGRESS = jsonschema_valid_api  jsonschema_edge_cases
-REGRESS_OPTS = --load-extension=pg_jsonschema

 PG_CONFIG ?= pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
-include $(PGXS)
+PG_REGRESS := $(dir $(PGXS))../../src/test/regress/pg_regress
+.PHONY: installcheck
+installcheck:
+	dropdb --if-exists contrib_regression
+	createdb contrib_regression
+	psql -d contrib_regression -c "CREATE EXTENSION $(EXTENSION)"
+	$(PG_REGRESS) --use-existing --dbname=contrib_regression $(REGRESS)
--- a/docker-compose/ext-src/pg_roaringbitmap-src/regular-test.sh
+++ b/docker-compose/ext-src/pg_roaringbitmap-src/regular-test.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin'    --dbname=contrib_regression roaringbitmap
--- a/docker-compose/ext-src/pg_semver-src/regular-test.sh
+++ b/docker-compose/ext-src/pg_semver-src/regular-test.sh
@@ -0,0 +1,12 @@
+#!/bin/bash
+set -ex
+# For v16 it's required to create a type which is impossible without superuser access
+# do not run this test so far
+if [[ "${PG_VERSION}" = v16 ]]; then
+  exit 0
+fi
+cd "$(dirname ${0})"
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin'    --inputdir=test --dbname=contrib_regression base corpus
--- a/docker-compose/ext-src/pg_session_jwt-src/Makefile
+++ b/docker-compose/ext-src/pg_session_jwt-src/Makefile
@@ -6,4 +6,10 @@ export PGOPTIONS = -c pg_session_jwt.jwk={"crv":"Ed25519","kty":"OKP","x":"R_Abz

 PG_CONFIG ?= pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
-include $(PGXS)
+PG_REGRESS := $(dir $(PGXS))../../src/test/regress/pg_regress
+.PHONY: installcheck
+installcheck:
+	dropdb --if-exists contrib_regression
+	createdb contrib_regression
+	psql -d contrib_regression -c "CREATE EXTENSION $(EXTENSION)"
+	$(PG_REGRESS) --use-existing --dbname=contrib_regression $(REGRESS)
--- a/docker-compose/ext-src/pg_tiktoken-src/Makefile
+++ b/docker-compose/ext-src/pg_tiktoken-src/Makefile
@@ -5,4 +5,6 @@ REGRESS = pg_tiktoken
 installcheck: regression-test

 regression-test:
-	$(PG_REGRESS) --inputdir=. --outputdir=. --dbname=contrib_regression $(REGRESS)
+	dropdb --if-exists contrib_regression
+	createdb contrib_regression
+	$(PG_REGRESS) --inputdir=. --outputdir=. --use-existing --dbname=contrib_regression $(REGRESS)
--- a/docker-compose/ext-src/pg_uuidv7-src/regular-test.sh
+++ b/docker-compose/ext-src/pg_uuidv7-src/regular-test.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+set -ex
+cd "$(dirname "${0}")"
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin'    --inputdir=test --dbname=contrib_regression 001_setup 002_uuid_generate_v7 003_uuid_v7_to_timestamptz 004_uuid_timestamptz_to_v7 005_uuid_v7_to_timestamp 006_uuid_timestamp_to_v7
--- a/docker-compose/ext-src/pgjwt-src/neon-test.sh
+++ b/docker-compose/ext-src/pgjwt-src/neon-test.sh
@@ -1,4 +1,6 @@
 #!/bin/bash
 set -ex
 cd "$(dirname "${0}")"
-pg_prove test.sql
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+pg_prove -d contrib_regression test.sql
--- a/docker-compose/ext-src/pgrag-src/regular-test.sh
+++ b/docker-compose/ext-src/pgrag-src/regular-test.sh
@@ -0,0 +1,8 @@
+#!/bin/sh
+set -ex
+cd "$(dirname "${0}")"
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+psql -d contrib_regression -c "CREATE EXTENSION vector" -c "CREATE EXTENSION rag"
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --inputdir=./ --bindir='/usr/local/pgsql/bin'    --use-existing --load-extension=vector --load-extension=rag --dbname=contrib_regression basic_functions text_processing api_keys chunking_functions document_processing embedding_api_functions voyageai_functions
--- a/docker-compose/ext-src/pgtap-src/regular-test.sh
+++ b/docker-compose/ext-src/pgtap-src/regular-test.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+set -ex
+cd "$(dirname ${0})"
+make installcheck || true
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+sed -i '/hastap/d' test/build/run.sch
+sed -Ei 's/\b(aretap|enumtap|ownership|privs|usergroup)\b//g' test/build/run.sch
+${PG_REGRESS} --use-existing --dbname=contrib_regression --inputdir=./ --bindir='/usr/local/pgsql/bin'    --inputdir=test --max-connections=879 --schedule test/schedule/main.sch   --schedule test/build/run.sch
--- a/docker-compose/ext-src/pgvector-src/regular-test.sh
+++ b/docker-compose/ext-src/pgvector-src/regular-test.sh
@@ -0,0 +1,8 @@
+#!/bin/sh
+set -ex
+cd "$(dirname "${0}")"
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+psql -d contrib_regression -c "CREATE EXTENSION vector"
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --inputdir=./ --bindir='/usr/local/pgsql/bin' --inputdir=test --use-existing --dbname=contrib_regression bit btree cast copy halfvec hnsw_bit hnsw_halfvec hnsw_sparsevec hnsw_vector ivfflat_bit ivfflat_halfvec ivfflat_vector sparsevec vector_type
--- a/docker-compose/ext-src/pgx_ulid-src/Makefile
+++ b/docker-compose/ext-src/pgx_ulid-src/Makefile
@@ -4,13 +4,21 @@ PGFILEDESC = "pgx_ulid - ULID type for PostgreSQL"

 PG_CONFIG ?= pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
+PG_REGRESS = $(dir $(PGXS))/../../src/test/regress/pg_regress
 PG_MAJOR_VERSION := $(word 2, $(subst ., , $(shell $(PG_CONFIG) --version)))
 ifeq ($(shell test $(PG_MAJOR_VERSION) -lt 17; echo $$?),0)
-  REGRESS_OPTS = --load-extension=ulid
  REGRESS = 00_ulid_generation 01_ulid_conversions 03_ulid_errors
+  EXTNAME = ulid
 else
-  REGRESS_OPTS = --load-extension=pgx_ulid
  REGRESS = 00_ulid_generation 01_ulid_conversions 02_ulid_conversions 03_ulid_errors
+  EXTNAME = pgx_ulid
 endif

-include $(PGXS)
+.PHONY: installcheck regression-test
+installcheck: regression-test
+
+regression-test:
+	dropdb --if-exists contrib_regression
+	createdb contrib_regression
+	psql -d contrib_regression -c "CREATE EXTENSION $(EXTNAME)"
+	$(PG_REGRESS) --inputdir=. --outputdir=. --use-existing --dbname=contrib_regression $(REGRESS)
--- a/docker-compose/ext-src/plv8-src/regular-test.sh
+++ b/docker-compose/ext-src/plv8-src/regular-test.sh
@@ -0,0 +1,12 @@
+#!/bin/bash
+set -ex
+cd "$(dirname "${0}")"
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+REGRESS="$(make -n installcheck | awk '{print substr($0,index($0,"init-extension"));}')"
+REGRESS="${REGRESS/startup_perms/}"
+REGRESS="${REGRESS/startup /}"
+REGRESS="${REGRESS/find_function_perms/}"
+REGRESS="${REGRESS/guc/}"
+${PG_REGRESS} --inputdir=./ --bindir='/usr/local/pgsql/bin'  --use-existing --dbname=contrib_regression ${REGRESS}
--- a/docker-compose/ext-src/postgresql-unit-src/regular-test.sh
+++ b/docker-compose/ext-src/postgresql-unit-src/regular-test.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+set -ex
+cd "$(dirname "${0}")"
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --inputdir=./ --bindir='/usr/local/pgsql/bin' --use-existing --dbname=contrib_regression extension tables unit binary unicode prefix units time temperature functions language_functions round derived compare aggregate iec custom crosstab convert
--- a/docker-compose/ext-src/prefix-src/regular-test.sh
+++ b/docker-compose/ext-src/prefix-src/regular-test.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+set -ex
+cd "$(dirname "${0}")"
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --use-existing --inputdir=./ --bindir='/usr/local/pgsql/bin'    --dbname=contrib_regression create_extension prefix falcon explain queries
--- a/docker-compose/ext-src/rag_bge_small_en_v15-src/Makefile
+++ b/docker-compose/ext-src/rag_bge_small_en_v15-src/Makefile
@@ -3,8 +3,13 @@ MODULE_big = rag_bge_small_en_v15
 OBJS = $(patsubst %.rs,%.o,$(wildcard src/*.rs))

 REGRESS = basic_functions embedding_functions basic_functions_enhanced embedding_functions_enhanced
-REGRESS_OPTS = --load-extension=vector --load-extension=rag_bge_small_en_v15

 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
-include $(PGXS)
+PG_REGRESS := $(dir $(PGXS))../../src/test/regress/pg_regress
+.PHONY: installcheck
+installcheck:
+	dropdb --if-exists contrib_regression
+	createdb contrib_regression
+	psql -d contrib_regression -c "CREATE EXTENSION vector" -c "CREATE EXTENSION rag_bge_small_en_v15"
+	$(PG_REGRESS) --use-existing --dbname=contrib_regression $(REGRESS)
--- a/docker-compose/ext-src/rag_jina_reranker_v1_tiny_en-src/Makefile
+++ b/docker-compose/ext-src/rag_jina_reranker_v1_tiny_en-src/Makefile
@@ -3,8 +3,13 @@ MODULE_big = rag_jina_reranker_v1_tiny_en
 OBJS = $(patsubst %.rs,%.o,$(wildcard src/*.rs))

 REGRESS = reranking_functions reranking_functions_enhanced
-REGRESS_OPTS = --load-extension=vector --load-extension=rag_jina_reranker_v1_tiny_en

 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
-include $(PGXS)
+PG_REGRESS := $(dir $(PGXS))../../src/test/regress/pg_regress
+.PHONY: installcheck
+installcheck:
+	dropdb --if-exists contrib_regression
+	createdb contrib_regression
+	psql -d contrib_regression -c "CREATE EXTENSION vector" -c "CREATE EXTENSION rag_jina_reranker_v1_tiny_en"
+	$(PG_REGRESS) --use-existing --dbname=contrib_regression $(REGRESS)
--- a/docker-compose/ext-src/rag_jina_reranker_v1_tiny_en-src/expected/reranking_functions.out
+++ b/docker-compose/ext-src/rag_jina_reranker_v1_tiny_en-src/expected/reranking_functions.out
@@ -1,25 +1,27 @@
 -- Reranking function tests
-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat', 'the baboon played with the balloon');
- rerank_distance 
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat', 'the baboon played with the balloon')::NUMERIC,4);
+ round  
+--------
+ 0.8989
+(1 row)
+
+SELECT ARRAY(SELECT ROUND(x::NUMERIC,4) FROM unnest(rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat',
+    ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings'])) AS x);
+      array      
 -----------------
-       0.8989152
+ {0.8989,1.3018}
 (1 row)

-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat', ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings']);
-    rerank_distance    
-----------------------
- {0.8989152,1.3018152}
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat', 'the baboon played with the balloon')::NUMERIC,4);
+  round  
+---------
+ -0.8989
 (1 row)

-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat', 'the baboon played with the balloon');
- rerank_score 
--------------
-   -0.8989152
-(1 row)
-
-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat', ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings']);
-      rerank_score       
-------------------------
- {-0.8989152,-1.3018152}
+SELECT ARRAY(SELECT ROUND(x::NUMERIC,4) FROM unnest(rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat',
+    ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings'])) as x);
+       array       
+-------------------
+ {-0.8989,-1.3018}
 (1 row)

--- a/docker-compose/ext-src/rag_jina_reranker_v1_tiny_en-src/expected/reranking_functions_enhanced.out
+++ b/docker-compose/ext-src/rag_jina_reranker_v1_tiny_en-src/expected/reranking_functions_enhanced.out
@@ -1,41 +1,41 @@
 -- Reranking function tests - single passage
-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat', 'the baboon played with the balloon');
- rerank_distance 
-----------------
-       0.8989152
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat', 'the baboon played with the balloon')::NUMERIC,4);
+ round  
+--------
+ 0.8989
 (1 row)

-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat', 'the tanks fired at the buildings');
- rerank_distance 
-----------------
-       1.3018152
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat', 'the tanks fired at the buildings')::NUMERIC,4);
+ round  
+--------
+ 1.3018
 (1 row)

-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('query about cats', 'information about felines');
- rerank_distance 
-----------------
-       1.3133051
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_distance('query about cats', 'information about felines')::NUMERIC,4);
+ round  
+--------
+ 1.3133
 (1 row)

-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('', 'empty query test');
- rerank_distance 
-----------------
-       0.7075559
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_distance('', 'empty query test')::NUMERIC,4);
+ round  
+--------
+ 0.7076
 (1 row)

 -- Reranking function tests - array of passages
-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat',
-    ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings']);
-    rerank_distance    
-----------------------
- {0.8989152,1.3018152}
+SELECT ARRAY(SELECT ROUND(x::NUMERIC,4) FROM unnest(rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat',
+    ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings'])) AS x);
+      array      
+-----------------
+ {0.8989,1.3018}
 (1 row)

-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('query about programming',
-    ARRAY['Python is a programming language', 'Java is also a programming language', 'SQL is used for databases']);
-          rerank_distance           
------------------------------------
- {0.16591403,0.33475375,0.10132827}
+SELECT ARRAY(SELECT ROUND(x::NUMERIC,4) FROM unnest(rag_jina_reranker_v1_tiny_en.rerank_distance('query about programming',
+    ARRAY['Python is a programming language', 'Java is also a programming language', 'SQL is used for databases'])) AS x);
+         array          
+------------------------
+ {0.1659,0.3348,0.1013}
 (1 row)

 SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('empty array test', ARRAY[]::text[]);
@@ -45,43 +45,43 @@ SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('empty array test', ARRAY[]:
 (1 row)

 -- Reranking score function tests - single passage
-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat', 'the baboon played with the balloon');
- rerank_score 
--------------
-   -0.8989152
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat', 'the baboon played with the balloon')::NUMERIC,4);
+  round  
+---------
+ -0.8989
 (1 row)

-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat', 'the tanks fired at the buildings');
- rerank_score 
--------------
-   -1.3018152
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat', 'the tanks fired at the buildings')::NUMERIC,4);
+  round  
+---------
+ -1.3018
 (1 row)

-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('query about cats', 'information about felines');
- rerank_score 
--------------
-   -1.3133051
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_score('query about cats', 'information about felines')::NUMERIC,4);
+  round  
+---------
+ -1.3133
 (1 row)

-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('', 'empty query test');
- rerank_score 
--------------
-   -0.7075559
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_score('', 'empty query test')::NUMERIC,4);
+  round  
+---------
+ -0.7076
 (1 row)

 -- Reranking score function tests - array of passages
-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat',
-    ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings']);
-      rerank_score       
-------------------------
- {-0.8989152,-1.3018152}
+SELECT ARRAY(SELECT ROUND(x::NUMERIC,4) FROM unnest(rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat',
+    ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings'])) AS x);
+       array       
+-------------------
+ {-0.8989,-1.3018}
 (1 row)

-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('query about programming',
-    ARRAY['Python is a programming language', 'Java is also a programming language', 'SQL is used for databases']);
-             rerank_score              
---------------------------------------
- {-0.16591403,-0.33475375,-0.10132827}
+SELECT ARRAY(SELECT ROUND(x::NUMERIC,4) FROM unnest(rag_jina_reranker_v1_tiny_en.rerank_score('query about programming',
+    ARRAY['Python is a programming language', 'Java is also a programming language', 'SQL is used for databases'])) AS x);
+           array           
+---------------------------
+ {-0.1659,-0.3348,-0.1013}
 (1 row)

 SELECT rag_jina_reranker_v1_tiny_en.rerank_score('empty array test', ARRAY[]::text[]);
--- a/docker-compose/ext-src/rag_jina_reranker_v1_tiny_en-src/sql/reranking_functions.sql
+++ b/docker-compose/ext-src/rag_jina_reranker_v1_tiny_en-src/sql/reranking_functions.sql
@@ -1,8 +1,10 @@
 -- Reranking function tests
-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat', 'the baboon played with the balloon');
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat', 'the baboon played with the balloon')::NUMERIC,4);

-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat', ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings']);
+SELECT ARRAY(SELECT ROUND(x::NUMERIC,4) FROM unnest(rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat',
+    ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings'])) AS x);

-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat', 'the baboon played with the balloon');
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat', 'the baboon played with the balloon')::NUMERIC,4);

-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat', ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings']);
+SELECT ARRAY(SELECT ROUND(x::NUMERIC,4) FROM unnest(rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat',
+    ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings'])) as x);
--- a/docker-compose/ext-src/rag_jina_reranker_v1_tiny_en-src/sql/reranking_functions_enhanced.sql
+++ b/docker-compose/ext-src/rag_jina_reranker_v1_tiny_en-src/sql/reranking_functions_enhanced.sql
@@ -1,35 +1,35 @@
 -- Reranking function tests - single passage
-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat', 'the baboon played with the balloon');
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat', 'the baboon played with the balloon')::NUMERIC,4);

-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat', 'the tanks fired at the buildings');
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat', 'the tanks fired at the buildings')::NUMERIC,4);

-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('query about cats', 'information about felines');
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_distance('query about cats', 'information about felines')::NUMERIC,4);

-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('', 'empty query test');
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_distance('', 'empty query test')::NUMERIC,4);

 -- Reranking function tests - array of passages
-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat',
-    ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings']);
+SELECT ARRAY(SELECT ROUND(x::NUMERIC,4) FROM unnest(rag_jina_reranker_v1_tiny_en.rerank_distance('the cat sat on the mat',
+    ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings'])) AS x);

-SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('query about programming',
-    ARRAY['Python is a programming language', 'Java is also a programming language', 'SQL is used for databases']);
+SELECT ARRAY(SELECT ROUND(x::NUMERIC,4) FROM unnest(rag_jina_reranker_v1_tiny_en.rerank_distance('query about programming',
+    ARRAY['Python is a programming language', 'Java is also a programming language', 'SQL is used for databases'])) AS x);

 SELECT rag_jina_reranker_v1_tiny_en.rerank_distance('empty array test', ARRAY[]::text[]);

 -- Reranking score function tests - single passage
-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat', 'the baboon played with the balloon');
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat', 'the baboon played with the balloon')::NUMERIC,4);

-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat', 'the tanks fired at the buildings');
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat', 'the tanks fired at the buildings')::NUMERIC,4);

-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('query about cats', 'information about felines');
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_score('query about cats', 'information about felines')::NUMERIC,4);

-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('', 'empty query test');
+SELECT ROUND(rag_jina_reranker_v1_tiny_en.rerank_score('', 'empty query test')::NUMERIC,4);

 -- Reranking score function tests - array of passages
-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat',
-    ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings']);
+SELECT ARRAY(SELECT ROUND(x::NUMERIC,4) FROM unnest(rag_jina_reranker_v1_tiny_en.rerank_score('the cat sat on the mat',
+    ARRAY['the baboon played with the balloon', 'the tanks fired at the buildings'])) AS x);

-SELECT rag_jina_reranker_v1_tiny_en.rerank_score('query about programming',
-    ARRAY['Python is a programming language', 'Java is also a programming language', 'SQL is used for databases']);
+SELECT ARRAY(SELECT ROUND(x::NUMERIC,4) FROM unnest(rag_jina_reranker_v1_tiny_en.rerank_score('query about programming',
+    ARRAY['Python is a programming language', 'Java is also a programming language', 'SQL is used for databases'])) AS x);

 SELECT rag_jina_reranker_v1_tiny_en.rerank_score('empty array test', ARRAY[]::text[]);
--- a/docker-compose/ext-src/rum-src/regular-test.sh
+++ b/docker-compose/ext-src/rum-src/regular-test.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+set -ex
+cd "$(dirname "${0}")"
+dropdb --if-exists contrib_regression
+createdb contrib_regression
+PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
+${PG_REGRESS} --inputdir=./ --bindir='/usr/local/pgsql/bin' --use-existing --dbname=contrib_regression rum rum_hash ruminv timestamp orderby orderby_hash altorder altorder_hash limits int2 int4 int8 float4 float8 money oid time timetz date interval macaddr inet cidr text varchar char bytea bit varbit numeric rum_weight expr array
--- a/docker-compose/run-tests.sh
+++ b/docker-compose/run-tests.sh
@@ -1,6 +1,42 @@
 #!/bin/bash
 set -x

+if [[ -v BENCHMARK_CONNSTR ]]; then
+  uri_no_proto="${BENCHMARK_CONNSTR#postgres://}"
+  uri_no_proto="${uri_no_proto#postgresql://}"
+  if [[ $uri_no_proto == *\?* ]]; then
+    base="${uri_no_proto%%\?*}"       # before '?'
+  else
+    base="$uri_no_proto"
+  fi
+  if [[ $base =~ ^([^:]+):([^@]+)@([^:/]+):?([0-9]*)/(.+)$ ]]; then
+    export PGUSER="${BASH_REMATCH[1]}"
+    export PGPASSWORD="${BASH_REMATCH[2]}"
+    export PGHOST="${BASH_REMATCH[3]}"
+    export PGPORT="${BASH_REMATCH[4]:-5432}"
+    export PGDATABASE="${BASH_REMATCH[5]}"
+    echo export PGUSER="${BASH_REMATCH[1]}"
+    echo export PGPASSWORD="${BASH_REMATCH[2]}"
+    echo export PGHOST="${BASH_REMATCH[3]}"
+    echo export PGPORT="${BASH_REMATCH[4]:-5432}"
+    echo export PGDATABASE="${BASH_REMATCH[5]}"
+  else
+    echo "Invalid PostgreSQL base URI"
+    exit 1
+  fi
+fi
+REGULAR_USER=false
+while getopts r arg; do
+  case $arg in
+  r)
+    REGULAR_USER=true
+    ;;
+  *) :
+    ;;
+  esac
+done
+shift $((OPTIND-1))
+
 extdir=${1}

 cd "${extdir}" || exit 2
@@ -12,6 +48,11 @@ for d in ${LIST}; do
      FAILED="${d} ${FAILED}"
      break
    fi
+    if [[ ${REGULAR_USER} = true ]] && [ -f "${d}"/regular-test.sh ]; then
+       "${d}/regular-test.sh" || FAILED="${d} ${FAILED}"
+       continue
+    fi
+
    if [ -f "${d}/neon-test.sh" ]; then
       "${d}/neon-test.sh" || FAILED="${d} ${FAILED}"
    else
@@ -19,5 +60,8 @@ for d in ${LIST}; do
    fi
 done
 [ -z "${FAILED}" ] && exit 0
+for d in ${FAILED}; do
+  find "${d}" -name regression.diffs -exec cat {} +
+done
 echo "${FAILED}"
 exit 1
--- a/docs/consumption_metrics.md
+++ b/docs/consumption_metrics.md
@@ -13,7 +13,7 @@ For design details see [the RFC](./rfcs/021-metering.md) and [the discussion on
 batch format is
 ```json

-{ "events" : [metric1, metric2, ...]]}
+{ "events" : [metric1, metric2, ...] }

 ```
 See metric format examples below.
@@ -49,11 +49,13 @@ Size of the remote storage (S3) directory.
 This is an absolute, per-tenant metric.

 - `timeline_logical_size`
-Logical size of the data in the timeline
+
+Logical size of the data in the timeline.
 This is an absolute, per-timeline metric.

 - `synthetic_storage_size`
-Size of all tenant's branches including WAL
+
+Size of all tenant's branches including WAL.
 This is the same metric that `tenant/{tenant_id}/size` endpoint returns.
 This is an absolute, per-tenant metric.

@@ -106,10 +108,10 @@ This is an incremental, per-endpoint metric.
 ```

 The metric is incremental, so the value is the difference between the current and the previous value.
-If there is no previous value, the value, the value is the current value and the `start_time` equals `stop_time`.
+If there is no previous value, the value is the current value and the `start_time` equals `stop_time`.

 ### TODO

 - [ ] Handle errors better: currently if one tenant fails to gather metrics, the whole iteration fails and metrics are not sent for any tenant.
 - [ ] Add retries
- [ ] Tune the interval
+- [ ] Tune the interval
--- a/docs/rfcs/0XY-storcon-rollout.md
+++ b/docs/rfcs/0XY-storcon-rollout.md
@@ -1,106 +0,0 @@
-# Feature Rollout on Storage Controller
-
-This RFC describes the rollout interface from a user's perspective. I do not have a concreate implementation idea
-yet, but the operations described below should be intuitively feasible to implement -- most of them map to a single
-SQL inside the storcon database and then some reconcile operations.
-
-The rollout RFC makes it possible for the storcon to gradually modify tenant configs based on filters and percentages.
-
-What will it look like if we want to rollout gc-compaction gradually?
-
-Create a feature called gc-compaction.
-
-```
-$ storcon-cli feature create --job gc-compaction
-```
-
-Add two config sets to the rollout: gc-compaction without verification, and gc-compaction with read path verification.
-
-```
-$ storcon-cli feature config-set add --job gc-compaction --id enable_wo_verification --config '{ "gc_compaction_enabled": true }'
-$ storcon-cli feature config-set add --job gc-compaction --id enable --config '{ "gc_compaction_enabled": true, "gc_compaction_verification": true }'
-$ storcon-cli feature config-set list --job gc-compaction
-default {}
-enable_wo_verification { "gc_compaction_enabled": true }
-enable { "gc_compaction_enabled": true, "gc_compaction_verification": true }
-```
-
-Week 1: rollout to 1% of active small tenants with the verification mode enabled.
-
-```
-$ storcon-cli feature rollout --job gc-compaction --config-set enable --filter "remote_size < 10GB" --coverage-percentage 1
-20000 attached tenants satisfying the filter and randomly picked 200 tenants to apply `enable`, use `feature status` to view the rollout status
-```
-
-```
-$ storcon-cli feature status --job gc-compaction
-enable_wo_verification 0
-enable 200
-default 100000
-
-$ storcon-cli feature status --job gc-compaction --filter "remote_size < 10GB"
-enable_wo_verification 0
-enable 200
-default 19800
-
-$ storcon-cli feature history --job gc-compaction
-id,time,config-set,filter,percentage,count
-1,2025-04-25 14:00:00,enable,"remote_size < 10GB",1,200
-
-$ storcon-cli feature history --job gc-compaction --id 1
-<tenant_ids that are involved in this rollout>
-```
-
-Week 2: rollout to all active small tenants with the verification mode enabled, previously rolled-out tenants switch to full rollout that disables verification mode.
-
-```
-$ storcon-cli feature rollout --job gc-compaction --config-set enable_wo_verification --filter "config_set=enable" --coverage-percentage all
-$ storcon-cli feature rollout --job gc-compaction --config-set enable --filter "remote_size < 10GB" --coverage-percentage all
-```
-
-Week 3: rollout gradually to 50% larger tenants before Jun 1. The storage controller will randomly select some tenants every day at 12am to rollout the change, and will finish the rollout to at least 50% of the tenants by Jun 1.
-
-```
-$ storcon-cli feature scheduled-rollout --job gc-compaction --config-set enable --filter "remote_size < 100GB" --coverage-percentage 50 --cron "0 0 * * *" --before 2025-06-01 00:00:00
-```
-
-Week 4: we discover a bug over a specific tenant and want to disable gc-compaction on it,
-
-```
-$ storcon-cli feature rollout --job gc-compaction --config-set default --filter "tenant-id=<id>" --coverage-percentage all
-rollout succeeded, operation_id=11
-```
-
-Then we realize that this bug might affect all tenants and decide to disable it for all tenants:
-
-```
-$ storcon-cli feature rollout --job gc-compaction --config-set default --coverage-percentage all
-rollout succeeded, operation_id=10
-```
-
-We get a fix and can re-enable it on those tenants which had the feature enabled previously.
-
-```
-$ storcon-cli feature rollout --job gc-compaction --revert 11
-$ storcon-cli feature rollout --job gc-compaction --revert 10
-```
-
-Week 5: enable by default
-
-For newly-attached tenants, we want to enable gc-compaction by default.
-
-```
-$ storcon-cli feature set-default-when-attached --job gc-compaction --config-set enable
-```
-
-Week 6: full rollout
-
-```
-$ storcon-cli feature rollout --job gc-compaction --config-set enable --coverage-percentage all
-```
-
-Then we make a tenant default config change in the infra repo, get it deployed, and we can delete the feature rollout record in the storcon database.
-
-```
-$ storcon-cli feature delete --job gc-compaction
-```
--- a/libs/neonart/Cargo.toml
+++ b/libs/neonart/Cargo.toml
@@ -0,0 +1,11 @@
+[package]
+name = "neonart"
+version = "0.1.0"
+edition.workspace = true
+license.workspace = true
+
+[dependencies]
+tracing.workspace = true
+
+rand.workspace = true # for tests
+zerocopy = "0.8"
--- a/libs/neonart/src/algorithm.rs
+++ b/libs/neonart/src/algorithm.rs
@@ -0,0 +1,377 @@
+mod lock_and_version;
+mod node_ptr;
+mod node_ref;
+
+use std::vec::Vec;
+
+use crate::algorithm::lock_and_version::ResultOrRestart;
+use crate::algorithm::node_ptr::{MAX_PREFIX_LEN, NodePtr};
+use crate::algorithm::node_ref::ChildOrValue;
+use crate::algorithm::node_ref::{NodeRef, ReadLockedNodeRef, WriteLockedNodeRef};
+
+use crate::epoch::EpochPin;
+use crate::{Allocator, Key, Value};
+
+pub(crate) type RootPtr<V> = node_ptr::NodePtr<V>;
+
+pub fn new_root<V: Value>(allocator: &Allocator) -> RootPtr<V> {
+    node_ptr::new_root(allocator)
+}
+
+pub(crate) fn search<'e, K: Key, V: Value>(
+    key: &K,
+    root: RootPtr<V>,
+    epoch_pin: &'e EpochPin,
+) -> Option<V> {
+    loop {
+        let root_ref = NodeRef::from_root_ptr(root);
+        if let Ok(result) = lookup_recurse(key.as_bytes(), root_ref, None, epoch_pin) {
+            break result;
+        }
+        // retry
+    }
+}
+
+pub(crate) fn update_fn<'e, K: Key, V: Value, F>(
+    key: &K,
+    value_fn: F,
+    root: RootPtr<V>,
+    allocator: &Allocator,
+    epoch_pin: &'e EpochPin,
+) where
+    F: FnOnce(Option<&V>) -> Option<V>,
+{
+    let value_fn_cell = std::cell::Cell::new(Some(value_fn));
+    loop {
+        let root_ref = NodeRef::from_root_ptr(root);
+        let this_value_fn = |arg: Option<&V>| value_fn_cell.take().unwrap()(arg);
+        let key_bytes = key.as_bytes();
+        if let Ok(()) = update_recurse(
+            key_bytes,
+            this_value_fn,
+            root_ref,
+            None,
+            allocator,
+            epoch_pin,
+            0,
+            key_bytes,
+        ) {
+            break;
+        }
+        // retry
+    }
+}
+
+pub(crate) fn dump_tree<'e, V: Value + std::fmt::Debug>(root: RootPtr<V>, epoch_pin: &'e EpochPin) {
+    let root_ref = NodeRef::from_root_ptr(root);
+
+    let _ = dump_recurse(&[], root_ref, epoch_pin, 0);
+}
+
+// Error means you must retry.
+//
+// This corresponds to the 'lookupOpt' function in the paper
+fn lookup_recurse<'e, V: Value>(
+    key: &[u8],
+    node: NodeRef<'e, V>,
+    parent: Option<ReadLockedNodeRef<V>>,
+    epoch_pin: &'e EpochPin,
+) -> ResultOrRestart<Option<V>> {
+    let rnode = node.read_lock_or_restart()?;
+    if let Some(parent) = parent {
+        parent.read_unlock_or_restart()?;
+    }
+
+    // check if prefix matches, may increment level
+    let prefix_len = if let Some(prefix_len) = rnode.prefix_matches(key) {
+        prefix_len
+    } else {
+        rnode.read_unlock_or_restart()?;
+        return Ok(None);
+    };
+    let key = &key[prefix_len..];
+
+    // find child (or leaf value)
+    let next_node = rnode.find_child_or_value_or_restart(key[0])?;
+
+    match next_node {
+        None => Ok(None), // key not found
+        Some(ChildOrValue::Value(vptr)) => {
+            // safety: It's OK to follow the pointer because we checked the version.
+            let v = unsafe { (*vptr).clone() };
+            Ok(Some(v))
+        }
+        Some(ChildOrValue::Child(v)) => lookup_recurse(&key[1..], v, Some(rnode), epoch_pin),
+    }
+}
+
+// This corresponds to the 'insertOpt' function in the paper
+pub(crate) fn update_recurse<'e, V: Value, F>(
+    key: &[u8],
+    value_fn: F,
+    node: NodeRef<'e, V>,
+    rparent: Option<(ReadLockedNodeRef<V>, u8)>,
+    allocator: &Allocator,
+    epoch_pin: &'e EpochPin,
+    level: usize,
+    orig_key: &[u8],
+) -> ResultOrRestart<()>
+where
+    F: FnOnce(Option<&V>) -> Option<V>,
+{
+    let rnode = node.read_lock_or_restart()?;
+
+    let prefix_match_len = rnode.prefix_matches(key);
+    if prefix_match_len.is_none() {
+        let (rparent, parent_key) = rparent.expect("direct children of the root have no prefix");
+        let mut wparent = rparent.upgrade_to_write_lock_or_restart()?;
+        let mut wnode = rnode.upgrade_to_write_lock_or_restart()?;
+
+        if let Some(new_value) = value_fn(None) {
+            insert_split_prefix(
+                key,
+                new_value,
+                &mut wnode,
+                &mut wparent,
+                parent_key,
+                allocator,
+            );
+        }
+        wnode.write_unlock();
+        wparent.write_unlock();
+        return Ok(());
+    }
+    let prefix_match_len = prefix_match_len.unwrap();
+    let key = &key[prefix_match_len as usize..];
+    let level = level + prefix_match_len as usize;
+
+    let next_node = rnode.find_child_or_value_or_restart(key[0])?;
+
+    if next_node.is_none() {
+        if rnode.is_full() {
+            let (rparent, parent_key) = rparent.expect("root node cannot become full");
+            let mut wparent = rparent.upgrade_to_write_lock_or_restart()?;
+            let wnode = rnode.upgrade_to_write_lock_or_restart()?;
+
+            if let Some(new_value) = value_fn(None) {
+                insert_and_grow(key, new_value, &wnode, &mut wparent, parent_key, allocator);
+                wnode.write_unlock_obsolete();
+                wparent.write_unlock();
+            } else {
+                wnode.write_unlock();
+                wparent.write_unlock();
+            }
+        } else {
+            let mut wnode = rnode.upgrade_to_write_lock_or_restart()?;
+            if let Some((rparent, _)) = rparent {
+                rparent.read_unlock_or_restart()?;
+            }
+            if let Some(new_value) = value_fn(None) {
+                insert_to_node(&mut wnode, key, new_value, allocator);
+            }
+            wnode.write_unlock();
+        }
+        return Ok(());
+    } else {
+        let next_node = next_node.unwrap(); // checked above it's not None
+        if let Some((rparent, _)) = rparent {
+            rparent.read_unlock_or_restart()?;
+        }
+
+        match next_node {
+            ChildOrValue::Value(existing_value_ptr) => {
+                assert!(key.len() == 1);
+                let wnode = rnode.upgrade_to_write_lock_or_restart()?;
+
+                // safety: Now that we have acquired the write lock, we have exclusive access to the
+                // value
+                let vmut = unsafe { existing_value_ptr.cast_mut().as_mut() }.unwrap();
+                if let Some(new_value) = value_fn(Some(vmut)) {
+                    *vmut = new_value;
+                } else {
+                    // TODO: Treat this as deletion?
+                }
+                wnode.write_unlock();
+
+                Ok(())
+            }
+            ChildOrValue::Child(next_child) => {
+                // recurse to next level
+                update_recurse(
+                    &key[1..],
+                    value_fn,
+                    next_child,
+                    Some((rnode, key[0])),
+                    allocator,
+                    epoch_pin,
+                    level + 1,
+                    orig_key,
+                )
+            }
+        }
+    }
+}
+
+#[derive(Clone)]
+enum PathElement {
+    Prefix(Vec<u8>),
+    KeyByte(u8),
+}
+
+impl std::fmt::Debug for PathElement {
+    fn fmt(&self, fmt: &mut std::fmt::Formatter<'_>) -> Result<(), std::fmt::Error> {
+        match self {
+            PathElement::Prefix(prefix) => write!(fmt, "{:?}", prefix),
+            PathElement::KeyByte(key_byte) => write!(fmt, "{}", key_byte),
+        }
+    }
+}
+
+fn dump_recurse<'e, V: Value + std::fmt::Debug>(
+    path: &[PathElement],
+    node: NodeRef<'e, V>,
+    epoch_pin: &'e EpochPin,
+    level: usize,
+) -> ResultOrRestart<()> {
+    let indent = str::repeat(" ", level);
+
+    let rnode = node.read_lock_or_restart()?;
+    let mut path = Vec::from(path);
+    let prefix = rnode.get_prefix();
+    if !prefix.is_empty() {
+        path.push(PathElement::Prefix(Vec::from(prefix)));
+    }
+
+    for key_byte in 0..=u8::MAX {
+        match rnode.find_child_or_value_or_restart(key_byte)? {
+            None => continue,
+            Some(ChildOrValue::Child(child_ref)) => {
+                let rchild = child_ref.read_lock_or_restart()?;
+                eprintln!(
+                    "{} {:?}, {}: prefix {:?}",
+                    indent,
+                    &path,
+                    key_byte,
+                    rchild.get_prefix()
+                );
+
+                let mut child_path = path.clone();
+                child_path.push(PathElement::KeyByte(key_byte));
+
+                dump_recurse(&child_path, child_ref, epoch_pin, level + 1)?;
+            }
+            Some(ChildOrValue::Value(val)) => {
+                eprintln!("{} {:?}, {}: {:?}", indent, path, key_byte, unsafe {
+                    val.as_ref().unwrap()
+                });
+            }
+        }
+    }
+
+    Ok(())
+}
+
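+/// Insert `value` at `key` when `key` diverges from the prefix stored in `node`.
+///
+/// A new internal node takes over the common part of the prefix. The old node, with its
+/// prefix shortened accordingly, and a new leaf for `value` become its two children:
+///
+///```text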
+///        [fooba]r -> value
+///
+/// [foo]b -> [a]r  -> value
+///      e -> [ls]e -> value
+///```
+fn insert_split_prefix<V: Value>(
+    key: &[u8],
+    value: V,
+    node: &mut WriteLockedNodeRef<V>,
+    parent: &mut WriteLockedNodeRef<V>,
+    parent_key: u8,
+    allocator: &Allocator,
+) {
+    let old_node = node;
+    let old_prefix = old_node.get_prefix();
+    let common_prefix_len = common_prefix(key, old_prefix);
+
+    // Allocate a node for the new value.
+    let new_value_node = allocate_node_for_value(&key[common_prefix_len + 1..], value, allocator);
+
+    // Allocate a new internal node with the common prefix
+    let mut prefix_node = node_ref::new_internal(&key[..common_prefix_len], allocator);
+
+    // Add the old node and the new nodes to the new internal node
+    prefix_node.insert_child(old_prefix[common_prefix_len], old_node.as_ptr());
+    prefix_node.insert_child(key[common_prefix_len], new_value_node);
+
+    // Modify the prefix of the old child in place
+    old_node.truncate_prefix(old_prefix.len() - common_prefix_len - 1);
+
+    // replace the pointer in the parent
+    parent.replace_child(parent_key, prefix_node.into_ptr());
+}
+
+fn insert_to_node<V: Value>(
+    wnode: &mut WriteLockedNodeRef<V>,
+    key: &[u8],
+    value: V,
+    allocator: &Allocator,
+) {
+    if wnode.is_leaf() {
+        wnode.insert_value(key[0], value);
+    } else {
+        let value_child = allocate_node_for_value(&key[1..], value, allocator);
+        wnode.insert_child(key[0], value_child);
+    }
+}
+
+// On entry: 'parent' and 'node' are locked
+fn insert_and_grow<V: Value>(
+    key: &[u8],
+    value: V,
+    wnode: &WriteLockedNodeRef<V>,
+    parent: &mut WriteLockedNodeRef<V>,
+    parent_key_byte: u8,
+    allocator: &Allocator,
+) {
+    let mut bigger_node = wnode.grow(allocator);
+
+    if wnode.is_leaf() {
+        bigger_node.insert_value(key[0], value);
+    } else {
+        let value_child = allocate_node_for_value(&key[1..], value, allocator);
+        bigger_node.insert_child(key[0], value_child);
+    }
+
+    // Replace the pointer in the parent
+    parent.replace_child(parent_key_byte, bigger_node.into_ptr());
+}
+
+// Allocate a new leaf node to hold 'value'. If key is long, we may need to allocate
+// new internal nodes to hold it too
+fn allocate_node_for_value<V: Value>(key: &[u8], value: V, allocator: &Allocator) -> NodePtr<V> {
+    let mut prefix_off = key.len().saturating_sub(MAX_PREFIX_LEN + 1);
+
+    let mut leaf_node = node_ref::new_leaf(&key[prefix_off..key.len() - 1], allocator);
+    leaf_node.insert_value(*key.last().unwrap(), value);
+
+    let mut node = leaf_node;
+    while prefix_off > 0 {
+        // Need another internal node
+        let remain_prefix = &key[0..prefix_off];
+
+        prefix_off = remain_prefix.len().saturating_sub(MAX_PREFIX_LEN + 1);
+        let mut internal_node = node_ref::new_internal(
+            &remain_prefix[prefix_off..remain_prefix.len() - 1],
+            allocator,
+        );
+        internal_node.insert_child(*remain_prefix.last().unwrap(), node.into_ptr());
+        node = internal_node;
+    }
+
+    node.into_ptr()
+}
+
+fn common_prefix(a: &[u8], b: &[u8]) -> usize {
+    for (i, (x, y)) in a.iter().zip(b.iter()).enumerate() {
+        if x != y {
+            return i;
+        }
+    }
+    panic!("prefixes are equal");
+}
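
The backwards chunking done by `allocate_node_for_value` above can be easier to follow in isolation. The following is only a sketch of that splitting logic, not code from this change: `split_key` is a hypothetical helper that returns `(prefix, key_byte)` pairs instead of allocating nodes, and it assumes the same `MAX_PREFIX_LEN` of 8 that `node_ptr.rs` defines.

```rust
// Same limit as `MAX_PREFIX_LEN` in node_ptr.rs.
const MAX_PREFIX_LEN: usize = 8;

/// Hypothetical helper: split `key` into one `(prefix, key_byte)` pair per node,
/// ordered from the topmost node down to the leaf. Each node consumes up to
/// MAX_PREFIX_LEN prefix bytes plus one key byte, and the key is carved up from
/// its end, so only the topmost node can end up with a short prefix.
fn split_key(key: &[u8]) -> Vec<(Vec<u8>, u8)> {
    assert!(!key.is_empty(), "a key needs at least one byte");
    let mut chunks = Vec::new();
    let mut end = key.len();
    while end > 0 {
        let start = end.saturating_sub(MAX_PREFIX_LEN + 1);
        let chunk = &key[start..end];
        let (prefix, last) = chunk.split_at(chunk.len() - 1);
        chunks.push((prefix.to_vec(), last[0]));
        end = start;
    }
    // Chunks were collected leaf-first; flip them into root-to-leaf order.
    chunks.reverse();
    chunks
}

fn main() {
    let key: Vec<u8> = (0..20).collect();
    for (prefix, key_byte) in split_key(&key) {
        println!("prefix {:?} -> key byte {}", prefix, key_byte);
    }
}
```

For a 20-byte key this yields three nodes: a top node with a one-byte prefix, then two nodes with full eight-byte prefixes, the last of which is the leaf.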
--- a/libs/neonart/src/algorithm/lock_and_version.rs
+++ b/libs/neonart/src/algorithm/lock_and_version.rs
@@ -0,0 +1,85 @@
+use std::sync::atomic::{AtomicU64, Ordering};
+
+pub(crate) struct AtomicLockAndVersion {
+    inner: AtomicU64,
+}
+
+impl AtomicLockAndVersion {
+    pub(crate) fn new() -> AtomicLockAndVersion {
+        AtomicLockAndVersion {
+            inner: AtomicU64::new(0),
+        }
+    }
+}
+
+pub(crate) type ResultOrRestart<T> = Result<T, ()>;
+
+const fn restart<T>() -> ResultOrRestart<T> {
+    Err(())
+}
+
+impl AtomicLockAndVersion {
+    pub(crate) fn read_lock_or_restart(&self) -> ResultOrRestart<u64> {
+        let version = self.await_node_unlocked();
+        if is_obsolete(version) {
+            return restart();
+        }
+        Ok(version)
+    }
+
+    pub(crate) fn check_or_restart(&self, version: u64) -> ResultOrRestart<()> {
+        self.read_unlock_or_restart(version)
+    }
+
+    pub(crate) fn read_unlock_or_restart(&self, version: u64) -> ResultOrRestart<()> {
+        if self.inner.load(Ordering::Acquire) != version {
+            return restart();
+        }
+        Ok(())
+    }
+
+    pub(crate) fn upgrade_to_write_lock_or_restart(&self, version: u64) -> ResultOrRestart<()> {
+        if self
+            .inner
+            .compare_exchange(
+                version,
+                set_locked_bit(version),
+                Ordering::Acquire,
+                Ordering::Relaxed,
+            )
+            .is_err()
+        {
+            return restart();
+        }
+        Ok(())
+    }
+
+    pub(crate) fn write_unlock(&self) {
+        // reset locked bit and overflow into version
+        self.inner.fetch_add(2, Ordering::Release);
+    }
+
+    pub(crate) fn write_unlock_obsolete(&self) {
+        // set obsolete, reset locked, overflow into version
+        self.inner.fetch_add(3, Ordering::Release);
+    }
+
+    // Helper functions
+    fn await_node_unlocked(&self) -> u64 {
+        let mut version = self.inner.load(Ordering::Acquire);
+        while (version & 2) == 2 {
+            // spinlock
+            std::thread::yield_now();
+            version = self.inner.load(Ordering::Acquire)
+        }
+        version
+    }
+}
+
+fn set_locked_bit(version: u64) -> u64 {
+    version + 2
+}
+
+fn is_obsolete(version: u64) -> bool {
+    (version & 1) == 1
+}
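
The constants 1, 2 and 3 in `lock_and_version.rs` imply a bit layout for the lock word that the file never spells out. The sketch below is a reading of those constants, not part of the change: the named constants and the free functions are hypothetical, and the layout (bit 0 obsolete, bit 1 locked, remaining bits a version counter) is inferred from `is_obsolete`, `await_node_unlocked` and the two unlock functions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Assumed layout, inferred from the constants used in the diff:
//   bit 0     obsolete flag
//   bit 1     locked flag
//   bits 2..  version counter
const OBSOLETE_BIT: u64 = 0b01;
const LOCKED_BIT: u64 = 0b10;

fn is_locked(word: u64) -> bool {
    word & LOCKED_BIT != 0
}

fn is_obsolete(word: u64) -> bool {
    word & OBSOLETE_BIT != 0
}

fn version(word: u64) -> u64 {
    word >> 2
}

/// Take the write lock, but only if the word still equals the snapshot `seen`.
fn try_write_lock(word: &AtomicU64, seen: u64) -> bool {
    word.compare_exchange(seen, seen + LOCKED_BIT, Ordering::Acquire, Ordering::Relaxed)
        .is_ok()
}

/// Adding 2 while the locked bit is set clears it and carries into the version.
fn write_unlock(word: &AtomicU64) -> u64 {
    word.fetch_add(LOCKED_BIT, Ordering::Release) + LOCKED_BIT
}

/// Adding 3 does the same and additionally sets the obsolete flag.
fn write_unlock_obsolete(word: &AtomicU64) -> u64 {
    word.fetch_add(LOCKED_BIT + OBSOLETE_BIT, Ordering::Release) + LOCKED_BIT + OBSOLETE_BIT
}

fn main() {
    let word = AtomicU64::new(0);
    let seen = word.load(Ordering::Acquire);

    assert!(try_write_lock(&word, seen));
    assert!(is_locked(word.load(Ordering::Acquire)));
    // A second writer holding the stale snapshot has to restart.
    assert!(!try_write_lock(&word, seen));

    let after = write_unlock(&word);
    assert!(!is_locked(after));
    assert_eq!(version(after), 1);

    assert!(try_write_lock(&word, after));
    let dead = write_unlock_obsolete(&word);
    assert!(is_obsolete(dead) && !is_locked(dead));
    assert_eq!(version(dead), 2);
}
```

Because every unlock bumps the version, a reader that recorded the word before a write sees a different value afterwards, which is what `read_unlock_or_restart` relies on.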
--- a/libs/neonart/src/algorithm/node_ptr.rs
+++ b/libs/neonart/src/algorithm/node_ptr.rs
@@ -0,0 +1,983 @@
+use std::marker::PhantomData;
+use std::ptr::NonNull;
+
+use super::lock_and_version::AtomicLockAndVersion;
+
+use crate::Allocator;
+use crate::Value;
+
+pub(crate) const MAX_PREFIX_LEN: usize = 8;
+
+enum NodeTag {
+    Internal4,
+    Internal16,
+    Internal48,
+    Internal256,
+    Leaf4,
+    Leaf16,
+    Leaf48,
+    Leaf256,
+}
+
+#[repr(C)]
+struct NodeBase {
+    tag: NodeTag,
+    lock_and_version: AtomicLockAndVersion,
+}
+
+pub(crate) struct NodePtr<V> {
+    ptr: *mut NodeBase,
+
+    phantom_value: PhantomData<V>,
+}
+
+impl<V> std::fmt::Debug for NodePtr<V> {
+    fn fmt(&self, fmt: &mut std::fmt::Formatter<'_>) -> Result<(), std::fmt::Error> {
+        write!(fmt, "0x{:x}", self.ptr.addr())
+    }
+}
+
+impl<V> Copy for NodePtr<V> {}
+impl<V> Clone for NodePtr<V> {
+    fn clone(&self) -> NodePtr<V> {
+        NodePtr {
+            ptr: self.ptr,
+            phantom_value: PhantomData,
+        }
+    }
+}
+
+enum NodeVariant<'a, V> {
+    Internal4(&'a NodeInternal4<V>),
+    Internal16(&'a NodeInternal16<V>),
+    Internal48(&'a NodeInternal48<V>),
+    Internal256(&'a NodeInternal256<V>),
+    Leaf4(&'a NodeLeaf4<V>),
+    Leaf16(&'a NodeLeaf16<V>),
+    Leaf48(&'a NodeLeaf48<V>),
+    Leaf256(&'a NodeLeaf256<V>),
+}
+
+enum NodeVariantMut<'a, V> {
+    Internal4(&'a mut NodeInternal4<V>),
+    Internal16(&'a mut NodeInternal16<V>),
+    Internal48(&'a mut NodeInternal48<V>),
+    Internal256(&'a mut NodeInternal256<V>),
+    Leaf4(&'a mut NodeLeaf4<V>),
+    Leaf16(&'a mut NodeLeaf16<V>),
+    Leaf48(&'a mut NodeLeaf48<V>),
+    Leaf256(&'a mut NodeLeaf256<V>),
+}
+
+pub(crate) enum ChildOrValuePtr<V> {
+    Child(NodePtr<V>),
+    Value(*const V),
+}
+
+#[repr(C)]
+struct NodeInternal4<V> {
+    tag: NodeTag,
+    lock_and_version: AtomicLockAndVersion,
+
+    prefix: [u8; MAX_PREFIX_LEN],
+    prefix_len: u8,
+    num_children: u8,
+
+    child_keys: [u8; 4],
+    child_ptrs: [NodePtr<V>; 4],
+}
+
+#[repr(C)]
+struct NodeInternal16<V> {
+    tag: NodeTag,
+    lock_and_version: AtomicLockAndVersion,
+
+    prefix: [u8; MAX_PREFIX_LEN],
+    prefix_len: u8,
+
+    num_children: u8,
+    child_keys: [u8; 16],
+    child_ptrs: [NodePtr<V>; 16],
+}
+
+const INVALID_CHILD_INDEX: u8 = u8::MAX;
+
+#[repr(C)]
+struct NodeInternal48<V> {
+    tag: NodeTag,
+    lock_and_version: AtomicLockAndVersion,
+
+    prefix: [u8; MAX_PREFIX_LEN],
+    prefix_len: u8,
+
+    num_children: u8,
+    child_indexes: [u8; 256],
+    child_ptrs: [NodePtr<V>; 48],
+}
+
+#[repr(C)]
+pub(crate) struct NodeInternal256<V> {
+    tag: NodeTag,
+    lock_and_version: AtomicLockAndVersion,
+
+    prefix: [u8; MAX_PREFIX_LEN],
+    prefix_len: u8,
+
+    num_children: u16,
+    child_ptrs: [NodePtr<V>; 256],
+}
+
+#[repr(C)]
+struct NodeLeaf4<V> {
+    tag: NodeTag,
+    lock_and_version: AtomicLockAndVersion,
+
+    prefix: [u8; MAX_PREFIX_LEN],
+    prefix_len: u8,
+
+    num_values: u8,
+    child_keys: [u8; 4],
+    child_values: [Option<V>; 4],
+}
+
+#[repr(C)]
+struct NodeLeaf16<V> {
+    tag: NodeTag,
+    lock_and_version: AtomicLockAndVersion,
+
+    prefix: [u8; MAX_PREFIX_LEN],
+    prefix_len: u8,
+
+    num_values: u8,
+    child_keys: [u8; 16],
+    child_values: [Option<V>; 16],
+}
+
+#[repr(C)]
+struct NodeLeaf48<V> {
+    tag: NodeTag,
+    lock_and_version: AtomicLockAndVersion,
+
+    prefix: [u8; MAX_PREFIX_LEN],
+    prefix_len: u8,
+
+    num_values: u8,
+    child_indexes: [u8; 256],
+    child_values: [Option<V>; 48],
+}
+
+#[repr(C)]
+struct NodeLeaf256<V> {
+    tag: NodeTag,
+    lock_and_version: AtomicLockAndVersion,
+
+    prefix: [u8; MAX_PREFIX_LEN],
+    prefix_len: u8,
+
+    num_values: u16,
+    child_values: [Option<V>; 256],
+}
+
+impl<V> NodePtr<V> {
+    pub(crate) fn is_leaf(&self) -> bool {
+        match self.variant() {
+            NodeVariant::Internal4(_) => false,
+            NodeVariant::Internal16(_) => false,
+            NodeVariant::Internal48(_) => false,
+            NodeVariant::Internal256(_) => false,
+            NodeVariant::Leaf4(_) => true,
+            NodeVariant::Leaf16(_) => true,
+            NodeVariant::Leaf48(_) => true,
+            NodeVariant::Leaf256(_) => true,
+        }
+    }
+
+    pub(crate) fn lockword(&self) -> &AtomicLockAndVersion {
+        match self.variant() {
+            NodeVariant::Internal4(n) => &n.lock_and_version,
+            NodeVariant::Internal16(n) => &n.lock_and_version,
+            NodeVariant::Internal48(n) => &n.lock_and_version,
+            NodeVariant::Internal256(n) => &n.lock_and_version,
+            NodeVariant::Leaf4(n) => &n.lock_and_version,
+            NodeVariant::Leaf16(n) => &n.lock_and_version,
+            NodeVariant::Leaf48(n) => &n.lock_and_version,
+            NodeVariant::Leaf256(n) => &n.lock_and_version,
+        }
+    }
+
+    pub(crate) fn is_null(&self) -> bool {
+        self.ptr.is_null()
+    }
+
+    pub(crate) const fn null() -> NodePtr<V> {
+        NodePtr {
+            ptr: std::ptr::null_mut(),
+            phantom_value: PhantomData,
+        }
+    }
+
+    fn variant(&self) -> NodeVariant<V> {
+        unsafe {
+            match (*self.ptr).tag {
+                NodeTag::Internal4 => NodeVariant::Internal4(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeInternal4<V>>()).as_ref(),
+                ),
+                NodeTag::Internal16 => NodeVariant::Internal16(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeInternal16<V>>()).as_ref(),
+                ),
+                NodeTag::Internal48 => NodeVariant::Internal48(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeInternal48<V>>()).as_ref(),
+                ),
+                NodeTag::Internal256 => NodeVariant::Internal256(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeInternal256<V>>()).as_ref(),
+                ),
+                NodeTag::Leaf4 => NodeVariant::Leaf4(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeLeaf4<V>>()).as_ref(),
+                ),
+                NodeTag::Leaf16 => NodeVariant::Leaf16(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeLeaf16<V>>()).as_ref(),
+                ),
+                NodeTag::Leaf48 => NodeVariant::Leaf48(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeLeaf48<V>>()).as_ref(),
+                ),
+                NodeTag::Leaf256 => NodeVariant::Leaf256(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeLeaf256<V>>()).as_ref(),
+                ),
+            }
+        }
+    }
+
+    fn variant_mut(&mut self) -> NodeVariantMut<V> {
+        unsafe {
+            match (*self.ptr).tag {
+                NodeTag::Internal4 => NodeVariantMut::Internal4(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeInternal4<V>>()).as_mut(),
+                ),
+                NodeTag::Internal16 => NodeVariantMut::Internal16(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeInternal16<V>>()).as_mut(),
+                ),
+                NodeTag::Internal48 => NodeVariantMut::Internal48(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeInternal48<V>>()).as_mut(),
+                ),
+                NodeTag::Internal256 => NodeVariantMut::Internal256(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeInternal256<V>>()).as_mut(),
+                ),
+                NodeTag::Leaf4 => NodeVariantMut::Leaf4(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeLeaf4<V>>()).as_mut(),
+                ),
+                NodeTag::Leaf16 => NodeVariantMut::Leaf16(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeLeaf16<V>>()).as_mut(),
+                ),
+                NodeTag::Leaf48 => NodeVariantMut::Leaf48(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeLeaf48<V>>()).as_mut(),
+                ),
+                NodeTag::Leaf256 => NodeVariantMut::Leaf256(
+                    NonNull::new_unchecked(self.ptr.cast::<NodeLeaf256<V>>()).as_mut(),
+                ),
+            }
+        }
+    }
+}
+
+impl<V: Value> NodePtr<V> {
+    pub(crate) fn prefix_matches(&self, key: &[u8]) -> Option<usize> {
+        let node_prefix = self.get_prefix();
+        assert!(node_prefix.len() <= key.len()); // because we only use fixed-size keys
+        if &key[0..node_prefix.len()] != node_prefix {
+            None
+        } else {
+            Some(node_prefix.len())
+        }
+    }
+
+    pub(crate) fn get_prefix(&self) -> &[u8] {
+        match self.variant() {
+            NodeVariant::Internal4(n) => n.get_prefix(),
+            NodeVariant::Internal16(n) => n.get_prefix(),
+            NodeVariant::Internal48(n) => n.get_prefix(),
+            NodeVariant::Internal256(n) => n.get_prefix(),
+            NodeVariant::Leaf4(n) => n.get_prefix(),
+            NodeVariant::Leaf16(n) => n.get_prefix(),
+            NodeVariant::Leaf48(n) => n.get_prefix(),
+            NodeVariant::Leaf256(n) => n.get_prefix(),
+        }
+    }
+
+    pub(crate) fn is_full(&self) -> bool {
+        match self.variant() {
+            NodeVariant::Internal4(n) => n.is_full(),
+            NodeVariant::Internal16(n) => n.is_full(),
+            NodeVariant::Internal48(n) => n.is_full(),
+            NodeVariant::Internal256(n) => n.is_full(),
+            NodeVariant::Leaf4(n) => n.is_full(),
+            NodeVariant::Leaf16(n) => n.is_full(),
+            NodeVariant::Leaf48(n) => n.is_full(),
+            NodeVariant::Leaf256(n) => n.is_full(),
+        }
+    }
+
+    pub(crate) fn find_child_or_value(&self, key_byte: u8) -> Option<ChildOrValuePtr<V>> {
+        match self.variant() {
+            NodeVariant::Internal4(n) => n.find_child(key_byte).map(ChildOrValuePtr::Child),
+            NodeVariant::Internal16(n) => n.find_child(key_byte).map(ChildOrValuePtr::Child),
+            NodeVariant::Internal48(n) => n.find_child(key_byte).map(ChildOrValuePtr::Child),
+            NodeVariant::Internal256(n) => {
+                n.find_child(key_byte).map(ChildOrValuePtr::Child)
+            }
+            NodeVariant::Leaf4(n) => n
+                .get_leaf_value(key_byte)
+                .map(|v| ChildOrValuePtr::Value(v)),
+            NodeVariant::Leaf16(n) => n
+                .get_leaf_value(key_byte)
+                .map(|v| ChildOrValuePtr::Value(v)),
+            NodeVariant::Leaf48(n) => n
+                .get_leaf_value(key_byte)
+                .map(|v| ChildOrValuePtr::Value(v)),
+            NodeVariant::Leaf256(n) => n
+                .get_leaf_value(key_byte)
+                .map(|v| ChildOrValuePtr::Value(v)),
+        }
+    }
+
+    pub(crate) fn truncate_prefix(&mut self, new_prefix_len: usize) {
+        match self.variant_mut() {
+            NodeVariantMut::Internal4(n) => n.truncate_prefix(new_prefix_len),
+            NodeVariantMut::Internal16(n) => n.truncate_prefix(new_prefix_len),
+            NodeVariantMut::Internal48(n) => n.truncate_prefix(new_prefix_len),
+            NodeVariantMut::Internal256(n) => n.truncate_prefix(new_prefix_len),
+            NodeVariantMut::Leaf4(n) => n.truncate_prefix(new_prefix_len),
+            NodeVariantMut::Leaf16(n) => n.truncate_prefix(new_prefix_len),
+            NodeVariantMut::Leaf48(n) => n.truncate_prefix(new_prefix_len),
+            NodeVariantMut::Leaf256(n) => n.truncate_prefix(new_prefix_len),
+        }
+    }
+
+    pub(crate) fn grow(&self, allocator: &Allocator) -> NodePtr<V> {
+        match self.variant() {
+            NodeVariant::Internal4(n) => n.grow(allocator),
+            NodeVariant::Internal16(n) => n.grow(allocator),
+            NodeVariant::Internal48(n) => n.grow(allocator),
+            NodeVariant::Internal256(_) => panic!("cannot grow Internal256 node"),
+            NodeVariant::Leaf4(n) => n.grow(allocator),
+            NodeVariant::Leaf16(n) => n.grow(allocator),
+            NodeVariant::Leaf48(n) => n.grow(allocator),
+            NodeVariant::Leaf256(_) => panic!("cannot grow Leaf256 node"),
+        }
+    }
+
+    pub(crate) fn insert_child(&mut self, key_byte: u8, child: NodePtr<V>) {
+        match self.variant_mut() {
+            NodeVariantMut::Internal4(n) => n.insert_child(key_byte, child),
+            NodeVariantMut::Internal16(n) => n.insert_child(key_byte, child),
+            NodeVariantMut::Internal48(n) => n.insert_child(key_byte, child),
+            NodeVariantMut::Internal256(n) => n.insert_child(key_byte, child),
+            NodeVariantMut::Leaf4(_)
+            | NodeVariantMut::Leaf16(_)
+            | NodeVariantMut::Leaf48(_)
+            | NodeVariantMut::Leaf256(_) => panic!("insert_child called on leaf node"),
+        }
+    }
+
+    pub(crate) fn replace_child(&mut self, key_byte: u8, replacement: NodePtr<V>) {
+        match self.variant_mut() {
+            NodeVariantMut::Internal4(n) => n.replace_child(key_byte, replacement),
+            NodeVariantMut::Internal16(n) => n.replace_child(key_byte, replacement),
+            NodeVariantMut::Internal48(n) => n.replace_child(key_byte, replacement),
+            NodeVariantMut::Internal256(n) => n.replace_child(key_byte, replacement),
+            NodeVariantMut::Leaf4(_)
+            | NodeVariantMut::Leaf16(_)
+            | NodeVariantMut::Leaf48(_)
+            | NodeVariantMut::Leaf256(_) => panic!("replace_child called on leaf node"),
+        }
+    }
+
+    pub(crate) fn insert_value(&mut self, key_byte: u8, value: V) {
+        match self.variant_mut() {
+            NodeVariantMut::Internal4(_)
+            | NodeVariantMut::Internal16(_)
+            | NodeVariantMut::Internal48(_)
+            | NodeVariantMut::Internal256(_) => panic!("insert_value called on internal node"),
+            NodeVariantMut::Leaf4(n) => n.insert_value(key_byte, value),
+            NodeVariantMut::Leaf16(n) => n.insert_value(key_byte, value),
+            NodeVariantMut::Leaf48(n) => n.insert_value(key_byte, value),
+            NodeVariantMut::Leaf256(n) => n.insert_value(key_byte, value),
+        }
+    }
+}
+
+pub fn new_root<V: Value>(allocator: &Allocator) -> NodePtr<V> {
+    NodePtr {
+        ptr: allocator.alloc(NodeInternal256::<V>::new()).as_ptr().cast(),
+        phantom_value: PhantomData,
+    }
+}
+
+pub fn new_internal<V: Value>(prefix: &[u8], allocator: &Allocator) -> NodePtr<V> {
+    let mut node = allocator.alloc(NodeInternal4 {
+        tag: NodeTag::Internal4,
+        lock_and_version: AtomicLockAndVersion::new(),
+
+        prefix: [0; MAX_PREFIX_LEN],
+        prefix_len: prefix.len() as u8,
+        num_children: 0,
+
+        child_keys: [0; 4],
+        child_ptrs: [const { NodePtr::null() }; 4],
+    });
+    node.prefix[0..prefix.len()].copy_from_slice(prefix);
+
+    node.as_ptr().into()
+}
+
+pub fn new_leaf<V: Value>(prefix: &[u8], allocator: &Allocator) -> NodePtr<V> {
+    let mut node = allocator.alloc(NodeLeaf4 {
+        tag: NodeTag::Leaf4,
+        lock_and_version: AtomicLockAndVersion::new(),
+
+        prefix: [0; MAX_PREFIX_LEN],
+        prefix_len: prefix.len() as u8,
+        num_values: 0,
+
+        child_keys: [0; 4],
+        child_values: [const { None }; 4],
+    });
+    node.prefix[0..prefix.len()].copy_from_slice(prefix);
+
+    node.as_ptr().into()
+}
+
+impl<V: Value> NodeInternal4<V> {
+    fn get_prefix(&self) -> &[u8] {
+        &self.prefix[0..self.prefix_len as usize]
+    }
+
+    fn truncate_prefix(&mut self, new_prefix_len: usize) {
+        assert!(new_prefix_len < self.prefix_len as usize);
+        let prefix = &mut self.prefix;
+        let offset = self.prefix_len as usize - new_prefix_len;
+        for i in 0..new_prefix_len {
+            prefix[i] = prefix[i + offset];
+        }
+        self.prefix_len = new_prefix_len as u8;
+    }
+
+    fn find_child(&self, key: u8) -> Option<NodePtr<V>> {
+        for i in 0..self.num_children as usize {
+            if self.child_keys[i] == key {
+                return Some(self.child_ptrs[i]);
+            }
+        }
+        None
+    }
+
+    fn replace_child(&mut self, key_byte: u8, replacement: NodePtr<V>) {
+        for i in 0..self.num_children as usize {
+            if self.child_keys[i] == key_byte {
+                self.child_ptrs[i] = replacement;
+                return;
+            }
+        }
+        panic!("could not re-find child with key {}", key_byte);
+    }
+
+    fn is_full(&self) -> bool {
+        self.num_children == 4
+    }
+
+    fn insert_child(&mut self, key_byte: u8, child: NodePtr<V>) {
+        assert!(self.num_children < 4);
+
+        let idx = self.num_children as usize;
+        self.child_keys[idx] = key_byte;
+        self.child_ptrs[idx] = child;
+        self.num_children += 1;
+    }
+
+    fn grow(&self, allocator: &Allocator) -> NodePtr<V> {
+        let mut node16 = allocator.alloc(NodeInternal16 {
+            tag: NodeTag::Internal16,
+            lock_and_version: AtomicLockAndVersion::new(),
+
+            prefix: self.prefix,
+            prefix_len: self.prefix_len,
+            num_children: self.num_children,
+
+            child_keys: [0; 16],
+            child_ptrs: [const { NodePtr::null() }; 16],
+        });
+        for i in 0..self.num_children as usize {
+            node16.child_keys[i] = self.child_keys[i];
+            node16.child_ptrs[i] = self.child_ptrs[i];
+        }
+
+        node16.as_ptr().into()
+    }
+}
+
+impl<V: Value> NodeInternal16<V> {
+    fn get_prefix(&self) -> &[u8] {
+        &self.prefix[0..self.prefix_len as usize]
+    }
+
+    fn truncate_prefix(&mut self, new_prefix_len: usize) {
+        assert!(new_prefix_len < self.prefix_len as usize);
+        let prefix = &mut self.prefix;
+        let offset = self.prefix_len as usize - new_prefix_len;
+        for i in 0..new_prefix_len {
+            prefix[i] = prefix[i + offset];
+        }
+        self.prefix_len = new_prefix_len as u8;
+    }
+
+    fn find_child(&self, key_byte: u8) -> Option<NodePtr<V>> {
+        for i in 0..self.num_children as usize {
+            if self.child_keys[i] == key_byte {
+                return Some(self.child_ptrs[i]);
+            }
+        }
+        None
+    }
+
+    fn replace_child(&mut self, key_byte: u8, replacement: NodePtr<V>) {
+        for i in 0..self.num_children as usize {
+            if self.child_keys[i] == key_byte {
+                self.child_ptrs[i] = replacement;
+                return;
+            }
+        }
+        panic!("could not re-find child with key {}", key_byte);
+    }
+
+    fn is_full(&self) -> bool {
+        self.num_children == 16
+    }
+
+    fn insert_child(&mut self, key_byte: u8, child: NodePtr<V>) {
+        assert!(self.num_children < 16);
+
+        let idx = self.num_children as usize;
+        self.child_keys[idx] = key_byte;
+        self.child_ptrs[idx] = child;
+        self.num_children += 1;
+    }
+
+    fn grow(&self, allocator: &Allocator) -> NodePtr<V> {
+        let mut node48 = allocator.alloc(NodeInternal48 {
+            tag: NodeTag::Internal48,
+            lock_and_version: AtomicLockAndVersion::new(),
+
+            prefix: self.prefix,
+            prefix_len: self.prefix_len,
+            num_children: self.num_children,
+
+            child_indexes: [INVALID_CHILD_INDEX; 256],
+            child_ptrs: [const { NodePtr::null() }; 48],
+        });
+        for i in 0..self.num_children as usize {
+            let idx = self.child_keys[i] as usize;
+            node48.child_indexes[idx] = i as u8;
+            node48.child_ptrs[i] = self.child_ptrs[i];
+        }
+
+        node48.as_ptr().into()
+    }
+}
+
+impl<V: Value> NodeInternal48<V> {
+    fn get_prefix(&self) -> &[u8] {
+        &self.prefix[0..self.prefix_len as usize]
+    }
+
+    fn truncate_prefix(&mut self, new_prefix_len: usize) {
+        assert!(new_prefix_len < self.prefix_len as usize);
+        let prefix = &mut self.prefix;
+        let offset = self.prefix_len as usize - new_prefix_len;
+        for i in 0..new_prefix_len {
+            prefix[i] = prefix[i + offset];
+        }
+        self.prefix_len = new_prefix_len as u8;
+    }
+
+    fn find_child(&self, key_byte: u8) -> Option<NodePtr<V>> {
+        let idx = self.child_indexes[key_byte as usize];
+        if idx != INVALID_CHILD_INDEX {
+            Some(self.child_ptrs[idx as usize])
+        } else {
+            None
+        }
+    }
+
+    fn replace_child(&mut self, key_byte: u8, replacement: NodePtr<V>) {
+        let idx = self.child_indexes[key_byte as usize];
+        if idx != INVALID_CHILD_INDEX {
+            self.child_ptrs[idx as usize] = replacement
+        } else {
+            panic!("could not re-find child with key {}", key_byte);
+        }
+    }
+
+    fn is_full(&self) -> bool {
+        self.num_children == 48
+    }
+
+    fn insert_child(&mut self, key_byte: u8, child: NodePtr<V>) {
+        assert!(self.num_children < 48);
+        assert!(self.child_indexes[key_byte as usize] == INVALID_CHILD_INDEX);
+        let idx = self.num_children;
+        self.child_indexes[key_byte as usize] = idx;
+        self.child_ptrs[idx as usize] = child;
+        self.num_children += 1;
+    }
+
+    fn grow(&self, allocator: &Allocator) -> NodePtr<V> {
+        let mut node256 = allocator.alloc(NodeInternal256 {
+            tag: NodeTag::Internal256,
+            lock_and_version: AtomicLockAndVersion::new(),
+
+            prefix: self.prefix,
+            prefix_len: self.prefix_len,
+            num_children: self.num_children as u16,
+
+            child_ptrs: [const { NodePtr::null() }; 256],
+        });
+        for i in 0..256 {
+            let idx = self.child_indexes[i];
+            if idx != INVALID_CHILD_INDEX {
+                node256.child_ptrs[i] = self.child_ptrs[idx as usize];
+            }
+        }
+        node256.as_ptr().into()
+    }
+}
+
+impl<V: Value> NodeInternal256<V> {
+    fn get_prefix(&self) -> &[u8] {
+        &self.prefix[0..self.prefix_len as usize]
+    }
+
+    fn truncate_prefix(&mut self, new_prefix_len: usize) {
+        assert!(new_prefix_len < self.prefix_len as usize);
+        let prefix = &mut self.prefix;
+        let offset = self.prefix_len as usize - new_prefix_len;
+        for i in 0..new_prefix_len {
+            prefix[i] = prefix[i + offset];
+        }
+        self.prefix_len = new_prefix_len as u8;
+    }
+
+    fn find_child(&self, key_byte: u8) -> Option<NodePtr<V>> {
+        let idx = key_byte as usize;
+        if !self.child_ptrs[idx].is_null() {
+            Some(self.child_ptrs[idx])
+        } else {
+            None
+        }
+    }
+
+    fn replace_child(&mut self, key_byte: u8, replacement: NodePtr<V>) {
+        let idx = key_byte as usize;
+        if !self.child_ptrs[idx].is_null() {
+            self.child_ptrs[idx] = replacement
+        } else {
+            panic!("could not re-find parent with key {}", key_byte);
+        }
+    }
+
+    fn is_full(&self) -> bool {
+        self.num_children == 256
+    }
+
+    fn insert_child(&mut self, key_byte: u8, child: NodePtr<V>) {
+        assert!(self.num_children < 256);
+        assert!(self.child_ptrs[key_byte as usize].is_null());
+        self.child_ptrs[key_byte as usize] = child;
+        self.num_children += 1;
+    }
+}
+
+impl<V: Value> NodeLeaf4<V> {
+    fn get_prefix(&self) -> &[u8] {
+        &self.prefix[0..self.prefix_len as usize]
+    }
+
+    fn truncate_prefix(&mut self, new_prefix_len: usize) {
+        assert!(new_prefix_len < self.prefix_len as usize);
+        let prefix = &mut self.prefix;
+        let offset = self.prefix_len as usize - new_prefix_len;
+        for i in 0..new_prefix_len {
+            prefix[i] = prefix[i + offset];
+        }
+        self.prefix_len = new_prefix_len as u8;
+    }
+
+    fn get_leaf_value<'a: 'b, 'b>(&'a self, key: u8) -> Option<&'b V> {
+        for i in 0..self.num_values {
+            if self.child_keys[i as usize] == key {
+                assert!(self.child_values[i as usize].is_some());
+                return self.child_values[i as usize].as_ref();
+            }
+        }
+        None
+    }
+    fn is_full(&self) -> bool {
+        self.num_values == 4
+    }
+
+    fn insert_value(&mut self, key_byte: u8, value: V) {
+        assert!(self.num_values < 4);
+
+        let idx = self.num_values as usize;
+        self.child_keys[idx] = key_byte;
+        self.child_values[idx] = Some(value);
+        self.num_values += 1;
+    }
+
+    fn grow(&self, allocator: &Allocator) -> NodePtr<V> {
+        let mut node16 = allocator.alloc(NodeLeaf16 {
+            tag: NodeTag::Leaf16,
+            lock_and_version: AtomicLockAndVersion::new(),
+
+            prefix: self.prefix.clone(),
+            prefix_len: self.prefix_len,
+            num_values: self.num_values,
+
+            child_keys: [0; 16],
+            child_values: [const { None }; 16],
+        });
+        for i in 0..self.num_values as usize {
+            node16.child_keys[i] = self.child_keys[i];
+            node16.child_values[i] = self.child_values[i].clone();
+        }
+        node16.as_ptr().into()
+    }
+}
+
+impl<V: Value> NodeLeaf16<V> {
+    fn get_prefix(&self) -> &[u8] {
+        &self.prefix[0..self.prefix_len as usize]
+    }
+
+    fn truncate_prefix(&mut self, new_prefix_len: usize) {
+        assert!(new_prefix_len < self.prefix_len as usize);
+        let prefix = &mut self.prefix;
+        let offset = self.prefix_len as usize - new_prefix_len;
+        for i in 0..new_prefix_len {
+            prefix[i] = prefix[i + offset];
+        }
+        self.prefix_len = new_prefix_len as u8;
+    }
+
+    fn get_leaf_value(&self, key: u8) -> Option<&V> {
+        for i in 0..self.num_values {
+            if self.child_keys[i as usize] == key {
+                assert!(self.child_values[i as usize].is_some());
+                return self.child_values[i as usize].as_ref();
+            }
+        }
+        None
+    }
+    fn is_full(&self) -> bool {
+        self.num_values == 16
+    }
+
+    fn insert_value(&mut self, key_byte: u8, value: V) {
+        assert!(self.num_values < 16);
+
+        let idx = self.num_values as usize;
+        self.child_keys[idx] = key_byte;
+        self.child_values[idx] = Some(value);
+        self.num_values += 1;
+    }
+    fn grow(&self, allocator: &Allocator) -> NodePtr<V> {
+        let mut node48 = allocator.alloc(NodeLeaf48 {
+            tag: NodeTag::Leaf48,
+            lock_and_version: AtomicLockAndVersion::new(),
+
+            prefix: self.prefix.clone(),
+            prefix_len: self.prefix_len,
+            num_values: self.num_values,
+
+            child_indexes: [INVALID_CHILD_INDEX; 256],
+            child_values: [const { None }; 48],
+        });
+        for i in 0..self.num_values {
+            let idx = self.child_keys[i as usize];
+            node48.child_indexes[idx as usize] = i;
+            node48.child_values[i as usize] = self.child_values[i as usize].clone();
+        }
+        node48.as_ptr().into()
+    }
+}
+
+impl<V: Value> NodeLeaf48<V> {
+    fn get_prefix(&self) -> &[u8] {
+        &self.prefix[0..self.prefix_len as usize]
+    }
+
+    fn truncate_prefix(&mut self, new_prefix_len: usize) {
+        assert!(new_prefix_len < self.prefix_len as usize);
+        let prefix = &mut self.prefix;
+        let offset = self.prefix_len as usize - new_prefix_len;
+        for i in 0..new_prefix_len {
+            prefix[i] = prefix[i + offset];
+        }
+        self.prefix_len = new_prefix_len as u8;
+    }
+
+    fn get_leaf_value(&self, key: u8) -> Option<&V> {
+        let idx = self.child_indexes[key as usize];
+        if idx != INVALID_CHILD_INDEX {
+            assert!(self.child_values[idx as usize].is_some());
+            self.child_values[idx as usize].as_ref()
+        } else {
+            None
+        }
+    }
+    fn is_full(&self) -> bool {
+        self.num_values == 48
+    }
+
+    fn insert_value(&mut self, key_byte: u8, value: V) {
+        assert!(self.num_values < 48);
+        assert!(self.child_indexes[key_byte as usize] == INVALID_CHILD_INDEX);
+        let idx = self.num_values;
+        self.child_indexes[key_byte as usize] = idx;
+        self.child_values[idx as usize] = Some(value);
+        self.num_values += 1;
+    }
+    fn grow(&self, allocator: &Allocator) -> NodePtr<V> {
+        let mut node256 = allocator.alloc(NodeLeaf256 {
+            tag: NodeTag::Leaf256,
+            lock_and_version: AtomicLockAndVersion::new(),
+
+            prefix: self.prefix.clone(),
+            prefix_len: self.prefix_len,
+            num_values: self.num_values as u16,
+
+            child_values: [const { None }; 256],
+        });
+        for i in 0..256 {
+            let idx = self.child_indexes[i];
+            if idx != INVALID_CHILD_INDEX {
+                node256.child_values[i] = self.child_values[idx as usize].clone();
+            }
+        }
+        node256.as_ptr().into()
+    }
+}
+
+impl<V: Value> NodeLeaf256<V> {
+    fn get_prefix(&self) -> &[u8] {
+        &self.prefix[0..self.prefix_len as usize]
+    }
+
+    fn truncate_prefix(&mut self, new_prefix_len: usize) {
+        assert!(new_prefix_len < self.prefix_len as usize);
+        let prefix = &mut self.prefix;
+        let offset = self.prefix_len as usize - new_prefix_len;
+        for i in 0..new_prefix_len {
+            prefix[i] = prefix[i + offset];
+        }
+        self.prefix_len = new_prefix_len as u8;
+    }
+
+    fn get_leaf_value(&self, key: u8) -> Option<&V> {
+        let idx = key as usize;
+        self.child_values[idx].as_ref()
+    }
+    fn is_full(&self) -> bool {
+        self.num_values == 256
+    }
+
+    fn insert_value(&mut self, key_byte: u8, value: V) {
+        assert!(self.num_values < 256);
+        assert!(self.child_values[key_byte as usize].is_none());
+        self.child_values[key_byte as usize] = Some(value);
+        self.num_values += 1;
+    }
+}
+
+impl<V: Value> NodeInternal256<V> {
+    pub(crate) fn new() -> NodeInternal256<V> {
+        NodeInternal256 {
+            tag: NodeTag::Internal256,
+            lock_and_version: AtomicLockAndVersion::new(),
+
+            prefix: [0; MAX_PREFIX_LEN],
+            prefix_len: 0,
+            num_children: 0,
+
+            child_ptrs: [const { NodePtr::null() }; 256],
+        }
+    }
+}
+
+impl<V: Value> From<*mut NodeInternal4<V>> for NodePtr<V> {
+    fn from(val: *mut NodeInternal4<V>) -> NodePtr<V> {
+        NodePtr {
+            ptr: val.cast(),
+            phantom_value: PhantomData,
+        }
+    }
+}
+impl<V: Value> From<*mut NodeInternal16<V>> for NodePtr<V> {
+    fn from(val: *mut NodeInternal16<V>) -> NodePtr<V> {
+        NodePtr {
+            ptr: val.cast(),
+            phantom_value: PhantomData,
+        }
+    }
+}
+
+impl<V: Value> From<*mut NodeInternal48<V>> for NodePtr<V> {
+    fn from(val: *mut NodeInternal48<V>) -> NodePtr<V> {
+        NodePtr {
+            ptr: val.cast(),
+            phantom_value: PhantomData,
+        }
+    }
+}
+
+impl<V: Value> From<*mut NodeInternal256<V>> for NodePtr<V> {
+    fn from(val: *mut NodeInternal256<V>) -> NodePtr<V> {
+        NodePtr {
+            ptr: val.cast(),
+            phantom_value: PhantomData,
+        }
+    }
+}
+
+impl<V: Value> From<*mut NodeLeaf4<V>> for NodePtr<V> {
+    fn from(val: *mut NodeLeaf4<V>) -> NodePtr<V> {
+        NodePtr {
+            ptr: val.cast(),
+            phantom_value: PhantomData,
+        }
+    }
+}
+impl<V: Value> From<*mut NodeLeaf16<V>> for NodePtr<V> {
+    fn from(val: *mut NodeLeaf16<V>) -> NodePtr<V> {
+        NodePtr {
+            ptr: val.cast(),
+            phantom_value: PhantomData,
+        }
+    }
+}
+
+impl<V: Value> From<*mut NodeLeaf48<V>> for NodePtr<V> {
+    fn from(val: *mut NodeLeaf48<V>) -> NodePtr<V> {
+        NodePtr {
+            ptr: val.cast(),
+            phantom_value: PhantomData,
+        }
+    }
+}
+
+impl<V: Value> From<*mut NodeLeaf256<V>> for NodePtr<V> {
+    fn from(val: *mut NodeLeaf256<V>) -> NodePtr<V> {
+        NodePtr {
+            ptr: val.cast(),
+            phantom_value: PhantomData,
+        }
+    }
+}
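Review note on the 48-way nodes above: they use a 256-entry index array that maps a key byte to one of 48 densely packed slots, and `grow` unpacks that indirection into a directly indexed 256-entry array. The following is a minimal standalone sketch of that layout, not the code from this diff. The type `Node48`, the `u32` payload and the sentinel value `u8::MAX` are assumptions made for illustration (the diff does not show the definition of `INVALID_CHILD_INDEX`).

```rust
// Hypothetical sentinel; the real INVALID_CHILD_INDEX is defined elsewhere in the diff.
const INVALID: u8 = u8::MAX;

// Simplified stand-in for a 48-way leaf: 256 one-byte indexes, 48 value slots.
struct Node48 {
    child_indexes: [u8; 256],
    slots: [Option<u32>; 48],
    num_values: u8,
}

impl Node48 {
    fn new() -> Self {
        Node48 {
            child_indexes: [INVALID; 256],
            slots: [None; 48],
            num_values: 0,
        }
    }

    // Values are appended to the next free slot; the index array records where.
    fn insert(&mut self, key_byte: u8, value: u32) {
        assert!(self.num_values < 48);
        assert!(self.child_indexes[key_byte as usize] == INVALID);
        let idx = self.num_values;
        self.child_indexes[key_byte as usize] = idx;
        self.slots[idx as usize] = Some(value);
        self.num_values += 1;
    }

    // Lookup is two array reads: key byte -> slot index -> value.
    fn get(&self, key_byte: u8) -> Option<u32> {
        let idx = self.child_indexes[key_byte as usize];
        if idx == INVALID {
            None
        } else {
            self.slots[idx as usize]
        }
    }

    // Growing to 256 entries removes the indirection: the key byte is the index.
    fn grow(&self) -> [Option<u32>; 256] {
        let mut direct = [None; 256];
        for key_byte in 0..256 {
            let idx = self.child_indexes[key_byte];
            if idx != INVALID {
                direct[key_byte] = self.slots[idx as usize];
            }
        }
        direct
    }
}
```

The trade-off is space: the 48-way form costs 256 index bytes plus 48 slots, while the 256-way form always pays for 256 slots.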
--- a/libs/neonart/src/algorithm/node_ref.rs
+++ b/libs/neonart/src/algorithm/node_ref.rs
@@ -0,0 +1,202 @@
+use std::fmt::Debug;
+use std::marker::PhantomData;
+
+use super::lock_and_version::ResultOrRestart;
+use super::node_ptr;
+use super::node_ptr::ChildOrValuePtr;
+use super::node_ptr::NodePtr;
+use crate::EpochPin;
+use crate::algorithm::lock_and_version::AtomicLockAndVersion;
+use crate::{Allocator, Value};
+
+pub struct NodeRef<'e, V> {
+    ptr: NodePtr<V>,
+
+    phantom: PhantomData<&'e EpochPin>,
+}
+
+impl<'e, V> Debug for NodeRef<'e, V> {
+    fn fmt(&self, fmt: &mut std::fmt::Formatter<'_>) -> Result<(), std::fmt::Error> {
+        write!(fmt, "{:?}", self.ptr)
+    }
+}
+
+impl<'e, V: Value> NodeRef<'e, V> {
+    pub(crate) fn from_root_ptr(root_ptr: NodePtr<V>) -> NodeRef<'e, V> {
+        NodeRef {
+            ptr: root_ptr,
+            phantom: PhantomData,
+        }
+    }
+
+    pub(crate) fn read_lock_or_restart(&self) -> ResultOrRestart<ReadLockedNodeRef<'e, V>> {
+        let version = self.lockword().read_lock_or_restart()?;
+        Ok(ReadLockedNodeRef {
+            ptr: self.ptr,
+            version,
+            phantom: self.phantom,
+        })
+    }
+
+    fn lockword(&self) -> &AtomicLockAndVersion {
+        self.ptr.lockword()
+    }
+}
+
+/// A reference to a node that has been optimistically read-locked. The functions re-check
+/// the version after each read.
+pub struct ReadLockedNodeRef<'e, V> {
+    ptr: NodePtr<V>,
+    version: u64,
+
+    phantom: PhantomData<&'e EpochPin>,
+}
+
+pub(crate) enum ChildOrValue<'e, V> {
+    Child(NodeRef<'e, V>),
+    Value(*const V),
+}
+
+impl<'e, V: Value> ReadLockedNodeRef<'e, V> {
+    pub(crate) fn is_full(&self) -> bool {
+        self.ptr.is_full()
+    }
+
+    pub(crate) fn get_prefix(&self) -> &[u8] {
+        self.ptr.get_prefix()
+    }
+
+    /// Note: because we're only holding a read lock, the prefix can change concurrently.
+    /// You must be prepared to restart, if read_unlock() returns error later.
+    ///
+    /// Returns the length of the prefix, or None if it's not a match
+    pub(crate) fn prefix_matches(&self, key: &[u8]) -> Option<usize> {
+        self.ptr.prefix_matches(key)
+    }
+
+    pub(crate) fn find_child_or_value_or_restart(
+        &self,
+        key_byte: u8,
+    ) -> ResultOrRestart<Option<ChildOrValue<'e, V>>> {
+        let child_or_value = self.ptr.find_child_or_value(key_byte);
+        self.ptr.lockword().check_or_restart(self.version)?;
+
+        match child_or_value {
+            None => Ok(None),
+            Some(ChildOrValuePtr::Value(vptr)) => Ok(Some(ChildOrValue::Value(vptr))),
+            Some(ChildOrValuePtr::Child(child_ptr)) => Ok(Some(ChildOrValue::Child(NodeRef {
+                ptr: child_ptr,
+                phantom: self.phantom,
+            }))),
+        }
+    }
+
+    pub(crate) fn upgrade_to_write_lock_or_restart(
+        self,
+    ) -> ResultOrRestart<WriteLockedNodeRef<'e, V>> {
+        self.ptr
+            .lockword()
+            .upgrade_to_write_lock_or_restart(self.version)?;
+
+        Ok(WriteLockedNodeRef {
+            ptr: self.ptr,
+            phantom: self.phantom,
+        })
+    }
+
+    pub(crate) fn read_unlock_or_restart(self) -> ResultOrRestart<()> {
+        self.ptr.lockword().check_or_restart(self.version)?;
+        Ok(())
+    }
+}
+
+/// A reference to a node that has been write-locked. The holder has exclusive access until
+/// write_unlock() or write_unlock_obsolete() is called, or the reference is dropped.
+pub struct WriteLockedNodeRef<'e, V> {
+    ptr: NodePtr<V>,
+    phantom: PhantomData<&'e EpochPin>,
+}
+
+impl<'e, V: Value> WriteLockedNodeRef<'e, V> {
+    pub(crate) fn is_leaf(&self) -> bool {
+        self.ptr.is_leaf()
+    }
+
+    pub(crate) fn write_unlock(mut self) {
+        self.ptr.lockword().write_unlock();
+        self.ptr = NodePtr::null();
+    }
+
+    pub(crate) fn write_unlock_obsolete(mut self) {
+        self.ptr.lockword().write_unlock_obsolete();
+        self.ptr = NodePtr::null();
+    }
+
+    pub(crate) fn get_prefix(&self) -> &[u8] {
+        self.ptr.get_prefix()
+    }
+
+    pub(crate) fn truncate_prefix(&mut self, new_prefix_len: usize) {
+        self.ptr.truncate_prefix(new_prefix_len)
+    }
+
+    pub(crate) fn insert_child(&mut self, key_byte: u8, child: NodePtr<V>) {
+        self.ptr.insert_child(key_byte, child)
+    }
+
+    pub(crate) fn insert_value(&mut self, key_byte: u8, value: V) {
+        self.ptr.insert_value(key_byte, value)
+    }
+
+    pub(crate) fn grow(&self, allocator: &Allocator) -> NewNodeRef<V> {
+        let new_node = self.ptr.grow(allocator);
+        NewNodeRef { ptr: new_node }
+    }
+
+    pub(crate) fn as_ptr(&self) -> NodePtr<V> {
+        self.ptr
+    }
+
+    pub(crate) fn replace_child(&mut self, key_byte: u8, replacement: NodePtr<V>) {
+        self.ptr.replace_child(key_byte, replacement);
+    }
+}
+
+impl<'e, V> Drop for WriteLockedNodeRef<'e, V> {
+    fn drop(&mut self) {
+        if !self.ptr.is_null() {
+            self.ptr.lockword().write_unlock();
+        }
+    }
+}
+
+pub(crate) struct NewNodeRef<V> {
+    ptr: NodePtr<V>,
+}
+
+impl<V: Value> NewNodeRef<V> {
+    pub(crate) fn insert_child(&mut self, key_byte: u8, child: NodePtr<V>) {
+        self.ptr.insert_child(key_byte, child)
+    }
+
+    pub(crate) fn insert_value(&mut self, key_byte: u8, value: V) {
+        self.ptr.insert_value(key_byte, value)
+    }
+
+    pub(crate) fn into_ptr(self) -> NodePtr<V> {
+        let ptr = self.ptr;
+        ptr
+    }
+}
+
+pub(crate) fn new_internal<V: Value>(prefix: &[u8], allocator: &Allocator) -> NewNodeRef<V> {
+    NewNodeRef {
+        ptr: node_ptr::new_internal(prefix, allocator),
+    }
+}
+
+pub(crate) fn new_leaf<V: Value>(prefix: &[u8], allocator: &Allocator) -> NewNodeRef<V> {
+    NewNodeRef {
+        ptr: node_ptr::new_leaf(prefix, allocator),
+    }
+}
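Review note on node_ref.rs: the read path takes a version snapshot, reads the node, then re-checks the version, and the write path upgrades by atomically setting a lock bit. The sketch below illustrates that protocol on a single atomic word. It is not the `AtomicLockAndVersion` from this change, whose layout is not shown in this part of the diff. The bit assignment (bit 0 as the lock bit), the `Restart` type and the memory orderings are simplifying assumptions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical error type signalling that the caller must retry from the root.
#[derive(Debug, PartialEq)]
struct Restart;

// Assumed layout: bit 0 is the write-lock bit, the remaining bits are the version.
struct VersionLock(AtomicU64);

impl VersionLock {
    fn new() -> Self {
        VersionLock(AtomicU64::new(0))
    }

    // Optimistic "read lock": only a snapshot of the version, nothing is written.
    fn read_lock_or_restart(&self) -> Result<u64, Restart> {
        let v = self.0.load(Ordering::Acquire);
        if v & 1 != 0 { Err(Restart) } else { Ok(v) }
    }

    // Validate that no writer touched the node since the snapshot was taken.
    fn check_or_restart(&self, expected: u64) -> Result<(), Restart> {
        if self.0.load(Ordering::Acquire) == expected {
            Ok(())
        } else {
            Err(Restart)
        }
    }

    // Upgrade succeeds only if the version is still the one that was read.
    fn upgrade_to_write_lock_or_restart(&self, expected: u64) -> Result<(), Restart> {
        self.0
            .compare_exchange(expected, expected | 1, Ordering::Acquire, Ordering::Relaxed)
            .map(|_| ())
            .map_err(|_| Restart)
    }

    // Adding 1 to the locked (odd) word clears the lock bit and bumps the version.
    fn write_unlock(&self) {
        self.0.fetch_add(1, Ordering::Release);
    }
}
```

Readers that snapshotted the version before a write see a mismatch in `check_or_restart` afterwards, which is what lets them discard whatever they read while the node was changing.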
--- a/libs/neonart/src/allocator.rs
+++ b/libs/neonart/src/allocator.rs
@@ -0,0 +1,107 @@
+use std::marker::PhantomData;
+use std::mem::MaybeUninit;
+use std::ops::{Deref, DerefMut};
+use std::ptr::NonNull;
+use std::sync::atomic::{AtomicUsize, Ordering};
+
+pub struct Allocator {
+    area: *mut MaybeUninit<u8>,
+    allocated: AtomicUsize,
+    size: usize,
+}
+
+// FIXME: I don't know if these are really safe...
+unsafe impl Send for Allocator {}
+unsafe impl Sync for Allocator {}
+
+#[repr(transparent)]
+pub struct AllocatedBox<'a, T> {
+    inner: NonNull<T>,
+
+    _phantom: PhantomData<&'a Allocator>,
+}
+
+// FIXME: I don't know if these are really safe...
+unsafe impl<'a, T> Send for AllocatedBox<'a, T> {}
+unsafe impl<'a, T> Sync for AllocatedBox<'a, T> {}
+
+impl<T> Deref for AllocatedBox<'_, T> {
+    type Target = T;
+
+    fn deref(&self) -> &T {
+        unsafe { self.inner.as_ref() }
+    }
+}
+
+impl<T> DerefMut for AllocatedBox<'_, T> {
+    fn deref_mut(&mut self) -> &mut T {
+        unsafe { self.inner.as_mut() }
+    }
+}
+
+impl<T> AsMut<T> for AllocatedBox<'_, T> {
+    fn as_mut(&mut self) -> &mut T {
+        unsafe { self.inner.as_mut() }
+    }
+}
+
+impl<T> AllocatedBox<'_, T> {
+    pub fn as_ptr(&self) -> *mut T {
+        self.inner.as_ptr()
+    }
+}
+
+const MAXALIGN: usize = std::mem::align_of::<usize>();
+
+impl Allocator {
+    pub fn new_uninit(area: &'static mut [MaybeUninit<u8>]) -> Allocator {
+        let ptr = area.as_mut_ptr();
+        let size = area.len();
+        Self::new_from_ptr(ptr, size)
+    }
+
+    pub fn new(area: &'static mut [u8]) -> Allocator {
+        let ptr: *mut MaybeUninit<u8> = area.as_mut_ptr().cast();
+        let size = area.len();
+        Self::new_from_ptr(ptr, size)
+    }
+
+    pub fn new_from_ptr(ptr: *mut MaybeUninit<u8>, size: usize) -> Allocator {
+        let padding = ptr.align_offset(MAXALIGN);
+
+        Allocator {
+            area: ptr,
+            allocated: AtomicUsize::new(padding),
+            size,
+        }
+    }
+
+    pub fn alloc<'a, T: Sized>(&'a self, value: T) -> AllocatedBox<'a, T> {
+        let sz = std::mem::size_of::<T>();
+
+        // pad all allocations to MAXALIGN boundaries
+        assert!(std::mem::align_of::<T>() <= MAXALIGN);
+        let sz = sz.next_multiple_of(MAXALIGN);
+
+        let offset = self.allocated.fetch_add(sz, Ordering::Relaxed);
+
+        if offset + sz > self.size {
+            panic!("out of memory");
+        }
+
+        let inner = unsafe {
+            let inner = self.area.offset(offset as isize).cast::<T>();
+            inner.write(value); // ptr::write: does not drop the uninitialized old contents
+            NonNull::new_unchecked(inner)
+        };
+
+        AllocatedBox {
+            inner,
+            _phantom: PhantomData,
+        }
+    }
+
+    pub fn _dealloc_node<T>(&self, _node: AllocatedBox<T>) {
+        // doesn't free it immediately.
+    }
+}
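Review note on allocator.rs: `alloc` is a bump allocator that rounds each request up to `MAXALIGN` and reserves space with a single `fetch_add`. The sketch below isolates just the offset arithmetic, without touching raw memory. `BumpOffsets` and `reserve` are hypothetical names, and returning `None` instead of panicking is a deliberate difference from the code in the diff.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const MAXALIGN: usize = std::mem::align_of::<usize>();

// Tracks only offsets into an imaginary area of `size` bytes.
struct BumpOffsets {
    allocated: AtomicUsize,
    size: usize,
}

impl BumpOffsets {
    fn new(size: usize) -> Self {
        BumpOffsets {
            allocated: AtomicUsize::new(0),
            size,
        }
    }

    // Reserve `sz` bytes, rounded up to MAXALIGN, and return the start offset.
    // As in the diff, the counter advances even when the reservation fails.
    fn reserve(&self, sz: usize) -> Option<usize> {
        let sz = sz.next_multiple_of(MAXALIGN);
        let offset = self.allocated.fetch_add(sz, Ordering::Relaxed);
        let end = offset.checked_add(sz)?;
        if end > self.size { None } else { Some(offset) }
    }
}
```

Because the reservation is one atomic add, concurrent callers always receive disjoint ranges. Nothing is ever returned to the pool, which matches the no-op `_dealloc_node` above.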
--- a/libs/neonart/src/epoch.rs
+++ b/libs/neonart/src/epoch.rs
@@ -0,0 +1,23 @@
+//! This is similar to crossbeam_epoch crate, but works in shared memory
+//!
+//! FIXME: not implemented yet. (We haven't implemented removing any nodes from the ART
+//! tree, which is why we get away without this now)
+
+pub(crate) struct EpochPin {}
+
+pub(crate) fn pin_epoch() -> EpochPin {
+    EpochPin {}
+}
+
+/*
+struct CollectorGlobal {
+    epoch: AtomicU64,
+
+    participants: CachePadded<AtomicU64>, // make it an array
+}
+
+
+struct CollectorQueue {
+
+}
+*/
--- a/libs/neonart/src/lib.rs
+++ b/libs/neonart/src/lib.rs
@@ -0,0 +1,301 @@
+//! Adaptive Radix Tree (ART) implementation, with Optimistic Lock Coupling.
+//!
+//! The data structure is described in these two papers:
+//!
+//! [1] Leis, V. & Kemper, Alfons & Neumann, Thomas. (2013).
+//!     The adaptive radix tree: ARTful indexing for main-memory databases.
+//!     Proceedings - International Conference on Data Engineering. 38-49. 10.1109/ICDE.2013.6544812.
+//!     https://db.in.tum.de/~leis/papers/ART.pdf
+//!
+//! [2] Leis, Viktor & Scheibner, Florian & Kemper, Alfons & Neumann, Thomas. (2016).
+//!     The ART of practical synchronization.
+//!     1-8. 10.1145/2933349.2933352.
+//!     https://db.in.tum.de/~leis/papers/artsync.pdf
+//!
+//! [1] describes the base data structure, and [2] describes the Optimistic Lock Coupling that we
+//! use.
+//!
+//! The papers mention a few different variants. We have made the following choices in this
+//! implementation:
+//!
+//! - All keys have the same length
+//!
+//! - Multi-value leaves. The values are stored directly in one of the four different leaf node
+//!   types.
+//!
+//! - For collapsing inner nodes, we use the Pessimistic approach, where each inner node stores a
+//!   variable length "prefix", which stores the keys of all the one-way nodes which have been
+//!   removed. However, similar to the "hybrid" approach described in the paper, each node only has
+//!   space for a constant-size prefix of 8 bytes. If a node would have a longer prefix, then we
+//!   create one-way nodes to store them. (There was no particular reason for this choice,
+//!   the "hybrid" approach described in the paper might be better.)
+//!
+//! - For concurrency, we use Optimistic Lock Coupling. The paper [2] also describes another method,
+//!   ROWEX, which generally performs better when there is contention, but that is not important
+//!   for us, and Optimistic Lock Coupling is simpler to implement.
+//!
+//! ## Requirements
+//!
+//! This data structure is currently used for the integrated LFC, relsize and last-written LSN cache
+//! in the compute communicator, part of the 'neon' Postgres extension. We have some unique
+//! requirements, which is why we had to write our own. Namely:
+//!
+//! - The data structure has to live in a fixed-size shared memory segment. That rules out any
+//!   built-in Rust collections and most crates. (Except possibly with the 'allocator_api' Rust
+//!   feature, which is still nightly-only and experimental as of this writing.)
+//!
+//! - The data structure is accessed from multiple processes. Only one process updates the data
+//!   structure, but other processes perform reads. That rules out using built-in Rust locking
+//!   primitives like Mutex and RwLock, and most crates too.
+//!
+//! - Within the one process with write-access, multiple threads can perform updates concurrently.
+//!   That rules out using PostgreSQL LWLocks for the locking.
+//!
+//! The implementation is generic, and doesn't depend on any PostgreSQL specifics, but it has been
+//! written with that usage and the above constraints in mind. Some noteworthy assumptions:
+//!
+//! - Contention is assumed to be rare. In the integrated cache in PostgreSQL, there's higher level
+//!   locking in the PostgreSQL buffer manager, which ensures that two backends should not try to
+//!   read / write the same page at the same time. (Prefetching can conflict with actual reads,
+//!   however.)
+//!
+//! - The keys in the integrated cache are 17 bytes long.
+//!
+//! ## Usage
+//!
+//! Because this is designed to be used as a Postgres shared memory data structure, initialization
+//! happens in the following stages:
+//!
+//! 0. A fixed area of shared memory is allocated at postmaster startup.
+//!
+//! 1. TreeInitStruct::new() is called to initialize it, still in the postmaster process, before any
+//!    other process or thread is running. It returns a TreeInitStruct, which is inherited by all
+//!    the processes through fork().
+//!
+//! 2. One process may have write-access to the struct, by calling
+//!    [TreeInitStruct::attach_writer]. (That process is the communicator process.)
+//!
+//! 3. Other processes get read-access to the struct, by calling [TreeInitStruct::attach_reader]
+//!
+//! "Write access" means that you can insert / update / delete values in the tree.
+//!
+//! NOTE: The Values stored in the tree are sometimes moved, when a leaf node fills up and a new
+//! larger node needs to be allocated. The versioning and epoch-based allocator ensure that the data
+//! structure stays consistent, but if the Value has interior mutability, like atomic fields,
+//! updates to such fields might be lost if the leaf node is concurrently moved! If that becomes a
+//! problem, the version check could be passed up to the caller, so that the caller could detect the
+//! lost updates and retry the operation.
+//!
+//! ## Implementation
+//!
+//! node_ptr.rs: Provides low-level implementations of the four different node types (eight actually,
+//! since there is an Internal and Leaf variant of each)
+//!
+//! lock_and_version.rs: Provides an abstraction for the combined lock and version counter on each
+//! node.
+//!
+//! node_ref.rs: The code in node_ptr.rs deals with raw pointers. node_ref.rs provides more type-safe
+//!   abstractions on top.
+//!
+//! algorithm.rs: Contains the functions to implement lookups and updates in the tree
+//!
+//! allocator.rs: Provides a facility to allocate memory for the tree nodes. (We must provide our
+//!   own abstraction for that because we need the data structure to live in a pre-allocated shared
+//!   memory segment).
+//!
+//! epoch.rs: The data structure requires that when a node is removed from the tree, it is not
+//!   immediately deallocated, but stays around for as long as concurrent readers might still have
+//!   pointers to it. An epoch system (not yet implemented) is meant to enforce this, similar to
+//!   e.g. crossbeam_epoch, but we couldn't use that either because it has to work across processes
+//!   communicating over the shared memory segment.
+//!
+//! ## See also
+//!
+//! There are some existing Rust ART implementations out there, but none of them filled all
+//! the requirements:
+//!
+//! - https://github.com/XiangpengHao/congee
+//! - https://github.com/declanvk/blart
+//!
+//! ## TODO
+//!
+//! - Removing values has not been implemented
+
+mod algorithm;
+mod allocator;
+mod epoch;
+
+use algorithm::RootPtr;
+
+use allocator::AllocatedBox;
+
+use std::fmt::Debug;
+use std::marker::PhantomData;
+use std::sync::atomic::{AtomicBool, Ordering};
+
+use crate::epoch::EpochPin;
+
+#[cfg(test)]
+mod tests;
+
+pub use allocator::Allocator;
+
+/// Fixed-length key type.
+///
+pub trait Key: Clone + Debug {
+    const KEY_LEN: usize;
+
+    fn as_bytes(&self) -> &[u8];
+}
+
+/// Values stored in the tree
+///
+/// Values need to be Cloneable, because when a node "grows", the value is copied to a new node and
+/// the old sticks around until all readers that might see the old value are gone.
+pub trait Value: Clone {}
+
+struct Tree<K: Key, V: Value> {
+    root: RootPtr<V>,
+
+    writer_attached: AtomicBool,
+
+    phantom_key: PhantomData<K>,
+}
+
+/// Struct created at postmaster startup
+pub struct TreeInitStruct<'t, K: Key, V: Value> {
+    tree: AllocatedBox<'t, Tree<K, V>>,
+
+    allocator: &'t Allocator,
+}
+
+/// The worker process has a reference to this. The write operations are only safe
+/// from the worker process
+pub struct TreeWriteAccess<'t, K: Key, V: Value>
+where
+    K: Key,
+    V: Value,
+{
+    tree: AllocatedBox<'t, Tree<K, V>>,
+
+    allocator: &'t Allocator,
+}
+
+/// The backends have a reference to this. It cannot be used to modify the tree
+pub struct TreeReadAccess<'t, K: Key, V: Value>
+where
+    K: Key,
+    V: Value,
+{
+    tree: AllocatedBox<'t, Tree<K, V>>,
+}
+
+impl<'a, 't: 'a, K: Key, V: Value> TreeInitStruct<'t, K, V> {
+    pub fn new(allocator: &'t Allocator) -> TreeInitStruct<'t, K, V> {
+        let tree = allocator.alloc(Tree {
+            root: algorithm::new_root(allocator),
+            writer_attached: AtomicBool::new(false),
+            phantom_key: PhantomData,
+        });
+
+        TreeInitStruct { tree, allocator }
+    }
+
+    pub fn attach_writer(self) -> TreeWriteAccess<'t, K, V> {
+        let previously_attached = self.tree.writer_attached.swap(true, Ordering::Relaxed);
+        if previously_attached {
+            panic!("writer already attached");
+        }
+        TreeWriteAccess {
+            tree: self.tree,
+            allocator: self.allocator,
+        }
+    }
+
+    pub fn attach_reader(self) -> TreeReadAccess<'t, K, V> {
+        TreeReadAccess { tree: self.tree }
+    }
+}
+
+impl<'t, K: Key + Clone, V: Value> TreeWriteAccess<'t, K, V> {
+    pub fn start_write(&'t self) -> TreeWriteGuard<'t, K, V> {
+        // TODO: grab epoch guard
+        TreeWriteGuard {
+            allocator: self.allocator,
+            tree: &self.tree,
+            epoch_pin: epoch::pin_epoch(),
+        }
+    }
+
+    pub fn start_read(&'t self) -> TreeReadGuard<'t, K, V> {
+        TreeReadGuard {
+            tree: &self.tree,
+            epoch_pin: epoch::pin_epoch(),
+        }
+    }
+}
+
+impl<'t, K: Key + Clone, V: Value> TreeReadAccess<'t, K, V> {
+    pub fn start_read(&'t self) -> TreeReadGuard<'t, K, V> {
+        TreeReadGuard {
+            tree: &self.tree,
+            epoch_pin: epoch::pin_epoch(),
+        }
+    }
+}
+
+pub struct TreeReadGuard<'t, K, V>
+where
+    K: Key,
+    V: Value,
+{
+    tree: &'t AllocatedBox<'t, Tree<K, V>>,
+
+    epoch_pin: EpochPin,
+}
+
+impl<'t, K: Key, V: Value> TreeReadGuard<'t, K, V> {
+    pub fn get(&self, key: &K) -> Option<V> {
+        algorithm::search(key, self.tree.root, &self.epoch_pin)
+    }
+}
+
+pub struct TreeWriteGuard<'t, K, V>
+where
+    K: Key,
+    V: Value,
+{
+    tree: &'t AllocatedBox<'t, Tree<K, V>>,
+    allocator: &'t Allocator,
+
+    epoch_pin: EpochPin,
+}
+
+impl<'t, K: Key, V: Value> TreeWriteGuard<'t, K, V> {
+    pub fn insert(&mut self, key: &K, value: V) {
+        self.update_with_fn(key, |_| Some(value))
+    }
+
+    pub fn update_with_fn<F>(&mut self, key: &K, value_fn: F)
+    where
+        F: FnOnce(Option<&V>) -> Option<V>,
+    {
+        algorithm::update_fn(
+            key,
+            value_fn,
+            self.tree.root,
+            self.allocator,
+            &self.epoch_pin,
+        )
+    }
+
+    pub fn get(&mut self, key: &K) -> Option<V> {
+        algorithm::search(key, self.tree.root, &self.epoch_pin)
+    }
+}
+
+impl<'t, K: Key, V: Value + Debug> TreeWriteGuard<'t, K, V> {
+    pub fn dump(&mut self) {
+        algorithm::dump_tree(self.tree.root, &self.epoch_pin)
+    }
+}
--- a/libs/neonart/src/tests.rs
+++ b/libs/neonart/src/tests.rs
@@ -0,0 +1,90 @@
+use std::collections::HashSet;
+
+use crate::Allocator;
+use crate::TreeInitStruct;
+
+use crate::{Key, Value};
+
+use rand::seq::SliceRandom;
+use rand::thread_rng;
+
+const TEST_KEY_LEN: usize = 16;
+
+#[derive(Clone, Copy, Debug)]
+struct TestKey([u8; TEST_KEY_LEN]);
+
+impl Key for TestKey {
+    const KEY_LEN: usize = TEST_KEY_LEN;
+
+    fn as_bytes(&self) -> &[u8] {
+        &self.0
+    }
+}
+
+impl From<u128> for TestKey {
+    fn from(val: u128) -> TestKey {
+        TestKey(val.to_be_bytes())
+    }
+}
+
+impl Value for usize {}
+
+fn test_inserts<K: Into<TestKey> + Copy>(keys: &[K]) {
+    const MEM_SIZE: usize = 10000000;
+    let area = Box::leak(Box::new_uninit_slice(MEM_SIZE));
+
+    let allocator = Box::leak(Box::new(Allocator::new_uninit(area)));
+
+    let init_struct = TreeInitStruct::<TestKey, usize>::new(allocator);
+    let tree_writer = init_struct.attach_writer();
+
+    for (idx, k) in keys.iter().enumerate() {
+        let mut w = tree_writer.start_write();
+        w.insert(&(*k).into(), idx);
+        eprintln!("INSERTED {:?}", Into::<TestKey>::into(*k));
+    }
+
+    //tree_writer.start_read().dump();
+
+    for (idx, k) in keys.iter().enumerate() {
+        let r = tree_writer.start_read();
+        let value = r.get(&(*k).into());
+        assert_eq!(value, Some(idx));
+    }
+}
+
+#[test]
+fn dense() {
+    // This exercises splitting a node with prefix
+    let keys: &[u128] = &[0, 1, 2, 3, 256];
+    test_inserts(keys);
+
+    // Dense keys
+    let mut keys: Vec<u128> = (0..10000).collect();
+    test_inserts(&keys);
+
+    // Do the same in random orders
+    for _ in 1..10 {
+        keys.shuffle(&mut thread_rng());
+        test_inserts(&keys);
+    }
+}
+
+#[test]
+fn sparse() {
+    // sparse keys
+    let mut keys: Vec<TestKey> = Vec::new();
+    let mut used_keys = HashSet::new();
+    for _ in 0..10000 {
+        loop {
+            let key = rand::random::<u128>();
+            if used_keys.get(&key).is_some() {
+                continue;
+            }
+            used_keys.insert(key);
+            keys.push(key.into());
+            break;
+        }
+    }
+    test_inserts(&keys);
+}
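Review note on the `dense` test: its comment says that the keys `[0, 1, 2, 3, 256]` exercise splitting a node with a prefix. A small standalone sketch makes the reason visible by computing the shared big-endian prefix of two keys. The helper `common_prefix_len` is hypothetical and not part of the crate.

```rust
// Number of leading bytes that two byte strings have in common.
fn common_prefix_len(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b).take_while(|(x, y)| x == y).count()
}

// Keys 0..=3 differ only in the last of their 16 big-endian bytes.
fn prefix_of_small_keys() -> usize {
    common_prefix_len(&0u128.to_be_bytes(), &3u128.to_be_bytes())
}

// Key 256 (0x0100) already differs from key 0 in the second-to-last byte.
fn prefix_with_256() -> usize {
    common_prefix_len(&0u128.to_be_bytes(), &256u128.to_be_bytes())
}
```

The first four keys share 15 leading bytes, while 256 shares only 14 with them, so inserting it presumably forces the tree to split the compressed path that the earlier inserts built up.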
--- a/libs/tenant_size_model/src/calculation.rs
+++ b/libs/tenant_size_model/src/calculation.rs
@@ -77,7 +77,9 @@ impl StorageModel {
        }

        SizeResult {
-            total_size,
+            // If total_size is 0, it means that the tenant has all timelines offloaded; we need to report 1
+            // here so that the data point shows up in the s3 files.
+            total_size: total_size.max(1),
            segments: segment_results,
        }
    }
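
Note on the hunk above: the clamp only changes the result when the computed size is zero. A minimal sketch of that behavior follows, assuming an unsigned 64-bit size (the actual type of `total_size` is not visible in this hunk); `reported_total_size` is a hypothetical name.

```rust
/// Hypothetical stand-in for the `total_size.max(1)` clamp in the diff.
/// Assumes the size is a `u64`; the real field type is not shown in the hunk.
fn reported_total_size(total_size: u64) -> u64 {
    // Zero becomes 1 so the data point is still reported; other values pass through.
    total_size.max(1)
}
```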
--- a/pageserver/Cargo.toml
+++ b/pageserver/Cargo.toml
@@ -42,12 +42,14 @@ nix.workspace = true
 num_cpus.workspace = true
 num-traits.workspace = true
 once_cell.workspace = true
+peekable.workspace = true
 pin-project-lite.workspace = true
 postgres_backend.workspace = true
 postgres-protocol.workspace = true
 postgres-types.workspace = true
 postgres_initdb.workspace = true
 pprof.workspace = true
+prost.workspace = true
 rand.workspace = true
 range-set-blaze = { version = "0.1.16", features = ["alloc"] }
 regex.workspace = true
@@ -60,6 +62,7 @@ serde_path_to_error.workspace = true
 serde_with.workspace = true
 sysinfo.workspace = true
 tokio-tar.workspace = true
+tonic.workspace = true
 thiserror.workspace = true
 tikv-jemallocator.workspace = true
 tokio = { workspace = true, features = ["process", "sync", "fs", "rt", "io-util", "time"] }
@@ -76,6 +79,7 @@ url.workspace = true
 walkdir.workspace = true
 metrics.workspace = true
 pageserver_api.workspace = true
+pageserver_data_api.workspace = true
 pageserver_client.workspace = true # for ResponseErrorMessageExt TOOD refactor that
 pageserver_compaction.workspace = true
 pem.workspace = true
--- a/pageserver/client_grpc/Cargo.toml
+++ b/pageserver/client_grpc/Cargo.toml
@@ -0,0 +1,13 @@
+[package]
+name = "pageserver_client_grpc"
+version = "0.1.0"
+edition = "2024"
+
+[dependencies]
+bytes.workspace = true
+http.workspace = true
+thiserror.workspace = true
+tonic.workspace = true
+tracing.workspace = true
+
+pageserver_data_api.workspace = true
--- a/pageserver/client_grpc/src/lib.rs
+++ b/pageserver/client_grpc/src/lib.rs
@@ -0,0 +1,221 @@
+//! Pageserver Data API client
+//!
+//! - Manage connections to pageserver
+//! - Send requests to correct shards
+//!
+use std::collections::HashMap;
+use std::sync::RwLock;
+
+use bytes::Bytes;
+use http;
+use thiserror::Error;
+use tonic;
+use tonic::metadata::AsciiMetadataValue;
+use tonic::transport::Channel;
+
+use pageserver_data_api::model::*;
+use pageserver_data_api::proto;
+
+type Shardno = u16;
+
+use pageserver_data_api::client::PageServiceClient;
+
+type MyPageServiceClient = pageserver_data_api::client::PageServiceClient<
+    tonic::service::interceptor::InterceptedService<tonic::transport::Channel, AuthInterceptor>,
+>;
+
+#[derive(Error, Debug)]
+pub enum PageserverClientError {
+    #[error("could not connect to service: {0}")]
+    ConnectError(#[from] tonic::transport::Error),
+    #[error("could not perform request: {0}")]
+    RequestError(#[from] tonic::Status),
+
+    #[error("invalid URI: {0}")]
+    InvalidUri(#[from] http::uri::InvalidUri),
+}
+
+pub struct PageserverClient {
+    _tenant_id: String,
+    _timeline_id: String,
+
+    _auth_token: Option<String>,
+
+    shard_map: HashMap<Shardno, String>,
+
+    channels: RwLock<HashMap<Shardno, Channel>>,
+
+    auth_interceptor: AuthInterceptor,
+}
+
+impl PageserverClient {
+    /// TODO: this doesn't currently react to changes in the shard map.
+    pub fn new(
+        tenant_id: &str,
+        timeline_id: &str,
+        auth_token: &Option<String>,
+        shard_map: HashMap<Shardno, String>,
+    ) -> Self {
+        Self {
+            _tenant_id: tenant_id.to_string(),
+            _timeline_id: timeline_id.to_string(),
+            _auth_token: auth_token.clone(),
+            shard_map,
+            channels: RwLock::new(HashMap::new()),
+            auth_interceptor: AuthInterceptor::new(tenant_id, timeline_id, auth_token.as_ref()),
+        }
+    }
+
+    pub async fn process_rel_exists_request(
+        &self,
+        request: &RelExistsRequest,
+    ) -> Result<bool, PageserverClientError> {
+        // Current sharding model assumes that all metadata is present only at shard 0.
+        let shard_no = 0;
+
+        let mut client = self.get_client(shard_no).await?;
+
+        let request = proto::RelExistsRequest::from(request);
+        let response = client.rel_exists(tonic::Request::new(request)).await?;
+
+        Ok(response.get_ref().exists)
+    }
+
+    pub async fn process_rel_size_request(
+        &self,
+        request: &RelSizeRequest,
+    ) -> Result<u32, PageserverClientError> {
+        // Current sharding model assumes that all metadata is present only at shard 0.
+        let shard_no = 0;
+
+        let mut client = self.get_client(shard_no).await?;
+
+        let request = proto::RelSizeRequest::from(request);
+        let response = client.rel_size(tonic::Request::new(request)).await?;
+
+        Ok(response.get_ref().num_blocks)
+    }
+
+    pub async fn get_page(&self, request: &GetPageRequest) -> Result<Bytes, PageserverClientError> {
+        // FIXME: calculate the shard number correctly
+        let shard_no = 0;
+
+        let mut client = self.get_client(shard_no).await?;
+
+        let request = proto::GetPageRequest::from(request);
+        let response = client.get_page(tonic::Request::new(request)).await?;
+
+        Ok(response.into_inner().page_image)
+    }
+
+    /// Process a request to get the size of a database.
+    pub async fn process_dbsize_request(
+        &self,
+        request: &DbSizeRequest,
+    ) -> Result<u64, PageserverClientError> {
+        // Current sharding model assumes that all metadata is present only at shard 0.
+        let shard_no = 0;
+
+        let mut client = self.get_client(shard_no).await?;
+
+        let request = proto::DbSizeRequest::from(request);
+        let response = client.db_size(tonic::Request::new(request)).await?;
+
+        Ok(response.get_ref().num_bytes)
+    }
+
+    /// Process a request to fetch a base backup, returned as a stream of chunks.
+    pub async fn get_base_backup(
+        &self,
+        request: &GetBaseBackupRequest,
+        gzip: bool,
+    ) -> std::result::Result<
+        tonic::Response<tonic::codec::Streaming<proto::GetBaseBackupResponseChunk>>,
+        PageserverClientError,
+    > {
+        // Current sharding model assumes that all metadata is present only at shard 0.
+        let shard_no = 0;
+
+        let mut client = self.get_client(shard_no).await?;
+        if gzip {
+            client = client.accept_compressed(tonic::codec::CompressionEncoding::Gzip);
+        }
+
+        let request = proto::GetBaseBackupRequest::from(request);
+        let response = client.get_base_backup(tonic::Request::new(request)).await?;
+
+        Ok(response)
+    }
+
+    /// Get a client for given shard
+    ///
+    /// This implements very basic caching. If we already have a client for the given shard,
+    /// reuse it. If not, create a new client and put it in the cache.
+    async fn get_client(
+        &self,
+        shard_no: u16,
+    ) -> Result<MyPageServiceClient, PageserverClientError> {
+        let reused_channel: Option<Channel> = {
+            let channels = self.channels.read().unwrap();
+
+            channels.get(&shard_no).cloned()
+        };
+
+        let channel = if let Some(reused_channel) = reused_channel {
+            reused_channel
+        } else {
+            let endpoint: tonic::transport::Endpoint = self
+                .shard_map
+                .get(&shard_no)
+                .unwrap_or_else(|| panic!("no url for shard {shard_no}"))
+                .parse()?;
+            let channel = endpoint.connect().await?;
+
+            // Insert it into the cache so that it can be reused on subsequent calls. It's possible
+            // that another thread did the same concurrently, in which case we will overwrite the
+            // client in the cache.
+            {
+                let mut channels = self.channels.write().unwrap();
+                channels.insert(shard_no, channel.clone());
+            }
+            channel
+        };
+
+        let client = PageServiceClient::with_interceptor(channel, self.auth_interceptor.clone());
+        Ok(client)
+    }
+}
+
+/// Inject tenant_id, timeline_id and authentication token into all pageserver requests.
+#[derive(Clone)]
+struct AuthInterceptor {
+    tenant_id: AsciiMetadataValue,
+    timeline_id: AsciiMetadataValue,
+
+    auth_token: Option<AsciiMetadataValue>,
+}
+
+impl AuthInterceptor {
+    fn new(tenant_id: &str, timeline_id: &str, auth_token: Option<&String>) -> Self {
+        Self {
+            tenant_id: tenant_id.parse().expect("could not parse tenant id"),
+            timeline_id: timeline_id.parse().expect("could not parse timeline id"),
+            auth_token: auth_token.map(|x| x.parse().expect("could not parse auth token")),
+        }
+    }
+}
+
+impl tonic::service::Interceptor for AuthInterceptor {
+    fn call(&mut self, mut req: tonic::Request<()>) -> Result<tonic::Request<()>, tonic::Status> {
+        req.metadata_mut()
+            .insert("neon-tenant-id", self.tenant_id.clone());
+        req.metadata_mut()
+            .insert("neon-timeline-id", self.timeline_id.clone());
+        if let Some(auth_token) = &self.auth_token {
+            req.metadata_mut()
+                .insert("neon-auth-token", auth_token.clone());
+        }
+
+        Ok(req)
+    }
+}
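
Note on `get_client` above: the caching follows a read-then-write pattern on an `RwLock<HashMap<..>>`. The sketch below shows the same pattern using only the standard library. `Conn` and `ConnCache` are hypothetical stand-ins (the real code caches `tonic::transport::Channel`), and the "connect" step is simulated by cloning the URL.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for a connection handle such as a gRPC channel.
#[derive(Clone, Debug, PartialEq)]
struct Conn(String);

/// Hypothetical cache keyed by shard number.
struct ConnCache {
    urls: HashMap<u16, String>,
    conns: RwLock<HashMap<u16, Conn>>,
}

impl ConnCache {
    fn new(urls: HashMap<u16, String>) -> Self {
        Self { urls, conns: RwLock::new(HashMap::new()) }
    }

    /// Return a cached connection, or create and cache one.
    /// Returns `None` when the shard has no configured URL.
    fn get(&self, shard_no: u16) -> Option<Conn> {
        // Fast path: take only the shared lock and release it immediately.
        let cached = {
            let conns = self.conns.read().unwrap();
            conns.get(&shard_no).cloned()
        };
        if let Some(conn) = cached {
            return Some(conn);
        }

        // Slow path: "connect" without holding any lock.
        let conn = Conn(self.urls.get(&shard_no)?.clone());

        // Two callers may race here; the later insert overwrites the earlier one.
        self.conns.write().unwrap().insert(shard_no, conn.clone());
        Some(conn)
    }
}
```

Unlike the diff, this sketch returns `None` for an unknown shard instead of panicking. That is a design choice for the illustration only, not a description of the change.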
--- a/pageserver/data_api/Cargo.toml
+++ b/pageserver/data_api/Cargo.toml
@@ -0,0 +1,18 @@
+[package]
+name = "pageserver_data_api"
+version = "0.1.0"
+edition = "2024"
+
+[dependencies]
+
+# For Lsn.
+#
+# TODO: move Lsn to separate crate? This draws in a lot more dependencies
+utils.workspace = true
+
+prost.workspace = true
+thiserror.workspace = true
+tonic.workspace = true
+
+[build-dependencies]
+tonic-build.workspace = true
--- a/pageserver/data_api/build.rs
+++ b/pageserver/data_api/build.rs
@@ -0,0 +1,8 @@
+fn main() -> Result<(), Box<dyn std::error::Error>> {
+    // Generate rust code from .proto protobuf.
+    tonic_build::configure()
+        .bytes(&["."])
+        .compile_protos(&["proto/page_service.proto"], &["proto"])
+        .unwrap_or_else(|e| panic!("failed to compile protos {:?}", e));
+    Ok(())
+}
--- a/pageserver/data_api/proto/page_service.proto
+++ b/pageserver/data_api/proto/page_service.proto
@@ -0,0 +1,84 @@
+// Page service presented by pageservers, for computes
+//
+// Each request must come with the following metadata:
+// - neon-tenant-id
+// - neon-timeline-id
+// - neon-auth-token (if auth is enabled)
+//
+// TODO: what else? Priority? OpenTelemetry tracing?
+//
+
+syntax = "proto3";
+package page_service;
+
+service PageService {
+  rpc RelExists(RelExistsRequest) returns (RelExistsResponse);
+
+  // Returns size of a relation, as # of blocks
+  rpc RelSize (RelSizeRequest) returns (RelSizeResponse);
+
+  rpc GetPage (GetPageRequest) returns (GetPageResponse);
+
+  // Returns total size of a database, as # of bytes
+  rpc DbSize (DbSizeRequest) returns (DbSizeResponse);
+
+  rpc GetBaseBackup (GetBaseBackupRequest) returns (stream GetBaseBackupResponseChunk);
+}
+
+message RequestCommon {
+  uint64 request_lsn = 1;
+  uint64 not_modified_since_lsn = 2;
+}
+
+message RelTag {
+    uint32 spc_oid = 1;
+    uint32 db_oid = 2;
+    uint32 rel_number = 3;
+    uint32 fork_number = 4;
+}
+
+message RelExistsRequest {
+  RequestCommon common = 1;
+  RelTag rel = 2;
+}
+
+message RelExistsResponse {
+  bool exists = 1;
+}
+
+message RelSizeRequest {
+  RequestCommon common = 1;
+  RelTag rel = 2;
+}
+
+message RelSizeResponse {
+  uint32 num_blocks = 1;
+}
+
+message GetPageRequest {
+  RequestCommon common = 1;
+  RelTag rel = 2;
+  uint32 block_number = 3;
+}
+
+message GetPageResponse {
+  bytes page_image = 1;
+}
+
+message DbSizeRequest {
+  RequestCommon common = 1;
+  uint32 db_oid = 2;
+}
+
+message DbSizeResponse {
+  uint64 num_bytes = 1;
+}
+
+message GetBaseBackupRequest {
+  RequestCommon common = 1;
+  bool replica = 2;
+}
+
+message GetBaseBackupResponseChunk {
+  bytes chunk = 1;
+}
--- a/pageserver/data_api/src/lib.rs
+++ b/pageserver/data_api/src/lib.rs
@@ -0,0 +1,17 @@
+//! This crate has two modules related to the Pageserver Data API:
+//!
+//! proto: code auto-generated from the protobuf definition
+//! model: slightly more ergonomic structs representing the same API
+//!
+//! See protobuf spec under the proto/ subdirectory.
+//!
+//! This crate is used by both the client and the server. Try to keep it slim.
+//!
+pub mod model;
+
+// Code generated by protobuf.
+pub mod proto {
+    tonic::include_proto!("page_service");
+}
+
+pub use proto::page_service_client as client;
--- a/pageserver/data_api/src/model.rs
+++ b/pageserver/data_api/src/model.rs
@@ -0,0 +1,239 @@
+//! Structs representing the API
+//!
+//! These mirror the pageserver APIs and the structs automatically generated
+//! from the protobuf specification. The differences are:
+//!
+//! - Types that are in fact required by the API are not Options. The protobuf "required"
+//!   attribute is deprecated and 'prost' marks a lot of members as optional because of that.
+//!   (See https://github.com/tokio-rs/prost/issues/800 for a gripe on this)
+//!
+//! - Use more precise datatypes, e.g. Lsn and uints shorter than 32 bits.
+
+use utils::lsn::Lsn;
+
+use crate::proto;
+
+#[derive(Clone, Debug)]
+pub struct RequestCommon {
+    pub request_lsn: Lsn,
+    pub not_modified_since_lsn: Lsn,
+}
+
+#[derive(Clone, Debug, Eq, PartialEq, Hash, PartialOrd, Ord)]
+pub struct RelTag {
+    pub spc_oid: u32,
+    pub db_oid: u32,
+    pub rel_number: u32,
+    pub fork_number: u8,
+}
+
+#[derive(Clone, Debug)]
+pub struct RelExistsRequest {
+    pub common: RequestCommon,
+    pub rel: RelTag,
+}
+
+#[derive(Clone, Debug)]
+pub struct RelSizeRequest {
+    pub common: RequestCommon,
+    pub rel: RelTag,
+}
+
+#[derive(Clone, Debug)]
+pub struct RelSizeResponse {
+    pub num_blocks: u32,
+}
+
+#[derive(Clone, Debug)]
+pub struct GetPageRequest {
+    pub common: RequestCommon,
+    pub rel: RelTag,
+    pub block_number: u32,
+}
+
+#[derive(Clone, Debug)]
+pub struct GetPageResponse {
+    pub page_image: std::vec::Vec<u8>,
+}
+
+#[derive(Clone, Debug)]
+pub struct DbSizeRequest {
+    pub common: RequestCommon,
+    pub db_oid: u32,
+}
+
+#[derive(Clone, Debug)]
+pub struct DbSizeResponse {
+    pub num_bytes: u64,
+}
+
+#[derive(Clone, Debug)]
+pub struct GetBaseBackupRequest {
+    pub common: RequestCommon,
+    pub replica: bool,
+}
+
+//--- Conversions to/from the generated proto types
+
+use thiserror::Error;
+
+#[derive(Error, Debug)]
+pub enum ProtocolError {
+    #[error("the value for field `{0}` is invalid")]
+    InvalidValue(&'static str),
+    #[error("the required field `{0}` is missing")]
+    Missing(&'static str),
+}
+
+impl From<ProtocolError> for tonic::Status {
+    fn from(e: ProtocolError) -> Self {
+        match e {
+            ProtocolError::InvalidValue(_field) => tonic::Status::invalid_argument(e.to_string()),
+            ProtocolError::Missing(_field) => tonic::Status::invalid_argument(e.to_string()),
+        }
+    }
+}
+
+impl From<&RelTag> for proto::RelTag {
+    fn from(value: &RelTag) -> proto::RelTag {
+        proto::RelTag {
+            spc_oid: value.spc_oid,
+            db_oid: value.db_oid,
+            rel_number: value.rel_number,
+            fork_number: value.fork_number as u32,
+        }
+    }
+}
+impl TryFrom<&proto::RelTag> for RelTag {
+    type Error = ProtocolError;
+
+    fn try_from(value: &proto::RelTag) -> Result<RelTag, ProtocolError> {
+        Ok(RelTag {
+            spc_oid: value.spc_oid,
+            db_oid: value.db_oid,
+            rel_number: value.rel_number,
+            fork_number: value
+                .fork_number
+                .try_into()
+                .or(Err(ProtocolError::InvalidValue("fork_number")))?,
+        })
+    }
+}
+
+impl From<&RequestCommon> for proto::RequestCommon {
+    fn from(value: &RequestCommon) -> proto::RequestCommon {
+        proto::RequestCommon {
+            request_lsn: value.request_lsn.into(),
+            not_modified_since_lsn: value.not_modified_since_lsn.into(),
+        }
+    }
+}
+impl From<&proto::RequestCommon> for RequestCommon {
+    fn from(value: &proto::RequestCommon) -> RequestCommon {
+        RequestCommon {
+            request_lsn: value.request_lsn.into(),
+            not_modified_since_lsn: value.not_modified_since_lsn.into(),
+        }
+    }
+}
+
+impl From<&RelExistsRequest> for proto::RelExistsRequest {
+    fn from(value: &RelExistsRequest) -> proto::RelExistsRequest {
+        proto::RelExistsRequest {
+            common: Some((&value.common).into()),
+            rel: Some((&value.rel).into()),
+        }
+    }
+}
+impl TryFrom<&proto::RelExistsRequest> for RelExistsRequest {
+    type Error = ProtocolError;
+
+    fn try_from(value: &proto::RelExistsRequest) -> Result<RelExistsRequest, ProtocolError> {
+        Ok(RelExistsRequest {
+            common: (&value.common.ok_or(ProtocolError::Missing("common"))?).into(),
+            rel: (&value.rel.ok_or(ProtocolError::Missing("rel"))?).try_into()?,
+        })
+    }
+}
+
+impl From<&RelSizeRequest> for proto::RelSizeRequest {
+    fn from(value: &RelSizeRequest) -> proto::RelSizeRequest {
+        proto::RelSizeRequest {
+            common: Some((&value.common).into()),
+            rel: Some((&value.rel).into()),
+        }
+    }
+}
+impl TryFrom<&proto::RelSizeRequest> for RelSizeRequest {
+    type Error = ProtocolError;
+
+    fn try_from(value: &proto::RelSizeRequest) -> Result<RelSizeRequest, ProtocolError> {
+        Ok(RelSizeRequest {
+            common: (&value.common.ok_or(ProtocolError::Missing("common"))?).into(),
+            rel: (&value.rel.ok_or(ProtocolError::Missing("rel"))?).try_into()?,
+        })
+    }
+}
+
+impl From<&GetPageRequest> for proto::GetPageRequest {
+    fn from(value: &GetPageRequest) -> proto::GetPageRequest {
+        proto::GetPageRequest {
+            common: Some((&value.common).into()),
+            rel: Some((&value.rel).into()),
+            block_number: value.block_number,
+        }
+    }
+}
+impl TryFrom<&proto::GetPageRequest> for GetPageRequest {
+    type Error = ProtocolError;
+
+    fn try_from(value: &proto::GetPageRequest) -> Result<GetPageRequest, ProtocolError> {
+        Ok(GetPageRequest {
+            common: (&value.common.ok_or(ProtocolError::Missing("common"))?).into(),
+            rel: (&value.rel.ok_or(ProtocolError::Missing("rel"))?).try_into()?,
+            block_number: value.block_number,
+        })
+    }
+}
+
+impl From<&DbSizeRequest> for proto::DbSizeRequest {
+    fn from(value: &DbSizeRequest) -> proto::DbSizeRequest {
+        proto::DbSizeRequest {
+            common: Some((&value.common).into()),
+            db_oid: value.db_oid,
+        }
+    }
+}
+
+impl TryFrom<&proto::DbSizeRequest> for DbSizeRequest {
+    type Error = ProtocolError;
+
+    fn try_from(value: &proto::DbSizeRequest) -> Result<DbSizeRequest, ProtocolError> {
+        Ok(DbSizeRequest {
+            common: (&value.common.ok_or(ProtocolError::Missing("common"))?).into(),
+            db_oid: value.db_oid,
+        })
+    }
+}
+
+impl From<&GetBaseBackupRequest> for proto::GetBaseBackupRequest {
+    fn from(value: &GetBaseBackupRequest) -> proto::GetBaseBackupRequest {
+        proto::GetBaseBackupRequest {
+            common: Some((&value.common).into()),
+            replica: value.replica,
+        }
+    }
+}
+
+impl TryFrom<&proto::GetBaseBackupRequest> for GetBaseBackupRequest {
+    type Error = ProtocolError;
+
+    fn try_from(
+        value: &proto::GetBaseBackupRequest,
+    ) -> Result<GetBaseBackupRequest, ProtocolError> {
+        Ok(GetBaseBackupRequest {
+            common: (&value.common.ok_or(ProtocolError::Missing("common"))?).into(),
+            replica: value.replica,
+        })
+    }
+}
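
Note on the conversions above: they narrow wire-level values into stricter model types and report missing or out-of-range fields as errors. A self-contained sketch of that pattern follows. `WireRelTag` and `require_rel` are hypothetical names; `WireRelTag` only imitates the shape of the prost-generated `proto::RelTag`.

```rust
/// Hypothetical wire-level struct, shaped like the generated `proto::RelTag`.
#[derive(Clone, Copy, Debug)]
struct WireRelTag {
    spc_oid: u32,
    db_oid: u32,
    rel_number: u32,
    fork_number: u32,
}

/// Model struct with the narrower `u8` fork number, as in `model.rs`.
#[derive(Debug, PartialEq)]
struct RelTag {
    spc_oid: u32,
    db_oid: u32,
    rel_number: u32,
    fork_number: u8,
}

#[derive(Debug, PartialEq)]
enum ProtocolError {
    InvalidValue(&'static str),
    Missing(&'static str),
}

impl TryFrom<&WireRelTag> for RelTag {
    type Error = ProtocolError;

    fn try_from(v: &WireRelTag) -> Result<Self, Self::Error> {
        Ok(RelTag {
            spc_oid: v.spc_oid,
            db_oid: v.db_oid,
            rel_number: v.rel_number,
            // Reject values that do not fit in a u8 instead of truncating.
            fork_number: u8::try_from(v.fork_number)
                .map_err(|_| ProtocolError::InvalidValue("fork_number"))?,
        })
    }
}

/// Hypothetical helper: treat an absent optional message field as an error.
fn require_rel(rel: Option<WireRelTag>) -> Result<RelTag, ProtocolError> {
    RelTag::try_from(&rel.ok_or(ProtocolError::Missing("rel"))?)
}
```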
--- a/pageserver/pagebench/Cargo.toml
+++ b/pageserver/pagebench/Cargo.toml
@@ -23,6 +23,8 @@ tokio.workspace = true
 tokio-util.workspace = true

 pageserver_client.workspace = true
+pageserver_client_grpc.workspace = true
+pageserver_data_api.workspace = true
 pageserver_api.workspace = true
 utils = { path = "../../libs/utils/" }
 workspace_hack = { version = "0.1", path = "../../workspace_hack" }
--- a/pageserver/pagebench/src/cmd/basebackup.rs
+++ b/pageserver/pagebench/src/cmd/basebackup.rs
@@ -9,6 +9,9 @@ use anyhow::Context;
 use pageserver_api::shard::TenantShardId;
 use pageserver_client::mgmt_api::ForceAwaitLogicalSize;
 use pageserver_client::page_service::BasebackupRequest;
+use pageserver_client_grpc;
+use pageserver_data_api::model::{GetBaseBackupRequest, RequestCommon};
+
 use rand::prelude::*;
 use tokio::sync::Barrier;
 use tokio::task::JoinSet;
@@ -22,6 +25,8 @@ use crate::util::{request_stats, tokio_thread_local_stats};
 /// basebackup@LatestLSN
 #[derive(clap::Parser)]
 pub(crate) struct Args {
+    #[clap(long, default_value = "false")]
+    grpc: bool,
    #[clap(long, default_value = "http://localhost:9898")]
    mgmt_api_endpoint: String,
    #[clap(long, default_value = "postgres://postgres@localhost:64000")]
@@ -52,7 +57,7 @@ impl LiveStats {

 struct Target {
    timeline: TenantTimelineId,
-    lsn_range: Option<Range<Lsn>>,
+    lsn_range: Range<Lsn>,
 }

 #[derive(serde::Serialize)]
@@ -105,7 +110,7 @@ async fn main_impl(
                anyhow::Ok(Target {
                    timeline,
                    // TODO: support lsn_range != latest LSN
-                    lsn_range: Some(info.last_record_lsn..(info.last_record_lsn + 1)),
+                    lsn_range: info.last_record_lsn..(info.last_record_lsn + 1),
                })
            }
        });
@@ -149,14 +154,27 @@ async fn main_impl(
    for tl in &timelines {
        let (sender, receiver) = tokio::sync::mpsc::channel(1); // TODO: not sure what the implications of this are
        work_senders.insert(tl, sender);
-        tasks.push(tokio::spawn(client(
-            args,
-            *tl,
-            Arc::clone(&start_work_barrier),
-            receiver,
-            Arc::clone(&all_work_done_barrier),
-            Arc::clone(&live_stats),
-        )));
+
+        let client_task = if args.grpc {
+            tokio::spawn(client_grpc(
+                args,
+                *tl,
+                Arc::clone(&start_work_barrier),
+                receiver,
+                Arc::clone(&all_work_done_barrier),
+                Arc::clone(&live_stats),
+            ))
+        } else {
+            tokio::spawn(client(
+                args,
+                *tl,
+                Arc::clone(&start_work_barrier),
+                receiver,
+                Arc::clone(&all_work_done_barrier),
+                Arc::clone(&live_stats),
+            ))
+        };
+        tasks.push(client_task);
    }

    let work_sender = async move {
@@ -165,7 +183,7 @@ async fn main_impl(
            let (timeline, work) = {
                let mut rng = rand::thread_rng();
                let target = all_targets.choose(&mut rng).unwrap();
-                let lsn = target.lsn_range.clone().map(|r| rng.gen_range(r));
+                let lsn = rng.gen_range(target.lsn_range.clone());
                (
                    target.timeline,
                    Work {
@@ -215,7 +233,7 @@ async fn main_impl(

 #[derive(Copy, Clone)]
 struct Work {
-    lsn: Option<Lsn>,
+    lsn: Lsn,
    gzip: bool,
 }

@@ -240,7 +258,7 @@ async fn client(
            .basebackup(&BasebackupRequest {
                tenant_id: timeline.tenant_id,
                timeline_id: timeline.timeline_id,
-                lsn,
+                lsn: Some(lsn),
                gzip,
            })
            .await
@@ -270,3 +288,71 @@ async fn client(

    all_work_done_barrier.wait().await;
 }
+
+#[instrument(skip_all)]
+async fn client_grpc(
+    args: &'static Args,
+    timeline: TenantTimelineId,
+    start_work_barrier: Arc<Barrier>,
+    mut work: tokio::sync::mpsc::Receiver<Work>,
+    all_work_done_barrier: Arc<Barrier>,
+    live_stats: Arc<LiveStats>,
+) {
+    let shard_map = HashMap::from([(0, args.page_service_connstring.clone())]);
+    let client = pageserver_client_grpc::PageserverClient::new(
+        &timeline.tenant_id.to_string(),
+        &timeline.timeline_id.to_string(),
+        &None,
+        shard_map,
+    );
+
+    start_work_barrier.wait().await;
+
+    while let Some(Work { lsn, gzip }) = work.recv().await {
+        let start = Instant::now();
+
+        //tokio::time::sleep(std::time::Duration::from_secs(1)).await;
+
+        info!("starting get_base_backup");
+        let mut basebackup_stream = client
+            .get_base_backup(
+                &GetBaseBackupRequest {
+                    common: RequestCommon {
+                        request_lsn: lsn,
+                        not_modified_since_lsn: lsn,
+                    },
+                    replica: false,
+                },
+                gzip,
+            )
+            .await
+            .with_context(|| format!("start basebackup for {timeline}"))
+            .unwrap()
+            .into_inner();
+
+        info!("starting receive");
+        use futures::StreamExt;
+        let mut size = 0;
+        let mut nchunks = 0;
+        while let Some(chunk) = basebackup_stream.next().await {
+            let chunk = chunk
+                .context("error during basebackup")
+                .unwrap();
+            size += chunk.chunk.len();
+            nchunks += 1;
+        }
+
+        info!(
+            "basebackup size is {} bytes, avg chunk size {} bytes",
+            size,
+            size as f32 / nchunks as f32
+        );
+        let elapsed = start.elapsed();
+        live_stats.inc();
+        STATS.with(|stats| {
+            stats.borrow().lock().unwrap().observe(elapsed).unwrap();
+        });
+    }
+
+    all_work_done_barrier.wait().await;
+}
--- a/pageserver/pagebench/src/cmd/getpage_latest_lsn.rs
+++ b/pageserver/pagebench/src/cmd/getpage_latest_lsn.rs
@@ -1,4 +1,4 @@
-use std::collections::{HashSet, VecDeque};
+use std::collections::{HashMap, HashSet, VecDeque};
 use std::future::Future;
 use std::num::NonZeroUsize;
 use std::pin::Pin;
@@ -8,6 +8,8 @@ use std::time::{Duration, Instant};

 use anyhow::Context;
 use camino::Utf8PathBuf;
+use futures::StreamExt;
+use futures::stream::FuturesOrdered;
 use pageserver_api::key::Key;
 use pageserver_api::keyspace::KeySpaceAccum;
 use pageserver_api::models::{PagestreamGetPageRequest, PagestreamRequest};
@@ -25,6 +27,8 @@ use crate::util::{request_stats, tokio_thread_local_stats};
 /// GetPage@LatestLSN, uniformly distributed across the compute-accessible keyspace.
 #[derive(clap::Parser)]
 pub(crate) struct Args {
+    #[clap(long, default_value = "false")]
+    grpc: bool,
    #[clap(long, default_value = "http://localhost:9898")]
    mgmt_api_endpoint: String,
    #[clap(long, default_value = "postgres://postgres@localhost:64000")]
@@ -295,7 +299,29 @@ async fn main_impl(
                .unwrap();

        Box::pin(async move {
-            client_libpq(args, worker_id, ss, cancel, rps_period, ranges, weights).await
+            if args.grpc {
+                client_grpc(
+                    args,
+                    worker_id,
+                    ss,
+                    cancel,
+                    rps_period,
+                    ranges,
+                    weights,
+                )
+                .await
+            } else {
+                client_libpq(
+                    args,
+                    worker_id,
+                    ss,
+                    cancel,
+                    rps_period,
+                    ranges,
+                    weights,
+                )
+                .await
+            }
        })
    };

@@ -434,3 +460,100 @@ async fn client_libpq(
        }
    }
 }
+
+async fn client_grpc(
+    args: &Args,
+    worker_id: WorkerId,
+    shared_state: Arc<SharedState>,
+    cancel: CancellationToken,
+    rps_period: Option<Duration>,
+    ranges: Vec<KeyRange>,
+    weights: rand::distributions::weighted::WeightedIndex<i128>,
+) {
+    let shard_map = HashMap::from([(0, args.page_service_connstring.clone())]);
+    let client = pageserver_client_grpc::PageserverClient::new(
+        &worker_id.timeline.tenant_id.to_string(),
+        &worker_id.timeline.timeline_id.to_string(),
+        &None,
+        shard_map,
+    );
+    let client = Arc::new(client);
+
+    shared_state.start_work_barrier.wait().await;
+    let client_start = Instant::now();
+    let mut ticks_processed = 0;
+    let mut inflight = FuturesOrdered::new();
+    while !cancel.is_cancelled() {
+        // Detect if a request took longer than the RPS rate
+        if let Some(period) = &rps_period {
+            let periods_passed_until_now =
+                usize::try_from(client_start.elapsed().as_micros() / period.as_micros()).unwrap();
+
+            if periods_passed_until_now > ticks_processed {
+                shared_state
+                    .live_stats
+                    .missed((periods_passed_until_now - ticks_processed) as u64);
+            }
+            ticks_processed = periods_passed_until_now;
+        }
+
+        while inflight.len() < args.queue_depth.get() {
+            let start = Instant::now();
+            let req = {
+                let mut rng = rand::thread_rng();
+                let r = &ranges[weights.sample(&mut rng)];
+                let key: i128 = rng.gen_range(r.start..r.end);
+                let key = Key::from_i128(key);
+                assert!(key.is_rel_block_key());
+                let (rel_tag, block_no) = key
+                    .to_rel_block()
+                    .expect("we filter non-rel-block keys out above");
+                pageserver_data_api::model::GetPageRequest {
+                    common: pageserver_data_api::model::RequestCommon {
+                        request_lsn: if rng.gen_bool(args.req_latest_probability) {
+                            Lsn::MAX
+                        } else {
+                            r.timeline_lsn
+                        },
+                        not_modified_since_lsn: r.timeline_lsn,
+                    },
+                    rel: pageserver_data_api::model::RelTag {
+                        spc_oid: rel_tag.spcnode,
+                        db_oid: rel_tag.dbnode,
+                        rel_number: rel_tag.relnode,
+                        fork_number: rel_tag.forknum,
+                    },
+                    block_number: block_no,
+                }
+            };
+            let client_clone = client.clone();
+            let getpage_fut = async move {
+                let result = client_clone.get_page(&req).await;
+                (start, result)
+            };
+            inflight.push_back(getpage_fut);
+        }
+
+        let (start, result) = inflight.next().await.unwrap();
+        result.expect("getpage request should succeed");
+        let end = Instant::now();
+        shared_state.live_stats.request_done();
+        ticks_processed += 1;
+        STATS.with(|stats| {
+            stats
+                .borrow()
+                .lock()
+                .unwrap()
+                .observe(end.duration_since(start))
+                .unwrap();
+        });
+
+        if let Some(period) = &rps_period {
+            let next_at = client_start
+                + Duration::from_micros(
+                    (ticks_processed) as u64 * u64::try_from(period.as_micros()).unwrap(),
+                );
+            tokio::time::sleep_until(next_at.into()).await;
+        }
+    }
+}
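
Note on the rate limiting in `client_grpc` above: the number of missed ticks is the count of whole periods elapsed since start minus the ticks already processed. A minimal sketch of that arithmetic follows. `missed_ticks` is a hypothetical helper, and it assumes a non-zero period (a zero period would panic on division).

```rust
use std::time::Duration;

/// Hypothetical helper: how many rate-limit periods have elapsed
/// beyond the ticks that were already processed.
/// Assumes `period` is non-zero.
fn missed_ticks(elapsed: Duration, period: Duration, ticks_processed: usize) -> usize {
    // Whole periods since the client started, as in the diff.
    let periods_passed =
        usize::try_from(elapsed.as_micros() / period.as_micros()).unwrap();
    // Nothing is missed when the client is ahead of schedule.
    periods_passed.saturating_sub(ticks_processed)
}
```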
--- a/pageserver/src/basebackup.rs
+++ b/pageserver/src/basebackup.rs
@@ -151,10 +151,14 @@ where
                .map_err(|_| BasebackupError::Shutdown)?,
        ),
    };
-    basebackup
+    let res = basebackup
        .send_tarball()
        .instrument(info_span!("send_tarball", backup_lsn=%backup_lsn))
-        .await
+        .await;
+
+    info!("basebackup done!");
+
+    res
 }

 /// This is short-living object only for the time of tarball creation,
--- a/pageserver/src/bin/pageserver.rs
+++ b/pageserver/src/bin/pageserver.rs
@@ -16,6 +16,7 @@ use http_utils::tls_certs::ReloadingCertificateResolver;
 use metrics::launch_timestamp::{LaunchTimestamp, set_launch_timestamp_metric};
 use metrics::set_build_info_metric;
 use nix::sys::socket::{setsockopt, sockopt};
+use pageserver::compute_service;
 use pageserver::config::{PageServerConf, PageserverIdentity, ignored_fields};
 use pageserver::controller_upcall_client::StorageControllerUpcallClient;
 use pageserver::deletion_queue::DeletionQueue;
@@ -27,7 +28,7 @@ use pageserver::task_mgr::{
 use pageserver::tenant::{TenantSharedResources, mgr, secondary};
 use pageserver::{
    CancellableTask, ConsumptionMetricsTasks, HttpEndpointListener, HttpsEndpointListener, http,
-    page_cache, page_service, task_mgr, virtual_file,
+    page_cache, task_mgr, virtual_file,
 };
 use postgres_backend::AuthType;
 use remote_storage::GenericRemoteStorage;
@@ -745,7 +746,7 @@ fn start_pageserver(
    // Spawn a task to listen for libpq connections. It will spawn further tasks
    // for each connection. We created the listener earlier already.
    let perf_trace_dispatch = otel_guard.as_ref().map(|g| g.dispatch.clone());
-    let page_service = page_service::spawn(
+    let compute_service = compute_service::spawn(
        conf,
        tenant_manager.clone(),
        pg_auth,
@@ -782,7 +783,7 @@ fn start_pageserver(
        pageserver::shutdown_pageserver(
            http_endpoint_listener,
            https_endpoint_listener,
-            page_service,
+            compute_service,
            consumption_metrics_tasks,
            disk_usage_eviction_task,
            &tenant_manager,
--- a/pageserver/src/compute_service.rs
+++ b/pageserver/src/compute_service.rs
@@ -0,0 +1,286 @@
+//!
+//! The Compute Service listens for compute connections, and serves requests like
+//! the GetPage@LSN requests.
+//!
+//! We support two protocols:
+//!
+//! 1. Legacy, connection-oriented libpq-based protocol. That's
+//!    handled by the code in page_service.rs.
+//!
+//! 2. gRPC-based protocol. See compute_service_grpc.rs.
+//!
+//! To make the transition smooth, without having to open up new firewall ports
+//! etc, both protocols are served on the same port. When a new TCP connection
+//! is accepted, we peek at the first few bytes incoming from the client to
+//! determine which protocol it speaks.
+//!
+//! TODO: This gets easier once we drop the legacy protocol support. Or if we
+//! open a separate port for them.
+
+use std::sync::Arc;
+
+use anyhow::Context;
+use futures::FutureExt;
+use pageserver_api::config::PageServicePipeliningConfig;
+use postgres_backend::AuthType;
+use tokio::task::JoinHandle;
+use tokio_util::sync::CancellationToken;
+use tracing::*;
+use utils::auth::SwappableJwtAuth;
+use utils::sync::gate::{Gate, GateGuard};
+
+use crate::compute_service_grpc::launch_compute_service_grpc_server;
+use crate::config::PageServerConf;
+use crate::context::{DownloadBehavior, RequestContext, RequestContextBuilder};
+use crate::page_service::libpq_page_service_conn_main;
+use crate::task_mgr::{self, COMPUTE_REQUEST_RUNTIME, TaskKind};
+use crate::tenant::mgr::TenantManager;
+
+///////////////////////////////////////////////////////////////////////////////
+
+pub type ConnectionHandlerResult = anyhow::Result<()>;
+
+pub struct Connections {
+    cancel: CancellationToken,
+    tasks: tokio::task::JoinSet<ConnectionHandlerResult>,
+    gate: Gate,
+}
+
+impl Connections {
+    pub(crate) async fn shutdown(self) {
+        let Self {
+            cancel,
+            mut tasks,
+            gate,
+        } = self;
+        cancel.cancel();
+        while let Some(res) = tasks.join_next().await {
+            Self::handle_connection_completion(res);
+        }
+        gate.close().await;
+    }
+
+    fn handle_connection_completion(res: Result<anyhow::Result<()>, tokio::task::JoinError>) {
+        match res {
+            Ok(Ok(())) => {}
+            Ok(Err(e)) => error!("error in page_service connection task: {:?}", e),
+            Err(e) => error!("page_service connection task panicked: {:?}", e),
+        }
+    }
+}
+
+pub struct Listener {
+    cancel: CancellationToken,
+    /// Cancel the listener task through `cancel` to shut down the listener
+    /// and get a handle on the existing connections.
+    task: JoinHandle<Connections>,
+}
+
+pub fn spawn(
+    conf: &'static PageServerConf,
+    tenant_manager: Arc<TenantManager>,
+    pg_auth: Option<Arc<SwappableJwtAuth>>,
+    perf_trace_dispatch: Option<Dispatch>,
+    tcp_listener: tokio::net::TcpListener,
+    tls_config: Option<Arc<rustls::ServerConfig>>,
+) -> Listener {
+    let cancel = CancellationToken::new();
+    let libpq_ctx = RequestContext::todo_child(
+        TaskKind::LibpqEndpointListener,
+        // listener task shouldn't need to download anything. (We will
+        // create separate sub-contexts for each connection, with their
+        // own download behavior. This context is used only to listen and
+        // accept connections.)
+        DownloadBehavior::Error,
+    );
+
+    let task = COMPUTE_REQUEST_RUNTIME.spawn(task_mgr::exit_on_panic_or_error(
+        "compute connection listener",
+        compute_connection_listener_main(
+            conf,
+            tenant_manager,
+            pg_auth,
+            perf_trace_dispatch,
+            tcp_listener,
+            conf.pg_auth_type,
+            tls_config,
+            conf.page_service_pipelining.clone(),
+            libpq_ctx,
+            cancel.clone(),
+        )
+        .map(anyhow::Ok),
+    ));
+
+    Listener { cancel, task }
+}
+
+impl Listener {
+    pub async fn stop_accepting(self) -> Connections {
+        self.cancel.cancel();
+        self.task
+            .await
+            .expect("unreachable: we wrap the listener task in task_mgr::exit_on_panic_or_error")
+    }
+}
+
+/// Listener loop. Listens for connections, and launches a new handler
+/// task for each.
+///
+/// Upon cancellation via `listener_cancel`, returns the set of
+/// open connections.
+///
+#[allow(clippy::too_many_arguments)]
+pub async fn compute_connection_listener_main(
+    conf: &'static PageServerConf,
+    tenant_manager: Arc<TenantManager>,
+    auth: Option<Arc<SwappableJwtAuth>>,
+    perf_trace_dispatch: Option<Dispatch>,
+    listener: tokio::net::TcpListener,
+    auth_type: AuthType,
+    tls_config: Option<Arc<rustls::ServerConfig>>,
+    pipelining_config: PageServicePipeliningConfig,
+    listener_ctx: RequestContext,
+    listener_cancel: CancellationToken,
+) -> Connections {
+    let connections_cancel = CancellationToken::new();
+    let connections_gate = Gate::default();
+    let mut connection_handler_tasks = tokio::task::JoinSet::default();
+
+    // The connection handling task passes the gRPC protocol
+    // connections to this channel. The tonic gRPC server reads the
+    // channel and takes over the connections from there.
+    let (grpc_connections_tx, grpc_connections_rx) = tokio::sync::mpsc::channel(1000);
+
+    // Set up the gRPC service
+    launch_compute_service_grpc_server(
+        grpc_connections_rx,
+        conf,
+        tenant_manager.clone(),
+        auth.clone(),
+        auth_type,
+        connections_cancel.clone(),
+        &listener_ctx,
+    );
+
+    // Main listener loop
+    loop {
+        let gate_guard = match connections_gate.enter() {
+            Ok(guard) => guard,
+            Err(_) => break,
+        };
+
+        let accepted = tokio::select! {
+            biased;
+            _ = listener_cancel.cancelled() => break,
+            next = connection_handler_tasks.join_next(), if !connection_handler_tasks.is_empty() => {
+                let res = next.expect("we don't poll while empty");
+                Connections::handle_connection_completion(res);
+                continue;
+            }
+            accepted = listener.accept() => accepted,
+        };
+
+        match accepted {
+            Ok((socket, peer_addr)) => {
+                // Connection established. Spawn a new task to handle it.
+                debug!("accepted connection from {}", peer_addr);
+                let local_auth = auth.clone();
+                let connection_ctx = RequestContextBuilder::from(&listener_ctx)
+                    .task_kind(TaskKind::PageRequestHandler)
+                    .download_behavior(DownloadBehavior::Download)
+                    .perf_span_dispatch(perf_trace_dispatch.clone())
+                    .detached_child();
+
+                connection_handler_tasks.spawn(page_service_conn_main(
+                    conf,
+                    tenant_manager.clone(),
+                    local_auth,
+                    socket,
+                    auth_type,
+                    tls_config.clone(),
+                    pipelining_config.clone(),
+                    connection_ctx,
+                    connections_cancel.child_token(),
+                    gate_guard,
+                    grpc_connections_tx.clone(),
+                ));
+            }
+            Err(err) => {
+                // accept() failed. Log the error, and loop back to retry on next connection.
+                error!("accept() failed: {:?}", err);
+            }
+        }
+    }
+
+    debug!("page_service listener loop terminated");
+
+    Connections {
+        cancel: connections_cancel,
+        tasks: connection_handler_tasks,
+        gate: connections_gate,
+    }
+}
+
+/// Handle a new incoming connection.
+///
+/// This peeks at the first few incoming bytes and dispatches the connection
+/// to the legacy libpq handler or the new gRPC handler accordingly.
+#[instrument(skip_all, fields(peer_addr, application_name, compute_mode))]
+#[allow(clippy::too_many_arguments)]
+pub async fn page_service_conn_main(
+    conf: &'static PageServerConf,
+    tenant_manager: Arc<TenantManager>,
+    auth: Option<Arc<SwappableJwtAuth>>,
+    socket: tokio::net::TcpStream,
+    auth_type: AuthType,
+    tls_config: Option<Arc<rustls::ServerConfig>>,
+    pipelining_config: PageServicePipeliningConfig,
+    connection_ctx: RequestContext,
+    cancel: CancellationToken,
+    gate_guard: GateGuard,
+    grpc_connections_tx: tokio::sync::mpsc::Sender<tokio::io::Result<tokio::net::TcpStream>>,
+) -> ConnectionHandlerResult {
+    let mut buf: [u8; 4] = [0; 4];
+
+    socket
+        .set_nodelay(true)
+        .context("could not set TCP_NODELAY")?;
+
+    // Peek
+    socket.peek(&mut buf).await?;
+
+    let mut grpc = false;
+    if buf[0] == 0x16 {
+        // looks like a TLS handshake. Assume gRPC.
+        // XXX: Starting with v17, PostgreSQL also supports "direct TLS mode". But
+        // the compute doesn't use it.
+        grpc = true;
+    }
+
+    if buf[0] == b'G' || buf[0] == b'P' {
+        // Looks like 'GET' or 'POST'
+        // or 'PRI', indicating gRPC over HTTP/2 with prior knowledge
+        grpc = true;
+    }
+
+    // Dispatch
+    if grpc {
+        grpc_connections_tx.send(Ok(socket)).await?;
+        info!("connection sent to channel");
+        Ok(())
+    } else {
+        libpq_page_service_conn_main(
+            conf,
+            tenant_manager,
+            auth,
+            socket,
+            auth_type,
+            tls_config,
+            pipelining_config,
+            connection_ctx,
+            cancel,
+            gate_guard,
+        )
+        .await
+    }
+}
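
Reviewer note on the dispatch in `page_service_conn_main`: the decision depends only on the first peeked byte. The sketch below isolates that rule as a pure function so it can be unit-tested without a socket. `Protocol` and `sniff_protocol` are hypothetical names used for illustration and are not part of this change. The sketch assumes, as the diff does, that a libpq client never starts with a TLS record or an ASCII `G`/`P`; a libpq startup packet begins with a four-byte big-endian length, so its first byte is normally zero.

```rust
/// Which handler an incoming connection should be routed to.
#[derive(Debug, PartialEq, Eq)]
enum Protocol {
    Grpc,
    Libpq,
}

/// Hypothetical helper: classify a connection from its first peeked bytes.
fn sniff_protocol(peeked: &[u8]) -> Protocol {
    match peeked.first().copied() {
        // 0x16 is the TLS record content type for a handshake message.
        Some(0x16) => Protocol::Grpc,
        // 'G' (GET), 'P' (POST), or the "PRI * HTTP/2.0" connection preface.
        Some(b'G') | Some(b'P') => Protocol::Grpc,
        // Everything else, including an empty peek, falls back to libpq.
        _ => Protocol::Libpq,
    }
}
```

Keeping the rule in one place like this would also make it easier to revisit the "direct TLS mode" caveat noted in the XXX comment.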
--- a/pageserver/src/compute_service_grpc.rs
+++ b/pageserver/src/compute_service_grpc.rs
@@ -0,0 +1,746 @@
+//!
+//! Compute <-> Pageserver API handler. This is for the new gRPC-based protocol
+//!
+//! TODO:
+//!
+//! - Many of the API endpoints are still missing
+//!
+//! - This is very much not optimized.
+//!
+//! - Much of the code was copy-pasted from page_service.rs. Like the code to get the
+//!   Timeline object, and the JWT auth. Could refactor and share.
+//!
+//!
+
+use std::pin::Pin;
+use std::str::FromStr;
+use std::sync::Arc;
+use std::task::Poll;
+use std::time::Duration;
+use std::time::Instant;
+
+use crate::TenantManager;
+use crate::auth::check_permission;
+use crate::basebackup;
+use crate::basebackup::BasebackupError;
+use crate::config::PageServerConf;
+use crate::context::{DownloadBehavior, RequestContext, RequestContextBuilder};
+use crate::task_mgr::TaskKind;
+use crate::tenant::Timeline;
+use crate::tenant::mgr::ShardResolveResult;
+use crate::tenant::mgr::ShardSelector;
+use crate::tenant::storage_layer::IoConcurrency;
+use crate::tenant::timeline::WaitLsnTimeout;
+use tokio::io::{AsyncWriteExt, ReadHalf, SimplexStream};
+use tokio::task::JoinHandle;
+use tokio_util::codec::{Decoder, FramedRead};
+use tokio_util::sync::CancellationToken;
+
+use futures::stream::StreamExt;
+
+use pageserver_data_api::model;
+use pageserver_data_api::proto::page_service_server::PageService;
+use pageserver_data_api::proto::page_service_server::PageServiceServer;
+
+use anyhow::Context;
+use bytes::BytesMut;
+use jsonwebtoken::TokenData;
+use tracing::Instrument;
+use tracing::{debug, error};
+use utils::auth::SwappableJwtAuth;
+
+use utils::id::{TenantId, TenantTimelineId, TimelineId};
+use utils::lsn::Lsn;
+use utils::simple_rcu::RcuReadGuard;
+
+use crate::tenant::PageReconstructError;
+
+use postgres_ffi::BLCKSZ;
+
+use tonic;
+use tonic::codec::CompressionEncoding;
+use tonic::service::interceptor::InterceptedService;
+
+use pageserver_api::key::rel_block_to_key;
+
+use crate::pgdatadir_mapping::Version;
+use postgres_ffi::pg_constants::DEFAULTTABLESPACE_OID;
+
+use postgres_backend::AuthType;
+
+pub use pageserver_data_api::proto;
+
+pub(super) fn launch_compute_service_grpc_server(
+    tcp_connections_rx: tokio::sync::mpsc::Receiver<tokio::io::Result<tokio::net::TcpStream>>,
+    conf: &'static PageServerConf,
+    tenant_manager: Arc<TenantManager>,
+    auth: Option<Arc<SwappableJwtAuth>>,
+    auth_type: AuthType,
+    connections_cancel: CancellationToken,
+    listener_ctx: &RequestContext,
+) {
+    // Set up the gRPC service
+    let service_ctx = RequestContextBuilder::from(listener_ctx)
+        .task_kind(TaskKind::PageRequestHandler)
+        .download_behavior(DownloadBehavior::Download)
+        .attached_child();
+    let service = crate::compute_service_grpc::PageServiceService {
+        conf,
+        tenant_mgr: tenant_manager.clone(),
+        ctx: Arc::new(service_ctx),
+    };
+    let authenticator = PageServiceAuthenticator {
+        auth: auth.clone(),
+        auth_type,
+    };
+
+    let server = InterceptedService::new(
+        PageServiceServer::new(service).send_compressed(CompressionEncoding::Gzip),
+        authenticator,
+    );
+
+    let cc = connections_cancel.clone();
+    tokio::spawn(async move {
+        tonic::transport::Server::builder()
+            .add_service(server)
+            .serve_with_incoming_shutdown(
+                tokio_stream::wrappers::ReceiverStream::new(tcp_connections_rx),
+                cc.cancelled(),
+            )
+            .await
+    });
+}
+
+struct PageServiceService {
+    conf: &'static PageServerConf,
+    tenant_mgr: Arc<TenantManager>,
+    ctx: Arc<RequestContext>,
+}
+
+/// Convert an error from a get() operation into the corresponding gRPC status.
+impl From<PageReconstructError> for tonic::Status {
+    fn from(e: PageReconstructError) -> Self {
+        match e {
+            PageReconstructError::Other(err) => tonic::Status::unknown(err.to_string()),
+            PageReconstructError::AncestorLsnTimeout(_) => {
+                tonic::Status::unavailable(e.to_string())
+            }
+            PageReconstructError::Cancelled => tonic::Status::aborted(e.to_string()),
+            PageReconstructError::WalRedo(_) => tonic::Status::internal(e.to_string()),
+            PageReconstructError::MissingKey(_) => tonic::Status::internal(e.to_string()),
+        }
+    }
+}
+
+fn convert_reltag(value: &model::RelTag) -> pageserver_api::reltag::RelTag {
+    pageserver_api::reltag::RelTag {
+        spcnode: value.spc_oid,
+        dbnode: value.db_oid,
+        relnode: value.rel_number,
+        forknum: value.fork_number,
+    }
+}
+
+#[tonic::async_trait]
+impl PageService for PageServiceService {
+    type GetBaseBackupStream = GetBaseBackupStream;
+
+    async fn rel_exists(
+        &self,
+        request: tonic::Request<proto::RelExistsRequest>,
+    ) -> std::result::Result<tonic::Response<proto::RelExistsResponse>, tonic::Status> {
+        let ttid = self.extract_ttid(request.metadata())?;
+        let req: model::RelExistsRequest = request.get_ref().try_into()?;
+
+        let rel = convert_reltag(&req.rel);
+        let span = tracing::info_span!("rel_exists", tenant_id = %ttid.tenant_id, timeline_id = %ttid.timeline_id, rel = %rel, req_lsn = %req.common.request_lsn);
+
+        async {
+            let timeline = self.get_timeline(ttid, ShardSelector::Zero).await?;
+            let ctx = self.ctx.with_scope_timeline(&timeline);
+            let latest_gc_cutoff_lsn = timeline.get_applied_gc_cutoff_lsn();
+            let lsn = Self::wait_or_get_last_lsn(
+                &timeline,
+                req.common.request_lsn,
+                req.common.not_modified_since_lsn,
+                &latest_gc_cutoff_lsn,
+                &ctx,
+            )
+            .await?;
+
+            let exists = timeline
+                .get_rel_exists(rel, Version::Lsn(lsn), &ctx)
+                .await?;
+
+            Ok(tonic::Response::new(proto::RelExistsResponse { exists }))
+        }
+        .instrument(span)
+        .await
+    }
+
+    /// Returns the size of a relation, in blocks
+    async fn rel_size(
+        &self,
+        request: tonic::Request<proto::RelSizeRequest>,
+    ) -> std::result::Result<tonic::Response<proto::RelSizeResponse>, tonic::Status> {
+        let ttid = self.extract_ttid(request.metadata())?;
+        let req: model::RelSizeRequest = request.get_ref().try_into()?;
+        let rel = convert_reltag(&req.rel);
+
+        let span = tracing::info_span!("rel_size", tenant_id = %ttid.tenant_id, timeline_id = %ttid.timeline_id, rel = %rel, req_lsn = %req.common.request_lsn);
+
+        async {
+            let timeline = self.get_timeline(ttid, ShardSelector::Zero).await?;
+            let ctx = self.ctx.with_scope_timeline(&timeline);
+            let latest_gc_cutoff_lsn = timeline.get_applied_gc_cutoff_lsn();
+            let lsn = Self::wait_or_get_last_lsn(
+                &timeline,
+                req.common.request_lsn,
+                req.common.not_modified_since_lsn,
+                &latest_gc_cutoff_lsn,
+                &ctx,
+            )
+            .await?;
+
+            let num_blocks = timeline.get_rel_size(rel, Version::Lsn(lsn), &ctx).await?;
+
+            Ok(tonic::Response::new(proto::RelSizeResponse { num_blocks }))
+        }
+        .instrument(span)
+        .await
+    }
+
+    async fn get_page(
+        &self,
+        request: tonic::Request<proto::GetPageRequest>,
+    ) -> std::result::Result<tonic::Response<proto::GetPageResponse>, tonic::Status> {
+        let ttid = self.extract_ttid(request.metadata())?;
+        let req: model::GetPageRequest = request.get_ref().try_into()?;
+
+        // Calculate shard number.
+        //
+        // FIXME: this should probably be part of the data_api crate.
+        let rel = convert_reltag(&req.rel);
+        let key = rel_block_to_key(rel, req.block_number);
+        let timeline = self.get_timeline(ttid, ShardSelector::Page(key)).await?;
+
+        let ctx = self.ctx.with_scope_timeline(&timeline);
+        let latest_gc_cutoff_lsn = timeline.get_applied_gc_cutoff_lsn();
+        let lsn = Self::wait_or_get_last_lsn(
+            &timeline,
+            req.common.request_lsn,
+            req.common.not_modified_since_lsn,
+            &latest_gc_cutoff_lsn,
+            &ctx,
+        )
+        .await?;
+
+        let shard_id = timeline.tenant_shard_id.shard_number;
+        let span = tracing::info_span!("get_page", tenant_id = %ttid.tenant_id, shard_id = %shard_id, timeline_id = %ttid.timeline_id, rel = %rel, block_number = %req.block_number, req_lsn = %req.common.request_lsn);
+
+        async {
+            let gate_guard = match timeline.gate.enter() {
+                Ok(guard) => guard,
+                Err(_) => {
+                    return Err(tonic::Status::unavailable("timeline is shutting down"));
+                }
+            };
+
+            let io_concurrency = IoConcurrency::spawn_from_conf(self.conf, gate_guard);
+
+            let page_image = timeline
+                .get_rel_page_at_lsn(
+                    rel,
+                    req.block_number,
+                    Version::Lsn(lsn),
+                    &ctx,
+                    io_concurrency,
+                )
+                .await?;
+
+            Ok(tonic::Response::new(proto::GetPageResponse {
+                page_image,
+            }))
+        }
+        .instrument(span)
+        .await
+    }
+
+    async fn db_size(
+        &self,
+        request: tonic::Request<proto::DbSizeRequest>,
+    ) -> Result<tonic::Response<proto::DbSizeResponse>, tonic::Status> {
+        let ttid = self.extract_ttid(request.metadata())?;
+        let req: model::DbSizeRequest = request.get_ref().try_into()?;
+
+        let span = tracing::info_span!("db_size", tenant_id = %ttid.tenant_id, timeline_id = %ttid.timeline_id, db_oid = %req.db_oid, req_lsn = %req.common.request_lsn);
+
+        async {
+            let timeline = self.get_timeline(ttid, ShardSelector::Zero).await?;
+            let ctx = self.ctx.with_scope_timeline(&timeline);
+            let latest_gc_cutoff_lsn = timeline.get_applied_gc_cutoff_lsn();
+            let lsn = Self::wait_or_get_last_lsn(
+                &timeline,
+                req.common.request_lsn,
+                req.common.not_modified_since_lsn,
+                &latest_gc_cutoff_lsn,
+                &ctx,
+            )
+            .await?;
+
+            let total_blocks = timeline
+                .get_db_size(DEFAULTTABLESPACE_OID, req.db_oid, Version::Lsn(lsn), &ctx)
+                .await?;
+
+            Ok(tonic::Response::new(proto::DbSizeResponse {
+                num_bytes: total_blocks as u64 * BLCKSZ as u64,
+            }))
+        }
+        .instrument(span)
+        .await
+    }
+
+    async fn get_base_backup(
+        &self,
+        request: tonic::Request<proto::GetBaseBackupRequest>,
+    ) -> Result<tonic::Response<Self::GetBaseBackupStream>, tonic::Status> {
+        let ttid = self.extract_ttid(request.metadata())?;
+        let req: model::GetBaseBackupRequest = request.get_ref().try_into()?;
+
+        let timeline = self.get_timeline(ttid, ShardSelector::Zero).await?;
+
+        let ctx = self.ctx.with_scope_timeline(&timeline);
+        let latest_gc_cutoff_lsn = timeline.get_applied_gc_cutoff_lsn();
+        let lsn = Self::wait_or_get_last_lsn(
+            &timeline,
+            req.common.request_lsn,
+            req.common.not_modified_since_lsn,
+            &latest_gc_cutoff_lsn,
+            &ctx,
+        )
+        .await?;
+
+        let span = tracing::info_span!("get_base_backup", tenant_id = %ttid.tenant_id, timeline_id = %ttid.timeline_id, req_lsn = %req.common.request_lsn);
+
+        tracing::info!("starting basebackup");
+
+        #[allow(dead_code)]
+        enum TestMode {
+            /// Create real basebackup, in streaming fashion
+            Streaming,
+            /// Create real basebackup, but fully materialize it in the 'simplex' pipe buffer first
+            Materialize,
+            /// Create a dummy all-zeros basebackup, in streaming fashion
+            DummyStreaming,
+            /// Create a dummy all-zeros basebackup, but fully materialize it first
+            DummyMaterialize,
+        }
+        let mode = TestMode::Streaming;
+
+        let buf_size = match mode {
+            TestMode::Streaming | TestMode::DummyStreaming => 64 * 1024,
+            TestMode::Materialize | TestMode::DummyMaterialize => 64 * 1024 * 1024,
+        };
+
+        let (simplex_read, mut simplex_write) = tokio::io::simplex(buf_size);
+
+        let basebackup_task = match mode {
+            TestMode::DummyStreaming => {
+                tokio::spawn(
+                    async move {
+                        // hold onto the guard for as long as the basebackup runs
+                        let _latest_gc_cutoff_lsn = latest_gc_cutoff_lsn;
+
+                        let zerosbuf: [u8; 1024] = [0; 1024];
+                        let nbytes = 16900000;
+                        let mut bytes_written = 0;
+                        while bytes_written < nbytes {
+                            let s = std::cmp::min(1024, nbytes - bytes_written);
+                            let _ = simplex_write.write_all(&zerosbuf[0..s]).await;
+                            bytes_written += s;
+                        }
+                        simplex_write
+                            .shutdown()
+                            .await
+                            .context("shutdown of basebackup pipe")?;
+
+                        Ok(())
+                    }
+                    .instrument(span),
+                )
+            }
+            TestMode::DummyMaterialize => {
+                let zerosbuf: [u8; 1024] = [0; 1024];
+                let nbytes = 16900000;
+                let mut bytes_written = 0;
+                while bytes_written < nbytes {
+                    let s = std::cmp::min(1024, nbytes - bytes_written);
+                    let _ = simplex_write.write_all(&zerosbuf[0..s]).await;
+                    bytes_written += s;
+                }
+                simplex_write
+                    .shutdown()
+                    .await
+                    .expect("shutdown of basebackup pipe");
+                tracing::info!("basebackup (dummy) materialized");
+                let result = Ok(());
+
+                tokio::spawn(std::future::ready(result))
+            }
+            TestMode::Materialize => {
+                let result = basebackup::send_basebackup_tarball(
+                    &mut simplex_write,
+                    &timeline,
+                    Some(lsn),
+                    None,
+                    false,
+                    req.replica,
+                    &ctx,
+                )
+                .await;
+                simplex_write
+                    .shutdown()
+                    .await
+                    .expect("shutdown of basebackup pipe");
+                tracing::info!("basebackup materialized");
+
+                // The tarball is already fully in the simplex pipe; the task only hands back the result
+                tokio::spawn(std::future::ready(result))
+            }
+            TestMode::Streaming => {
+                tokio::spawn(
+                    async move {
+                        // hold onto the guard for as long as the basebackup runs
+                        let _latest_gc_cutoff_lsn = latest_gc_cutoff_lsn;
+
+                        let result = basebackup::send_basebackup_tarball(
+                            &mut simplex_write,
+                            &timeline,
+                            Some(lsn),
+                            None,
+                            false,
+                            req.replica,
+                            &ctx,
+                        )
+                        .await;
+                        simplex_write
+                            .shutdown()
+                            .await
+                            .context("shutdown of basebackup pipe")?;
+                        result
+                    }
+                    .instrument(span),
+                )
+            }
+        };
+
+        let response = new_basebackup_response_stream(simplex_read, basebackup_task);
+
+        Ok(tonic::Response::new(response))
+    }
+}
+
+/// NB: this is a different value than [`crate::http::routes::ACTIVE_TENANT_TIMEOUT`].
+/// NB: and also different from page_service::ACTIVE_TENANT_TIMEOUT
+const ACTIVE_TENANT_TIMEOUT: Duration = Duration::from_millis(30000);
+
+impl PageServiceService {
+    async fn get_timeline(
+        &self,
+        ttid: TenantTimelineId,
+        shard_selector: ShardSelector,
+    ) -> Result<Arc<Timeline>, tonic::Status> {
+        let timeout = ACTIVE_TENANT_TIMEOUT;
+        let wait_start = Instant::now();
+        let deadline = wait_start + timeout;
+
+        let tenant_shard = loop {
+            let resolved = self
+                .tenant_mgr
+                .resolve_attached_shard(&ttid.tenant_id, shard_selector);
+
+            match resolved {
+                ShardResolveResult::Found(tenant_shard) => break tenant_shard,
+                ShardResolveResult::NotFound => {
+                    return Err(tonic::Status::not_found("tenant not found"));
+                }
+                ShardResolveResult::InProgress(barrier) => {
+                    // We can't authoritatively answer right now: wait for InProgress state
+                    // to end, then try again
+                    tokio::select! {
+                        _  = barrier.wait() => {
+                            // The barrier completed: proceed around the loop to try looking up again
+                        },
+                        _ = tokio::time::sleep(deadline.duration_since(Instant::now())) => {
+                            return Err(tonic::Status::unavailable("tenant is in InProgress state"));
+                        }
+                    }
+                }
+            }
+        };
+
+        tracing::debug!("Waiting for tenant to enter active state...");
+        tenant_shard
+            .wait_to_become_active(deadline.duration_since(Instant::now()))
+            .await
+            .map_err(|e| {
+                tonic::Status::unavailable(format!("tenant is not in active state: {e}"))
+            })?;
+
+        let timeline = tenant_shard
+            .get_timeline(ttid.timeline_id, true)
+            .map_err(|e| tonic::Status::unavailable(format!("could not get timeline: {e}")))?;
+
+        // FIXME: need to do something with the 'gate' here?
+
+        Ok(timeline)
+    }
+
+    /// Extract TenantTimelineId from the request metadata
+    ///
+    /// Note: the interceptor has already authenticated the request
+    ///
+    /// TODO: Could we use "binary" metadata for these, for efficiency? gRPC has such a concept
+    fn extract_ttid(
+        &self,
+        metadata: &tonic::metadata::MetadataMap,
+    ) -> Result<TenantTimelineId, tonic::Status> {
+        let tenant_id = metadata
+            .get("neon-tenant-id")
+            .ok_or(tonic::Status::invalid_argument(
+                "neon-tenant-id metadata missing",
+            ))?;
+        let tenant_id = tenant_id.to_str().map_err(|_| {
+            tonic::Status::invalid_argument("invalid UTF-8 characters in neon-tenant-id metadata")
+        })?;
+        let tenant_id = TenantId::from_str(tenant_id)
+            .map_err(|_| tonic::Status::invalid_argument("invalid neon-tenant-id metadata"))?;
+
+        let timeline_id =
+            metadata
+                .get("neon-timeline-id")
+                .ok_or(tonic::Status::invalid_argument(
+                    "neon-timeline-id metadata missing",
+                ))?;
+        let timeline_id = timeline_id.to_str().map_err(|_| {
+            tonic::Status::invalid_argument("invalid UTF-8 characters in neon-timeline-id metadata")
+        })?;
+        let timeline_id = TimelineId::from_str(timeline_id)
+            .map_err(|_| tonic::Status::invalid_argument("invalid neon-timeline-id metadata"))?;
+
+        Ok(TenantTimelineId::new(tenant_id, timeline_id))
+    }
+
+    // XXX: copied from PageServerHandler
+    async fn wait_or_get_last_lsn(
+        timeline: &Timeline,
+        request_lsn: Lsn,
+        not_modified_since: Lsn,
+        latest_gc_cutoff_lsn: &RcuReadGuard<Lsn>,
+        ctx: &RequestContext,
+    ) -> Result<Lsn, tonic::Status> {
+        let last_record_lsn = timeline.get_last_record_lsn();
+
+        // Sanity check the request
+        if request_lsn < not_modified_since {
+            return Err(tonic::Status::invalid_argument(format!(
+                "invalid request with request LSN {} and not_modified_since {}",
+                request_lsn, not_modified_since,
+            )));
+        }
+
+        // Check explicitly for INVALID just to get a less scary error message if the request is obviously bogus
+        if request_lsn == Lsn::INVALID {
+            return Err(tonic::Status::invalid_argument("invalid LSN(0) in request"));
+        }
+
+        // Clients should only read from recent LSNs on their timeline, or from locations holding an LSN lease.
+        //
+        // We may have older data available, but we make a best effort to detect this case and return an error,
+        // to distinguish a misbehaving client (asking for old LSN) from a storage issue (data missing at a legitimate LSN).
+        if request_lsn < **latest_gc_cutoff_lsn && !timeline.is_gc_blocked_by_lsn_lease_deadline() {
+            let gc_info = &timeline.gc_info.read().unwrap();
+            if !gc_info.lsn_covered_by_lease(request_lsn) {
+                return Err(tonic::Status::not_found(format!(
+                    "tried to request a page version that was garbage collected. requested at {} gc cutoff {}",
+                    request_lsn, **latest_gc_cutoff_lsn
+                )));
+            }
+        }
+
+        // Wait for WAL up to 'not_modified_since' to arrive, if necessary
+        if not_modified_since > last_record_lsn {
+            timeline
+                .wait_lsn(
+                    not_modified_since,
+                    crate::tenant::timeline::WaitLsnWaiter::PageService,
+                    WaitLsnTimeout::Default,
+                    ctx,
+                )
+                .await
+                .map_err(|_| {
+                    tonic::Status::unavailable("not_modified_since LSN not arrived yet")
+                })?;
+            // Since we waited for 'not_modified_since' to arrive, that is now the last
+            // record LSN. (Or close enough for our purposes; the last-record LSN can
+            // advance immediately after we return anyway)
+            Ok(not_modified_since)
+        } else {
+            // It might be better to use max(not_modified_since, latest_gc_cutoff_lsn)
+            // here instead. That would give the same result, since we know that there
+            // haven't been any modifications since 'not_modified_since'. Using an older
+            // LSN might be faster, because that could allow skipping recent layers when
+            // finding the page. However, we have historically used 'last_record_lsn', so
+            // stick to that for now.
+            Ok(std::cmp::min(last_record_lsn, request_lsn))
+        }
+    }
+}
+
+#[derive(Clone)]
+pub struct PageServiceAuthenticator {
+    pub auth: Option<Arc<SwappableJwtAuth>>,
+    pub auth_type: AuthType,
+}
+
+impl tonic::service::Interceptor for PageServiceAuthenticator {
+    fn call(
+        &mut self,
+        req: tonic::Request<()>,
+    ) -> std::result::Result<tonic::Request<()>, tonic::Status> {
+        // Check the tenant_id in any case
+        let tenant_id =
+            req.metadata()
+                .get("neon-tenant-id")
+                .ok_or(tonic::Status::invalid_argument(
+                    "neon-tenant-id metadata missing",
+                ))?;
+        let tenant_id = tenant_id.to_str().map_err(|_| {
+            tonic::Status::invalid_argument("invalid UTF-8 characters in neon-tenant-id metadata")
+        })?;
+        let tenant_id = TenantId::from_str(tenant_id)
+            .map_err(|_| tonic::Status::invalid_argument("invalid neon-tenant-id metadata"))?;
+
+        // check_permission() takes None when authorizing management API access and
+        // Some(tenant_id) when authorizing access to a tenant; here it is always a tenant
+        let auth = if let Some(auth) = &self.auth {
+            auth
+        } else {
+            // auth is set to Trust, nothing to check so just return ok
+            return Ok(req);
+        };
+
+        let jwt = req
+            .metadata()
+            .get("neon-auth-token")
+            .ok_or(tonic::Status::unauthenticated("no neon-auth-token"))?;
+        let jwt = jwt.to_str().map_err(|_| {
+            tonic::Status::invalid_argument("invalid UTF-8 characters in neon-auth-token metadata")
+        })?;
+
+        let jwtdata: TokenData<utils::auth::Claims> = auth
+            .decode(jwt)
+            .map_err(|err| tonic::Status::unauthenticated(format!("invalid JWT token: {}", err)))?;
+        let claims = jwtdata.claims;
+
+        if matches!(claims.scope, utils::auth::Scope::Tenant) && claims.tenant_id.is_none() {
+            return Err(tonic::Status::unauthenticated(
+                "jwt token scope is Tenant, but tenant id is missing",
+            ));
+        }
+
+        debug!(
+            "jwt scope check succeeded for scope: {:#?} by tenant id: {:?}",
+            claims.scope, claims.tenant_id,
+        );
+
+        // The token is valid. Check if it's allowed to access the tenant ID
+        // given in the request.
+
+        check_permission(&claims, Some(tenant_id))
+            .map_err(|err| tonic::Status::permission_denied(err.to_string()))?;
+
+        // All checks out
+        Ok(req)
+    }
+}
+
+/// Stream of GetBaseBackupResponseChunk messages.
+///
+/// The first part of the Chain chunks the tarball. The second part checks the return value
+/// of the send_basebackup_tarball Future that created the tarball.
+
+type GetBaseBackupStream = futures::stream::Chain<BasebackupChunkedStream, CheckResultStream>;
+
+fn new_basebackup_response_stream(
+    simplex_read: ReadHalf<SimplexStream>,
+    basebackup_task: JoinHandle<Result<(), BasebackupError>>,
+) -> GetBaseBackupStream {
+    let framed = FramedRead::new(simplex_read, GetBaseBackupResponseDecoder {});
+
+    framed.chain(CheckResultStream { basebackup_task })
+}
+
+/// Stream that uses GetBaseBackupResponseDecoder
+type BasebackupChunkedStream =
+    tokio_util::codec::FramedRead<ReadHalf<SimplexStream>, GetBaseBackupResponseDecoder>;
+
+struct GetBaseBackupResponseDecoder;
+impl Decoder for GetBaseBackupResponseDecoder {
+    type Item = proto::GetBaseBackupResponseChunk;
+    type Error = tonic::Status;
+
+    fn decode(&mut self, src: &mut BytesMut) -> Result<Option<Self::Item>, Self::Error> {
+        if src.len() < 64 * 1024 {
+            return Ok(None);
+        }
+
+        let item = proto::GetBaseBackupResponseChunk {
+            chunk: bytes::Bytes::from(std::mem::take(src)),
+        };
+
+        Ok(Some(item))
+    }
+
+    fn decode_eof(&mut self, src: &mut BytesMut) -> Result<Option<Self::Item>, Self::Error> {
+        if src.is_empty() {
+            return Ok(None);
+        }
+
+        let item = proto::GetBaseBackupResponseChunk {
+            chunk: bytes::Bytes::from(std::mem::take(src)),
+        };
+
+        Ok(Some(item))
+    }
+}
+
+struct CheckResultStream {
+    basebackup_task: tokio::task::JoinHandle<Result<(), BasebackupError>>,
+}
+impl futures::Stream for CheckResultStream {
+    type Item = Result<proto::GetBaseBackupResponseChunk, tonic::Status>;
+
+    fn poll_next(
+        mut self: Pin<&mut Self>,
+        ctx: &mut std::task::Context<'_>,
+    ) -> Poll<Option<Self::Item>> {
+        let task = Pin::new(&mut self.basebackup_task);
+        match task.poll(ctx) {
+            Poll::Pending => Poll::Pending,
+            Poll::Ready(Ok(Ok(()))) => Poll::Ready(None),
+            Poll::Ready(Ok(Err(basebackup_err))) => {
+                error!(error=%basebackup_err, "error getting basebackup");
+                Poll::Ready(Some(Err(tonic::Status::internal(
+                    "could not get basebackup",
+                ))))
+            }
+            Poll::Ready(Err(join_err)) => {
+                error!(error=%join_err, "JoinError getting basebackup");
+                Poll::Ready(Some(Err(tonic::Status::internal(
+                    "could not get basebackup",
+                ))))
+            }
+        }
+    }
+}
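
Note on `GetBaseBackupResponseDecoder` in the file above: `decode` returns nothing until at least 64 KiB is buffered and then takes the whole buffer, so chunks are "at least 64 KiB" rather than fixed-size, and `decode_eof` flushes whatever remains. A minimal std-only sketch of that behaviour follows. It uses a `Vec<u8>` in place of `BytesMut`, and `CHUNK_THRESHOLD` and `chunk_sizes` are hypothetical names for illustration, not part of the patch.

```rust
/// Threshold mirroring the `64 * 1024` check in the patch (name is hypothetical).
const CHUNK_THRESHOLD: usize = 64 * 1024;

/// Mirrors `decode`: emit nothing until the threshold is reached,
/// then hand out the entire buffer as one chunk.
fn decode(buf: &mut Vec<u8>) -> Option<Vec<u8>> {
    if buf.len() < CHUNK_THRESHOLD {
        return None;
    }
    Some(std::mem::take(buf))
}

/// Mirrors `decode_eof`: flush any remaining bytes, however few.
fn decode_eof(buf: &mut Vec<u8>) -> Option<Vec<u8>> {
    if buf.is_empty() {
        return None;
    }
    Some(std::mem::take(buf))
}

/// Simulates a sequence of writes of the given sizes followed by EOF,
/// and returns the size of every chunk that would be emitted.
fn chunk_sizes(writes: &[usize]) -> Vec<usize> {
    let mut buf: Vec<u8> = Vec::new();
    let mut sizes = Vec::new();
    for &n in writes {
        // Append `n` zero bytes, standing in for data read from the stream.
        buf.resize(buf.len() + n, 0);
        if let Some(chunk) = decode(&mut buf) {
            sizes.push(chunk.len());
        }
    }
    if let Some(chunk) = decode_eof(&mut buf) {
        sizes.push(chunk.len());
    }
    sizes
}
```

Under these assumptions, writes of 40,000, 40,000 and 10,000 bytes produce one 80,000-byte chunk and a final 10,000-byte chunk at EOF. The real chunk sizes depend on how `FramedRead` fills its buffer.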
--- a/pageserver/src/lib.rs
+++ b/pageserver/src/lib.rs
@@ -21,6 +21,8 @@ pub use pageserver_api::keyspace;
 use tokio_util::sync::CancellationToken;
 mod assert_u64_eq_usize;
 pub mod aux_file;
+pub mod compute_service;
+pub mod compute_service_grpc;
 pub mod metrics;
 pub mod page_cache;
 pub mod page_service;
@@ -82,7 +84,7 @@ impl CancellableTask {
 pub async fn shutdown_pageserver(
    http_listener: HttpEndpointListener,
    https_listener: Option<HttpsEndpointListener>,
-    page_service: page_service::Listener,
+    compute_service: compute_service::Listener,
    consumption_metrics_worker: ConsumptionMetricsTasks,
    disk_usage_eviction_task: Option<DiskUsageEvictionTask>,
    tenant_manager: &TenantManager,
@@ -167,11 +169,11 @@ pub async fn shutdown_pageserver(
        }
    });

-    // Shut down the libpq endpoint task. This prevents new connections from
+    // Shut down the compute service endpoint task. This prevents new connections from
    // being accepted.
    let remaining_connections = timed(
-        page_service.stop_accepting(),
-        "shutdown LibpqEndpointListener",
+        compute_service.stop_accepting(),
+        "shutdown compute service listener",
        Duration::from_secs(1),
    )
    .await;
--- a/pageserver/src/metrics.rs
+++ b/pageserver/src/metrics.rs
@@ -1774,8 +1774,12 @@ static SMGR_QUERY_STARTED_PER_TENANT_TIMELINE: Lazy<IntCounterVec> = Lazy::new(|
    .expect("failed to define a metric")
 });

-// Alias so all histograms recording per-timeline smgr timings use the same buckets.
-static SMGR_QUERY_TIME_PER_TENANT_TIMELINE_BUCKETS: &[f64] = CRITICAL_OP_BUCKETS;
+/// Per-timeline smgr histogram buckets should be the same as the compute buckets, such that the
+/// metrics are comparable across compute and Pageserver. See also:
+/// <https://github.com/neondatabase/neon/blob/1a87975d956a8ad17ec8b85da32a137ec4893fcc/pgxn/neon/neon_perf_counters.h#L18-L27>
+/// <https://github.com/neondatabase/flux-fleet/blob/556182a939edda87ff1d85a6b02e5cec901e0e9e/apps/base/compute-metrics/scrape-compute-sql-exporter.yaml#L29-L35>
+static SMGR_QUERY_TIME_PER_TENANT_TIMELINE_BUCKETS: &[f64] =
+    &[0.0006, 0.001, 0.003, 0.006, 0.01, 0.03, 0.1, 1.0, 3.0];

 static SMGR_QUERY_TIME_PER_TENANT_TIMELINE: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!(
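
Note on the new bucket list: Prometheus histogram buckets are cumulative with inclusive upper bounds (`le`), so each observation first counts toward the smallest bound that is greater than or equal to it, and anything above the last bound lands only in the implicit `+Inf` bucket. The sketch below shows which bound a given latency would fall under. It assumes the values are in seconds, and `BUCKETS` and `bucket_upper_bound` are hypothetical names used only for illustration.

```rust
/// Copy of the bucket bounds from the patch (assumed to be seconds).
const BUCKETS: &[f64] = &[0.0006, 0.001, 0.003, 0.006, 0.01, 0.03, 0.1, 1.0, 3.0];

/// Returns the smallest bucket upper bound that contains `seconds`,
/// or `None` if the observation only fits the implicit +Inf bucket.
fn bucket_upper_bound(seconds: f64) -> Option<f64> {
    BUCKETS.iter().copied().find(|&le| seconds <= le)
}
```

For example, a 2 ms request falls under the `0.003` bound, while a 5 s request exceeds every explicit bound.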
--- a/pageserver/src/page_service.rs
+++ b/pageserver/src/page_service.rs
@@ -13,7 +13,6 @@ use crate::PERF_TRACE_TARGET;
 use anyhow::{Context, bail};
 use async_compression::tokio::write::GzipEncoder;
 use bytes::Buf;
-use futures::FutureExt;
 use itertools::Itertools;
 use jsonwebtoken::TokenData;
 use once_cell::sync::OnceCell;
@@ -40,7 +39,6 @@ use pq_proto::framed::ConnectionError;
 use pq_proto::{BeMessage, FeMessage, FeStartupPacket, RowDescriptor};
 use strum_macros::IntoStaticStr;
 use tokio::io::{AsyncRead, AsyncWrite, AsyncWriteExt, BufWriter};
-use tokio::task::JoinHandle;
 use tokio_util::sync::CancellationToken;
 use tracing::*;
 use utils::auth::{Claims, Scope, SwappableJwtAuth};
@@ -49,15 +47,13 @@ use utils::id::{TenantId, TimelineId};
 use utils::logging::log_slow;
 use utils::lsn::Lsn;
 use utils::simple_rcu::RcuReadGuard;
-use utils::sync::gate::{Gate, GateGuard};
+use utils::sync::gate::GateGuard;
 use utils::sync::spsc_fold;

 use crate::auth::check_permission;
 use crate::basebackup::BasebackupError;
 use crate::config::PageServerConf;
-use crate::context::{
-    DownloadBehavior, PerfInstrumentFutureExt, RequestContext, RequestContextBuilder,
-};
+use crate::context::{PerfInstrumentFutureExt, RequestContext, RequestContextBuilder};
 use crate::metrics::{
    self, COMPUTE_COMMANDS_COUNTERS, ComputeCommandKind, GetPageBatchBreakReason, LIVE_CONNECTIONS,
    SmgrOpTimer, TimelineMetrics,
@@ -67,7 +63,6 @@ use crate::span::{
    debug_assert_current_span_has_tenant_and_timeline_id,
    debug_assert_current_span_has_tenant_and_timeline_id_no_shard_id,
 };
-use crate::task_mgr::{self, COMPUTE_REQUEST_RUNTIME, TaskKind};
 use crate::tenant::mgr::{
    GetActiveTenantError, GetTenantError, ShardResolveResult, ShardSelector, TenantManager,
 };
@@ -85,171 +80,6 @@ const ACTIVE_TENANT_TIMEOUT: Duration = Duration::from_millis(30000);
 /// Threshold at which to log slow GetPage requests.
 const LOG_SLOW_GETPAGE_THRESHOLD: Duration = Duration::from_secs(30);

-///////////////////////////////////////////////////////////////////////////////
-
-pub struct Listener {
-    cancel: CancellationToken,
-    /// Cancel the listener task through `listen_cancel` to shut down the listener
-    /// and get a handle on the existing connections.
-    task: JoinHandle<Connections>,
-}
-
-pub struct Connections {
-    cancel: CancellationToken,
-    tasks: tokio::task::JoinSet<ConnectionHandlerResult>,
-    gate: Gate,
-}
-
-pub fn spawn(
-    conf: &'static PageServerConf,
-    tenant_manager: Arc<TenantManager>,
-    pg_auth: Option<Arc<SwappableJwtAuth>>,
-    perf_trace_dispatch: Option<Dispatch>,
-    tcp_listener: tokio::net::TcpListener,
-    tls_config: Option<Arc<rustls::ServerConfig>>,
-) -> Listener {
-    let cancel = CancellationToken::new();
-    let libpq_ctx = RequestContext::todo_child(
-        TaskKind::LibpqEndpointListener,
-        // listener task shouldn't need to download anything. (We will
-        // create a separate sub-contexts for each connection, with their
-        // own download behavior. This context is used only to listen and
-        // accept connections.)
-        DownloadBehavior::Error,
-    );
-    let task = COMPUTE_REQUEST_RUNTIME.spawn(task_mgr::exit_on_panic_or_error(
-        "libpq listener",
-        libpq_listener_main(
-            conf,
-            tenant_manager,
-            pg_auth,
-            perf_trace_dispatch,
-            tcp_listener,
-            conf.pg_auth_type,
-            tls_config,
-            conf.page_service_pipelining.clone(),
-            libpq_ctx,
-            cancel.clone(),
-        )
-        .map(anyhow::Ok),
-    ));
-
-    Listener { cancel, task }
-}
-
-impl Listener {
-    pub async fn stop_accepting(self) -> Connections {
-        self.cancel.cancel();
-        self.task
-            .await
-            .expect("unreachable: we wrap the listener task in task_mgr::exit_on_panic_or_error")
-    }
-}
-impl Connections {
-    pub(crate) async fn shutdown(self) {
-        let Self {
-            cancel,
-            mut tasks,
-            gate,
-        } = self;
-        cancel.cancel();
-        while let Some(res) = tasks.join_next().await {
-            Self::handle_connection_completion(res);
-        }
-        gate.close().await;
-    }
-
-    fn handle_connection_completion(res: Result<anyhow::Result<()>, tokio::task::JoinError>) {
-        match res {
-            Ok(Ok(())) => {}
-            Ok(Err(e)) => error!("error in page_service connection task: {:?}", e),
-            Err(e) => error!("page_service connection task panicked: {:?}", e),
-        }
-    }
-}
-
-///
-/// Main loop of the page service.
-///
-/// Listens for connections, and launches a new handler task for each.
-///
-/// Returns Ok(()) upon cancellation via `cancel`, returning the set of
-/// open connections.
-///
-#[allow(clippy::too_many_arguments)]
-pub async fn libpq_listener_main(
-    conf: &'static PageServerConf,
-    tenant_manager: Arc<TenantManager>,
-    auth: Option<Arc<SwappableJwtAuth>>,
-    perf_trace_dispatch: Option<Dispatch>,
-    listener: tokio::net::TcpListener,
-    auth_type: AuthType,
-    tls_config: Option<Arc<rustls::ServerConfig>>,
-    pipelining_config: PageServicePipeliningConfig,
-    listener_ctx: RequestContext,
-    listener_cancel: CancellationToken,
-) -> Connections {
-    let connections_cancel = CancellationToken::new();
-    let connections_gate = Gate::default();
-    let mut connection_handler_tasks = tokio::task::JoinSet::default();
-
-    loop {
-        let gate_guard = match connections_gate.enter() {
-            Ok(guard) => guard,
-            Err(_) => break,
-        };
-
-        let accepted = tokio::select! {
-            biased;
-            _ = listener_cancel.cancelled() => break,
-            next = connection_handler_tasks.join_next(), if !connection_handler_tasks.is_empty() => {
-                let res = next.expect("we dont poll while empty");
-                Connections::handle_connection_completion(res);
-                continue;
-            }
-            accepted = listener.accept() => accepted,
-        };
-
-        match accepted {
-            Ok((socket, peer_addr)) => {
-                // Connection established. Spawn a new task to handle it.
-                debug!("accepted connection from {}", peer_addr);
-                let local_auth = auth.clone();
-                let connection_ctx = RequestContextBuilder::from(&listener_ctx)
-                    .task_kind(TaskKind::PageRequestHandler)
-                    .download_behavior(DownloadBehavior::Download)
-                    .perf_span_dispatch(perf_trace_dispatch.clone())
-                    .detached_child();
-
-                connection_handler_tasks.spawn(page_service_conn_main(
-                    conf,
-                    tenant_manager.clone(),
-                    local_auth,
-                    socket,
-                    auth_type,
-                    tls_config.clone(),
-                    pipelining_config.clone(),
-                    connection_ctx,
-                    connections_cancel.child_token(),
-                    gate_guard,
-                ));
-            }
-            Err(err) => {
-                // accept() failed. Log the error, and loop back to retry on next connection.
-                error!("accept() failed: {:?}", err);
-            }
-        }
-    }
-
-    debug!("page_service listener loop terminated");
-
-    Connections {
-        cancel: connections_cancel,
-        tasks: connection_handler_tasks,
-        gate: connections_gate,
-    }
-}
-
 type ConnectionHandlerResult = anyhow::Result<()>;

 /// Perf root spans start at the per-request level, after shard routing.
@@ -261,9 +91,10 @@ struct ConnectionPerfSpanFields {
    compute_mode: Option<String>,
 }

+/// note: the caller has already set TCP_NODELAY on the socket
 #[instrument(skip_all, fields(peer_addr, application_name, compute_mode))]
 #[allow(clippy::too_many_arguments)]
-async fn page_service_conn_main(
+pub async fn libpq_page_service_conn_main(
    conf: &'static PageServerConf,
    tenant_manager: Arc<TenantManager>,
    auth: Option<Arc<SwappableJwtAuth>>,
@@ -279,10 +110,6 @@ async fn page_service_conn_main(
        .with_label_values(&["page_service"])
        .guard();

-    socket
-        .set_nodelay(true)
-        .context("could not set TCP_NODELAY")?;
-
    let socket_fd = socket.as_raw_fd();

    let peer_addr = socket.peer_addr().context("get peer address")?;
@@ -393,7 +220,7 @@ struct PageServerHandler {
    gate_guard: GateGuard,
 }

-struct TimelineHandles {
+pub struct TimelineHandles {
    wrapper: TenantManagerWrapper,
    /// Note on size: the typical size of this map is 1.  The largest size we expect
    /// to see is the number of shards divided by the number of pageservers (typically < 2),
--- a/pgxn/neon/Makefile
+++ b/pgxn/neon/Makefile
@@ -1,10 +1,10 @@
 # pgxs/neon/Makefile

-
 MODULE_big = neon
 OBJS = \
 	$(WIN32RES) \
 	communicator.o \
+	communicator_new.o \
 	extension_server.o \
 	file_cache.o \
 	hll.o \
@@ -22,7 +22,8 @@ OBJS = \
 	walproposer.o \
 	walproposer_pg.o \
 	control_plane_connector.o \
-	walsender_hooks.o
+	walsender_hooks.o \
+	$(LIBCOMMUNICATOR_PATH)/libcommunicator.a

 PG_CPPFLAGS = -I$(libpq_srcdir)
 SHLIB_LINK_INTERNAL = $(libpq)
--- a/pgxn/neon/communicator.c
+++ b/pgxn/neon/communicator.c
@@ -88,9 +88,6 @@ typedef PGAlignedBlock PGIOAlignedBlock;

 page_server_api *page_server;

-static uint32 local_request_counter;
-#define GENERATE_REQUEST_ID() (((NeonRequestId)MyProcPid << 32) | ++local_request_counter)
-
 /*
 * Various settings related to prompt (fast) handling of PageStream responses
 * at any CHECK_FOR_INTERRUPTS point.
@@ -788,6 +785,27 @@ prefetch_read(PrefetchRequest *slot)
 	}
 }

+
+/*
+ * Wait for completion of a previously registered prefetch request.
+ * Prefetch result should be placed in LFC by prefetch_wait_for.
+ */
+bool
+communicator_prefetch_receive(BufferTag tag)
+{
+	PrfHashEntry *entry;
+	PrefetchRequest hashkey;
+
+	hashkey.buftag = tag;
+	entry = prfh_lookup(MyPState->prf_hash, &hashkey);
+	if (entry != NULL && prefetch_wait_for(entry->slot->my_ring_index))
+	{
+		prefetch_set_unused(entry->slot->my_ring_index);
+		return true;
+	}
+	return false;
+}
+
 /*
 * Disconnect hook - drop prefetches when the connection drops
 *
@@ -906,7 +924,6 @@ prefetch_do_request(PrefetchRequest *slot, neon_request_lsns *force_request_lsns

 	NeonGetPageRequest request = {
 		.hdr.tag = T_NeonGetPageRequest,
-		.hdr.reqid = GENERATE_REQUEST_ID(),
 		/* lsn and not_modified_since are filled in below */
 		.rinfo = BufTagGetNRelFileInfo(slot->buftag),
 		.forknum = slot->buftag.forkNum,
@@ -915,8 +932,6 @@ prefetch_do_request(PrefetchRequest *slot, neon_request_lsns *force_request_lsns

 	Assert(mySlotNo == MyPState->ring_unused);

-	slot->reqid = request.hdr.reqid;
-
 	if (force_request_lsns)
 		slot->request_lsns = *force_request_lsns;
 	else
@@ -934,6 +949,7 @@ prefetch_do_request(PrefetchRequest *slot, neon_request_lsns *force_request_lsns
 		Assert(mySlotNo == MyPState->ring_unused);
 		/* loop */
 	}
+	slot->reqid = request.hdr.reqid;

 	/* update prefetch state */
 	MyPState->n_requests_inflight += 1;
@@ -1937,7 +1953,6 @@ communicator_exists(NRelFileInfo rinfo, ForkNumber forkNum, neon_request_lsns *r
 	{
 		NeonExistsRequest request = {
 			.hdr.tag = T_NeonExistsRequest,
-			.hdr.reqid = GENERATE_REQUEST_ID(),
 			.hdr.lsn = request_lsns->request_lsn,
 			.hdr.not_modified_since = request_lsns->not_modified_since,
 			.rinfo = rinfo,
@@ -2212,7 +2227,6 @@ communicator_nblocks(NRelFileInfo rinfo, ForkNumber forknum, neon_request_lsns *
 	{
 		NeonNblocksRequest request = {
 			.hdr.tag = T_NeonNblocksRequest,
-			.hdr.reqid = GENERATE_REQUEST_ID(),
 			.hdr.lsn = request_lsns->request_lsn,
 			.hdr.not_modified_since = request_lsns->not_modified_since,
 			.rinfo = rinfo,
@@ -2285,7 +2299,6 @@ communicator_dbsize(Oid dbNode, neon_request_lsns *request_lsns)
 	{
 		NeonDbSizeRequest request = {
 			.hdr.tag = T_NeonDbSizeRequest,
-			.hdr.reqid = GENERATE_REQUEST_ID(),
 			.hdr.lsn = request_lsns->request_lsn,
 			.hdr.not_modified_since = request_lsns->not_modified_since,
 			.dbNode = dbNode,
@@ -2353,7 +2366,6 @@ communicator_read_slru_segment(SlruKind kind, int64 segno, neon_request_lsns *re

 	request = (NeonGetSlruSegmentRequest) {
 		.hdr.tag = T_NeonGetSlruSegmentRequest,
-		.hdr.reqid = GENERATE_REQUEST_ID(),
 		.hdr.lsn = request_lsns->request_lsn,
 		.hdr.not_modified_since = request_lsns->not_modified_since,
 		.kind = kind,
--- a/pgxn/neon/communicator.h
+++ b/pgxn/neon/communicator.h
@@ -37,6 +37,8 @@ extern int communicator_prefetch_lookupv(NRelFileInfo rinfo, ForkNumber forknum,
 										 BlockNumber nblocks, void **buffers, bits8 *mask);
 extern void communicator_prefetch_register_bufferv(BufferTag tag, neon_request_lsns *frlsns,
 												   BlockNumber nblocks, const bits8 *mask);
+extern bool communicator_prefetch_receive(BufferTag tag);
+
 extern int communicator_read_slru_segment(SlruKind kind, int64 segno,
 										  neon_request_lsns *request_lsns,
 										  void *buffer);
--- a/pgxn/neon/communicator/Cargo.lock
+++ b/pgxn/neon/communicator/Cargo.lock
@@ -0,0 +1,372 @@
+# This file is automatically @generated by Cargo.
+# It is not intended for manual editing.
+version = 4
+
+[[package]]
+name = "addr2line"
+version = "0.24.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "dfbe277e56a376000877090da837660b4427aad530e3028d44e0bffe4f89a1c1"
+dependencies = [
+ "gimli",
+]
+
+[[package]]
+name = "adler2"
+version = "2.0.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "512761e0bb2578dd7380c6baaa0f4ce03e84f95e960231d1dec8bf4d7d6e2627"
+
+[[package]]
+name = "backtrace"
+version = "0.3.74"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8d82cb332cdfaed17ae235a638438ac4d4839913cc2af585c3c6746e8f8bee1a"
+dependencies = [
+ "addr2line",
+ "cfg-if",
+ "libc",
+ "miniz_oxide",
+ "object",
+ "rustc-demangle",
+ "windows-targets",
+]
+
+[[package]]
+name = "base64"
+version = "0.22.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "72b3254f16251a8381aa12e40e3c4d2f0199f8c6508fbecb9d91f575e0fbb8c6"
+
+[[package]]
+name = "bytes"
+version = "1.10.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d71b6127be86fdcfddb610f7182ac57211d4b18a3e9c82eb2d17662f2227ad6a"
+
+[[package]]
+name = "cfg-if"
+version = "1.0.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd"
+
+[[package]]
+name = "communicator"
+version = "0.1.0"
+dependencies = [
+ "tonic",
+]
+
+[[package]]
+name = "fnv"
+version = "1.0.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3f9eec918d3f24069decb9af1554cad7c880e2da24a9afd88aca000531ab82c1"
+
+[[package]]
+name = "futures-core"
+version = "0.3.31"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "05f29059c0c2090612e8d742178b0580d2dc940c837851ad723096f87af6663e"
+
+[[package]]
+name = "gimli"
+version = "0.31.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "07e28edb80900c19c28f1072f2e8aeca7fa06b23cd4169cefe1af5aa3260783f"
+
+[[package]]
+name = "http"
+version = "1.3.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f4a85d31aea989eead29a3aaf9e1115a180df8282431156e533de47660892565"
+dependencies = [
+ "bytes",
+ "fnv",
+ "itoa",
+]
+
+[[package]]
+name = "http-body"
+version = "1.0.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1efedce1fb8e6913f23e0c92de8e62cd5b772a67e7b3946df930a62566c93184"
+dependencies = [
+ "bytes",
+ "http",
+]
+
+[[package]]
+name = "http-body-util"
+version = "0.1.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b021d93e26becf5dc7e1b75b1bed1fd93124b374ceb73f43d4d4eafec896a64a"
+dependencies = [
+ "bytes",
+ "futures-core",
+ "http",
+ "http-body",
+ "pin-project-lite",
+]
+
+[[package]]
+name = "itoa"
+version = "1.0.15"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "4a5f13b858c8d314ee3e8f639011f7ccefe71f97f96e50151fb991f267928e2c"
+
+[[package]]
+name = "libc"
+version = "0.2.171"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "c19937216e9d3aa9956d9bb8dfc0b0c8beb6058fc4f7a4dc4d850edf86a237d6"
+
+[[package]]
+name = "memchr"
+version = "2.7.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "78ca9ab1a0babb1e7d5695e3530886289c18cf2f87ec19a575a0abdce112e3a3"
+
+[[package]]
+name = "miniz_oxide"
+version = "0.8.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ff70ce3e48ae43fa075863cef62e8b43b71a4f2382229920e0df362592919430"
+dependencies = [
+ "adler2",
+]
+
+[[package]]
+name = "object"
+version = "0.36.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "62948e14d923ea95ea2c7c86c71013138b66525b86bdc08d2dcc262bdb497b87"
+dependencies = [
+ "memchr",
+]
+
+[[package]]
+name = "once_cell"
+version = "1.21.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "42f5e15c9953c5e4ccceeb2e7382a716482c34515315f7b03532b8b4e8393d2d"
+
+[[package]]
+name = "percent-encoding"
+version = "2.3.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e3148f5046208a5d56bcfc03053e3ca6334e51da8dfb19b6cdc8b306fae3283e"
+
+[[package]]
+name = "pin-project"
+version = "1.1.10"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "677f1add503faace112b9f1373e43e9e054bfdd22ff1a63c1bc485eaec6a6a8a"
+dependencies = [
+ "pin-project-internal",
+]
+
+[[package]]
+name = "pin-project-internal"
+version = "1.1.10"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6e918e4ff8c4549eb882f14b3a4bc8c8bc93de829416eacf579f1207a8fbf861"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "pin-project-lite"
+version = "0.2.16"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3b3cff922bd51709b605d9ead9aa71031d81447142d828eb4a6eba76fe619f9b"
+
+[[package]]
+name = "proc-macro2"
+version = "1.0.94"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a31971752e70b8b2686d7e46ec17fb38dad4051d94024c88df49b667caea9c84"
+dependencies = [
+ "unicode-ident",
+]
+
+[[package]]
+name = "quote"
+version = "1.0.40"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1885c039570dc00dcb4ff087a89e185fd56bae234ddc7f056a945bf36467248d"
+dependencies = [
+ "proc-macro2",
+]
+
+[[package]]
+name = "rustc-demangle"
+version = "0.1.24"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "719b953e2095829ee67db738b3bfa9fa368c94900df327b3f07fe6e794d2fe1f"
+
+[[package]]
+name = "syn"
+version = "2.0.100"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b09a44accad81e1ba1cd74a32461ba89dee89095ba17b32f5d03683b1b1fc2a0"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "unicode-ident",
+]
+
+[[package]]
+name = "tokio"
+version = "1.44.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e6b88822cbe49de4185e3a4cbf8321dd487cf5fe0c5c65695fef6346371e9c48"
+dependencies = [
+ "backtrace",
+ "pin-project-lite",
+]
+
+[[package]]
+name = "tokio-stream"
+version = "0.1.17"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "eca58d7bba4a75707817a2c44174253f9236b2d5fbd055602e9d5c07c139a047"
+dependencies = [
+ "futures-core",
+ "pin-project-lite",
+ "tokio",
+]
+
+[[package]]
+name = "tonic"
+version = "0.13.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "85839f0b32fd242bb3209262371d07feda6d780d16ee9d2bc88581b89da1549b"
+dependencies = [
+ "base64",
+ "bytes",
+ "http",
+ "http-body",
+ "http-body-util",
+ "percent-encoding",
+ "pin-project",
+ "tokio-stream",
+ "tower-layer",
+ "tower-service",
+ "tracing",
+]
+
+[[package]]
+name = "tower-layer"
+version = "0.3.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "121c2a6cda46980bb0fcd1647ffaf6cd3fc79a013de288782836f6df9c48780e"
+
+[[package]]
+name = "tower-service"
+version = "0.3.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8df9b6e13f2d32c91b9bd719c00d1958837bc7dec474d94952798cc8e69eeec3"
+
+[[package]]
+name = "tracing"
+version = "0.1.41"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "784e0ac535deb450455cbfa28a6f0df145ea1bb7ae51b821cf5e7927fdcfbdd0"
+dependencies = [
+ "pin-project-lite",
+ "tracing-attributes",
+ "tracing-core",
+]
+
+[[package]]
+name = "tracing-attributes"
+version = "0.1.28"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "395ae124c09f9e6918a2310af6038fba074bcf474ac352496d5910dd59a2226d"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "tracing-core"
+version = "0.1.33"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e672c95779cf947c5311f83787af4fa8fffd12fb27e4993211a84bdfd9610f9c"
+dependencies = [
+ "once_cell",
+]
+
+[[package]]
+name = "unicode-ident"
+version = "1.0.18"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5a5f39404a5da50712a4c1eecf25e90dd62b613502b7e925fd4e4d19b5c96512"
+
+[[package]]
+name = "windows-targets"
+version = "0.52.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9b724f72796e036ab90c1021d4780d4d3d648aca59e491e6b98e725b84e99973"
+dependencies = [
+ "windows_aarch64_gnullvm",
+ "windows_aarch64_msvc",
+ "windows_i686_gnu",
+ "windows_i686_gnullvm",
+ "windows_i686_msvc",
+ "windows_x86_64_gnu",
+ "windows_x86_64_gnullvm",
+ "windows_x86_64_msvc",
+]
+
+[[package]]
+name = "windows_aarch64_gnullvm"
+version = "0.52.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "32a4622180e7a0ec044bb555404c800bc9fd9ec262ec147edd5989ccd0c02cd3"
+
+[[package]]
+name = "windows_aarch64_msvc"
+version = "0.52.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "09ec2a7bb152e2252b53fa7803150007879548bc709c039df7627cabbd05d469"
+
+[[package]]
+name = "windows_i686_gnu"
+version = "0.52.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8e9b5ad5ab802e97eb8e295ac6720e509ee4c243f69d781394014ebfe8bbfa0b"
+
+[[package]]
+name = "windows_i686_gnullvm"
+version = "0.52.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0eee52d38c090b3caa76c563b86c3a4bd71ef1a819287c19d586d7334ae8ed66"
+
+[[package]]
+name = "windows_i686_msvc"
+version = "0.52.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "240948bc05c5e7c6dabba28bf89d89ffce3e303022809e73deaefe4f6ec56c66"
+
+[[package]]
+name = "windows_x86_64_gnu"
+version = "0.52.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "147a5c80aabfbf0c7d901cb5895d1de30ef2907eb21fbbab29ca94c5b08b1a78"
+
+[[package]]
+name = "windows_x86_64_gnullvm"
+version = "0.52.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "24d5b23dc417412679681396f2b49f3de8c1473deb516bd34410872eff51ed0d"
+
+[[package]]
+name = "windows_x86_64_msvc"
+version = "0.52.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "589f6da84c646204747d1270a2a5661ea66ed1cced2631d546fdfb155959f9ec"
--- a/pgxn/neon/communicator/Cargo.toml
+++ b/pgxn/neon/communicator/Cargo.toml
@@ -0,0 +1,35 @@
+[package]
+name = "communicator"
+version = "0.1.0"
+edition = "2024"
+
+[lib]
+crate-type = ["staticlib"]
+
+[dependencies]
+bytes.workspace = true
+http.workspace = true
+libc.workspace = true
+nix.workspace = true
+atomic_enum = "0.3.0"
+prost.workspace = true
+tonic = { version = "0.12.0", default-features = false, features = ["codegen", "prost", "transport"] }
+tokio = { version = "1.43.1", features = ["macros", "net", "io-util", "rt", "rt-multi-thread"] }
+tokio-pipe = { version = "0.2.12" }
+thiserror.workspace = true
+tracing.workspace = true
+tracing-subscriber.workspace = true
+zerocopy = "0.8.0"
+zerocopy-derive = "0.8.0"
+
+tokio-epoll-uring.workspace = true
+uring-common.workspace = true
+
+pageserver_client_grpc.workspace = true
+pageserver_data_api.workspace = true
+
+neonart.workspace = true
+utils.workspace = true
+
+[build-dependencies]
+cbindgen.workspace = true
--- a/pgxn/neon/communicator/README.md
+++ b/pgxn/neon/communicator/README.md
@@ -0,0 +1,123 @@
+# Communicator
+
+This package provides the so-called "compute-pageserver communicator",
+or just "communicator" for short. It runs in a PostgreSQL server, as
+part of the neon extension, and handles the communication with the
+pageservers. On the PostgreSQL side, the glue code in pgxn/neon/ uses
+the communicator to implement the PostgreSQL Storage Manager (SMGR)
+interface.
+
+## Design criteria
+
+- Low latency
+- Saturate a 10 Gbit/s network interface without becoming a bottleneck
+
+## Source code view
+
+pgxn/neon/communicator_new.c
+	Contains the glue that interacts with PostgreSQL code and the Rust
+	communicator code.
+
+pgxn/neon/communicator/src/backend_interface.rs
+	The entry point for calls from each backend.
+
+pgxn/neon/communicator/src/init.rs
+	Initialization at server startup
+
+pgxn/neon/communicator/src/worker_process/
+    Worker process main loop and glue code
+
+At compilation time, pgxn/neon/communicator/ produces a static
+library, libcommunicator.a. It is linked to the neon.so extension
+library.
+
+The real networking code, which is independent of PostgreSQL, is in
+the pageserver/client_grpc crate.
+
+## Process view
+
+The communicator runs in a dedicated background worker process, the
+"communicator process". The communicator uses a multi-threaded Tokio
+runtime to execute the IO requests. So the communicator process has
+multiple threads running. That's unusual for Postgres processes and
+care must be taken to make that work.
+
+### Backend <-> worker communication
+
+Each backend has a number of I/O request slots in shared memory. The
+slots are statically allocated for each backend, and must not be
+accessed by other backends. The worker process reads requests from the
+shared memory slots, and writes responses back to the slots.
+
+To submit an IO request, first pick one of your backend's free slots,
+and write the details of the IO request in the slot. Finally, update
+the 'state' field of the slot to Submitted. That informs the worker
+process that it can start processing the request. Once the state has
+been set to Submitted, the backend *must not* access the slot anymore,
+until the worker process sets its state to 'Completed'. In other
+words, each slot is owned by either the backend or the worker process
+at all times, and the 'state' field indicates who has ownership at the
+moment.
+
+To inform the worker process that a request slot has a pending IO
+request, there's a pipe shared by the worker process and all backend
+processes. After you have changed the slot's state to Submitted, write
+the index of the request slot to the pipe. This wakes up the worker
+process.
+
+(Note that the pipe is just used for wakeups, but the worker process
+is free to pick up Submitted IO requests even without receiving the
+wakeup. As of this writing, it doesn't do that, but it might be useful
+in the future to reduce latency even further, for example.)
+
+When the worker process has completed processing the request, it
+writes the result back in the request slot. A GetPage request can also
+contain a pointer to a buffer in the shared buffer cache. In that case,
+the worker process writes the resulting page contents directly to the
+buffer, and just a result code in the request slot. It then updates
+the 'state' field to Completed, which passes ownership back to
+the originating backend. Finally, it signals the process Latch of the
+originating backend, waking it up.
+
+### Differences between PostgreSQL v16, v17 and v18
+
+PostgreSQL v18 introduced the new AIO mechanism. For the communication
+between AIO worker processes and backends, PostgreSQL AIO uses a
+mechanism very similar to the one described in the previous section.
+With our communicator, the AIO worker processes are not used, but we
+use the same PgAioHandle request slots as in upstream. For
+Neon-specific IO requests like GetDbSize, a neon request slot is used.
+But for the actual IO requests, the request slot merely contains a
+pointer to the PgAioHandle slot. The worker process updates the status
+of that, calls the IO callbacks upon completion, etc., just like the
+upstream AIO worker processes do.
+
+## Sequence diagram
+
+                      neon
+    PostgreSQL     extension       backend_interface.rs  worker_process.rs    processor    tonic
+       |               .                    .                   .                 .
+       | smgr_read()   .                    .                   .                 .
+       +-------------> +                    .                   .                 .
+       .               |                    .                   .                 .
+       .               |  rcommunicator_    .                   .                 .
+       .               | get_page_at_lsn    .                   .                 .
+       .               +------------------> +                   .                 .
+                                            |                   .                 .
+                                            | write request to  .                 .
+                                            | slot              .                 .
+                                            |                   .                 .
+                                            |                   .                 .
+                                            | submit_request()  .                 .
+                                            +-----------------> +                 .
+                                            |                   |                 .
+                                            |                   | db_size_request .
+                                                                +---------------->.
+                                                                                  . TODO
+
+
+
+### Compute <-> pageserver protocol
+
+The protocol between Compute and the pageserver is based on gRPC. See `protos/`.
+
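Reviewer sketch (not part of the patch): the wakeup described under "Backend <-> worker communication" amounts to writing a slot index as native-endian bytes into a shared pipe and decoding it on the worker side. The example below is a minimal illustration under stated assumptions: `encode_wakeup` and `decode_wakeups` are hypothetical names, and a `UnixStream` pair stands in for the real submission pipe, which the communicator creates elsewhere.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;

/// Hypothetical helper: encode a slot index the way a backend would
/// before writing it to the submission pipe (native-endian i32).
fn encode_wakeup(slot_idx: i32) -> [u8; 4] {
    slot_idx.to_ne_bytes()
}

/// Hypothetical helper: decode every complete 4-byte wakeup in `buf`.
/// Trailing bytes that do not form a full index are ignored.
fn decode_wakeups(buf: &[u8]) -> Vec<i32> {
    buf.chunks_exact(4)
        .map(|c| i32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

fn main() -> std::io::Result<()> {
    // Assumption: a socket pair stands in for the pipe shared by all backends.
    let (mut backend_end, mut worker_end) = UnixStream::pair()?;

    // Two requests were marked Submitted in slots 3 and 7; announce them.
    backend_end.write_all(&encode_wakeup(3))?;
    backend_end.write_all(&encode_wakeup(7))?;

    // The worker drains the wakeups and learns which slots to look at.
    let mut buf = [0u8; 8];
    worker_end.read_exact(&mut buf)?;
    let slots = decode_wakeups(&buf);
    assert_eq!(slots, vec![3, 7]);
    println!("woken up for slots {slots:?}");
    Ok(())
}
```

Because each wakeup is only four bytes, well below the POSIX `PIPE_BUF` minimum, writes from different backends to a real pipe would not be expected to interleave.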
--- a/pgxn/neon/communicator/build.rs
+++ b/pgxn/neon/communicator/build.rs
@@ -0,0 +1,24 @@
+use cbindgen;
+
+use std::env;
+
+fn main() -> Result<(), Box<dyn std::error::Error>> {
+    let crate_dir = env::var("CARGO_MANIFEST_DIR").unwrap();
+
+    cbindgen::generate(crate_dir).map_or_else(
+        |error| match error {
+            cbindgen::Error::ParseSyntaxError { .. } => {
+                // This means there was a syntax error in the Rust sources. Don't panic, because
+                // we want the build to continue and the Rust compiler to hit the error. The
+                // Rust compiler produces a better error message than cbindgen.
+                eprintln!("Generating C bindings failed because of a Rust syntax error");
+            }
+            e => panic!("Unable to generate C bindings: {:?}", e),
+        },
+        |bindings| {
+            bindings.write_to_file("communicator_bindings.h");
+        },
+    );
+
+    Ok(())
+}
--- a/pgxn/neon/communicator/cbindgen.toml
+++ b/pgxn/neon/communicator/cbindgen.toml
@@ -0,0 +1,4 @@
+language = "C"
+
+[enum]
+prefix_with_name = true
--- a/pgxn/neon/communicator/src/backend_comms.rs
+++ b/pgxn/neon/communicator/src/backend_comms.rs
@@ -0,0 +1,204 @@
+//! This module implements a request/response "slot" for submitting requests from backends
+//! to the communicator process.
+//!
+//! NB: The "backend" side of this code runs in Postgres backend processes,
+//! which means that it is not safe to use the 'tracing' crate for logging, nor
+//! to launch threads or use tokio tasks.
+use std::cell::UnsafeCell;
+use std::sync::atomic::fence;
+use std::sync::atomic::{AtomicI32, Ordering};
+
+use crate::neon_request::{NeonIORequest, NeonIOResult};
+
+use atomic_enum::atomic_enum;
+
+/// One request/response slot. Each backend has its own set of slots that it uses.
+///
+/// This is the moral equivalent of PgAioHandle for Postgres AIO requests.
+/// Like PgAioHandle, try to keep this small.
+///
+/// There is an array of these in shared memory. Therefore, this must be Sized.
+///
+/// ## Lifecycle of a request
+///
+/// The slot is always owned by either the backend process or the communicator
+/// process, depending on the 'state'. Only the owning process is allowed to
+/// read or modify the slot, except for reading the 'state' itself to check who
+/// owns it.
+///
+/// A slot begins in the Idle state, where it is owned by the backend process.
+/// To submit a request, the backend process fills the slot with the request
+/// data, and changes it to the Submitted state. After changing the state, the
+/// slot is owned by the communicator process, and the backend is not allowed
+/// to access it until the communicator process marks it as Completed.
+///
+/// When the communicator process sees that the slot is in Submitted state, it
+/// starts to process the request. After processing the request, it stores the
+/// result in the slot, and changes the state to Completed. It is now owned by
+/// the backend process again, which may now read the result, and reuse the
+/// slot for a new request.
+///
+/// For correctness of the above protocol, we really only need two states:
+/// "owned by backend" and "owned by communicator process". But to help with
+/// debugging, there are a few more states. When the backend starts to fill in
+/// the request details in the slot, it first sets the state from Idle to
+/// Filling, and when it's done with that, from Filling to Submitted. In the
+/// Filling state, the slot is still owned by the backend. Similarly, when the
+/// communicator process starts to process a request, it sets it to Processing
+/// state first, but the slot is still owned by the communicator process.
+///
+/// This struct doesn't handle waking up the communicator process when a request
+/// has been submitted or when a response is ready. We only store the 'owner_procno'
+/// which can be used for waking up the backend on completion, but the wakeups are
+/// performed elsewhere.
+pub struct NeonIOHandle {
+    /// similar to PgAioHandleState
+    state: AtomicNeonIOHandleState,
+
+    /// The owning process's ProcNumber. The worker process uses this to set the process's
+    /// latch on completion.
+    ///
+    /// (This could be calculated from num_neon_request_slots_per_backend and the index of
+    /// this slot in the overall 'neon_request_slots' array)
+    owner_procno: AtomicI32,
+
+    /// SAFETY: This is modified by fill_request(), after it has established ownership
+    /// of the slot by setting state from Idle to Filling
+    request: UnsafeCell<NeonIORequest>,
+
+    /// valid when state is Completed
+    ///
+    /// SAFETY: This is modified by RequestProcessingGuard::completed(). There can be
+    /// only one RequestProcessingGuard outstanding for a slot at a time, because
+    /// it is returned by start_processing_request() which checks the state, so
+    /// RequestProcessingGuard has exclusive access to the slot.
+    result: UnsafeCell<NeonIOResult>,
+}
+
+// The protocol described in the "Lifecycle of a request" section above ensures
+// the safe access to the fields
+unsafe impl Send for NeonIOHandle {}
+unsafe impl Sync for NeonIOHandle {}
+
+impl Default for NeonIOHandle {
+    fn default() -> NeonIOHandle {
+        NeonIOHandle {
+            owner_procno: AtomicI32::new(-1),
+            request: UnsafeCell::new(NeonIORequest::Empty),
+            result: UnsafeCell::new(NeonIOResult::Empty),
+            state: AtomicNeonIOHandleState::new(NeonIOHandleState::Idle),
+        }
+    }
+}
+
+#[atomic_enum]
+#[derive(Eq, PartialEq)]
+pub enum NeonIOHandleState {
+    Idle,
+
+    /// backend is filling in the request
+    Filling,
+
+    /// Backend has submitted the request to the communicator, but the
+    /// communicator process has not yet started processing it.
+    Submitted,
+
+    /// Communicator is processing the request
+    Processing,
+
+    /// Communicator has completed the request, and the 'result' field is now
+    /// valid, but the backend has not read the result yet.
+    Completed,
+}
+
+pub struct RequestProcessingGuard<'a>(&'a NeonIOHandle);
+
+unsafe impl<'a> Send for RequestProcessingGuard<'a> {}
+unsafe impl<'a> Sync for RequestProcessingGuard<'a> {}
+
+impl<'a> RequestProcessingGuard<'a> {
+    pub fn get_request(&self) -> &NeonIORequest {
+        unsafe { &*self.0.request.get() }
+    }
+
+    pub fn get_owner_procno(&self) -> i32 {
+        self.0.owner_procno.load(Ordering::Relaxed)
+    }
+
+    pub fn completed(self, result: NeonIOResult) {
+        unsafe {
+            *self.0.result.get() = result;
+        };
+
+        // Ok, we have completed the IO. Mark the request as completed. After that,
+        // we no longer have ownership of the slot, and must not modify it.
+        let old_state = self
+            .0
+            .state
+            .swap(NeonIOHandleState::Completed, Ordering::Release);
+        assert!(old_state == NeonIOHandleState::Processing);
+    }
+}
+
+impl NeonIOHandle {
+    pub fn fill_request(&self, request: &NeonIORequest, proc_number: i32) {
+        // Verify that the slot is in Idle state previously, and start filling it.
+        //
+        // XXX: This step isn't strictly necessary. Assuming the caller didn't screw up
+        // and try to use a slot that's already in use, we could fill the slot and
+        // switch it directly from Idle to Submitted state.
+        if let Err(s) = self.state.compare_exchange(
+            NeonIOHandleState::Idle,
+            NeonIOHandleState::Filling,
+            Ordering::Relaxed,
+            Ordering::Relaxed,
+        ) {
+            panic!("unexpected state in request slot: {s:?}");
+        }
+
+        // This fence synchronizes-with store/swap in `communicator_process_main_loop`.
+        fence(Ordering::Acquire);
+
+        self.owner_procno.store(proc_number, Ordering::Relaxed);
+        unsafe { *self.request.get() = *request }
+        self.state
+            .store(NeonIOHandleState::Submitted, Ordering::Release);
+    }
+
+    pub fn try_get_result(&self) -> Option<NeonIOResult> {
+        // FIXME: ordering?
+        let state = self.state.load(Ordering::Relaxed);
+        if state == NeonIOHandleState::Completed {
+            // This fence synchronizes-with store/swap in `communicator_process_main_loop`.
+            fence(Ordering::Acquire);
+            let result = unsafe { *self.result.get() };
+            self.state.store(NeonIOHandleState::Idle, Ordering::Relaxed);
+            Some(result)
+        } else {
+            None
+        }
+    }
+
+    pub fn start_processing_request<'a>(&'a self) -> Option<RequestProcessingGuard<'a>> {
+        // Read the IO request from the slot indicated in the wakeup
+        //
+        // XXX: using compare_exchange for this is not strictly necessary, as long as
+        // the communicator process has _some_ means of tracking which requests it's
+        // already processing. That could be a flag somewhere in communicator's private
+        // memory, for example.
+        if let Err(s) = self.state.compare_exchange(
+            NeonIOHandleState::Submitted,
+            NeonIOHandleState::Processing,
+            Ordering::Relaxed,
+            Ordering::Relaxed,
+        ) {
+            // FIXME surprising state. This is unexpected at the moment, but if we
+            // started to process requests more aggressively, without waiting for the
+            // read from the pipe, then this could happen
+            panic!("unexpected state in request slot: {s:?}");
+        }
+        fence(Ordering::Acquire);
+
+        Some(RequestProcessingGuard(self))
+    }
+}
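Reviewer sketch (not part of the patch): the ownership handoff documented in "Lifecycle of a request" can be illustrated with a reduced slot that has only three states. This is a simplified variant, not the code above: it uses a plain `AtomicU8` with Acquire loads and Release stores instead of `atomic_enum` with Relaxed operations plus fences, it omits the Filling and Processing states, and `Slot`, `submit`, `process` and `round_trip` are hypothetical names. A thread stands in for the communicator process.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;
use std::thread;

const IDLE: u8 = 0; // owned by the backend
const SUBMITTED: u8 = 1; // owned by the worker
const COMPLETED: u8 = 2; // owned by the backend again

struct Slot {
    state: AtomicU8,
    request: UnsafeCell<u64>,
    result: UnsafeCell<u64>,
}

// SAFETY: `state` decides which side may touch the cells at any time.
unsafe impl Sync for Slot {}

impl Slot {
    fn new() -> Slot {
        Slot {
            state: AtomicU8::new(IDLE),
            request: UnsafeCell::new(0),
            result: UnsafeCell::new(0),
        }
    }

    /// Backend side: fill the request, then hand the slot to the worker.
    fn submit(&self, request: u64) {
        assert_eq!(self.state.load(Ordering::Acquire), IDLE);
        unsafe { *self.request.get() = request };
        // Release publishes the request write before the state change.
        self.state.store(SUBMITTED, Ordering::Release);
    }

    /// Worker side: returns false if nothing has been submitted yet.
    fn process(&self, f: impl FnOnce(u64) -> u64) -> bool {
        if self.state.load(Ordering::Acquire) != SUBMITTED {
            return false;
        }
        let request = unsafe { *self.request.get() };
        unsafe { *self.result.get() = f(request) };
        self.state.store(COMPLETED, Ordering::Release);
        true
    }

    /// Backend side: take the result if the worker has finished.
    fn try_get_result(&self) -> Option<u64> {
        if self.state.load(Ordering::Acquire) != COMPLETED {
            return None;
        }
        let result = unsafe { *self.result.get() };
        self.state.store(IDLE, Ordering::Release);
        Some(result)
    }
}

/// Submit one request and spin until the worker thread has answered it.
fn round_trip(request: u64) -> u64 {
    let slot = Arc::new(Slot::new());
    let worker_slot = Arc::clone(&slot);
    let worker = thread::spawn(move || {
        while !worker_slot.process(|r| r * 2) {
            thread::yield_now();
        }
    });
    slot.submit(request);
    let result = loop {
        if let Some(r) = slot.try_get_result() {
            break r;
        }
        thread::yield_now();
    };
    worker.join().unwrap();
    result
}

fn main() {
    assert_eq!(round_trip(21), 42);
    println!("round trip ok");
}
```

The sketch spins for brevity. In the patch, the waiting is done with the submission pipe on the worker side and the process latch on the backend side.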
--- a/pgxn/neon/communicator/src/backend_interface.rs
+++ b/pgxn/neon/communicator/src/backend_interface.rs
@@ -0,0 +1,196 @@
+//! This code runs in each backend process. That means that launching Rust threads, panicking
+//! etc. is forbidden!
+
+use crate::backend_comms::NeonIOHandle;
+use crate::init::CommunicatorInitStruct;
+use crate::integrated_cache::{BackendCacheReadOp, IntegratedCacheReadAccess};
+use crate::neon_request::CCachedGetPageVResult;
+use crate::neon_request::{NeonIORequest, NeonIOResult};
+
+pub struct CommunicatorBackendStruct<'t> {
+    my_proc_number: i32,
+
+    next_neon_request_idx: u32,
+
+    my_start_idx: u32, // First request slot that belongs to this backend
+    my_end_idx: u32,   // One past the last request slot that belongs to this backend
+
+    neon_request_slots: &'t [NeonIOHandle],
+
+    submission_pipe_write_fd: std::ffi::c_int,
+
+    pending_cache_read_op: Option<BackendCacheReadOp<'t>>,
+
+    integrated_cache: &'t IntegratedCacheReadAccess<'t>,
+}
+
+#[unsafe(no_mangle)]
+pub extern "C" fn rcommunicator_backend_init(
+    cis: Box<CommunicatorInitStruct>,
+    my_proc_number: i32,
+) -> &'static mut CommunicatorBackendStruct<'static> {
+    let start_idx = my_proc_number as u32 * cis.num_neon_request_slots_per_backend;
+    let end_idx = start_idx + cis.num_neon_request_slots_per_backend;
+
+    let integrated_cache = Box::leak(Box::new(cis.integrated_cache_init_struct.backend_init()));
+
+    let bs: &'static mut CommunicatorBackendStruct =
+        Box::leak(Box::new(CommunicatorBackendStruct {
+            my_proc_number,
+            next_neon_request_idx: start_idx,
+            my_start_idx: start_idx,
+            my_end_idx: end_idx,
+            neon_request_slots: cis.neon_request_slots,
+
+            submission_pipe_write_fd: cis.submission_pipe_write_fd,
+            pending_cache_read_op: None,
+
+            integrated_cache,
+        }));
+    bs
+}
+
+/// Start a request. Returns the index of the request slot used, or -1 if the
+/// request was satisfied immediately from the cache, in which case the result
+/// is stored in `immediate_result_ptr`. Otherwise, poll for completion and get
+/// the result by calling bcomm_poll_request_completion(). The communicator will
+/// wake us up by setting our process latch, so to wait for the completion, wait
+/// on the latch and call bcomm_poll_request_completion() every time the latch
+/// is set.
+///
+/// Safety: The C caller must ensure that the references are valid.
+#[unsafe(no_mangle)]
+pub extern "C" fn bcomm_start_io_request<'t>(
+    bs: &'t mut CommunicatorBackendStruct,
+    request: &NeonIORequest,
+    immediate_result_ptr: &mut NeonIOResult,
+) -> i32 {
+    assert!(bs.pending_cache_read_op.is_none());
+
+    // Check if the request can be satisfied from the cache first
+    if let NeonIORequest::RelSize(req) = request {
+        if let Some(nblocks) = bs.integrated_cache.get_rel_size(&req.reltag()) {
+            *immediate_result_ptr = NeonIOResult::RelSize(nblocks);
+            return -1;
+        }
+    }
+
+    // Create neon request and submit it
+    let request_idx = bs.start_neon_request(request);
+
+    // Tell the communicator about it
+    bs.submit_request(request_idx);
+
+    return request_idx;
+}
+
+#[unsafe(no_mangle)]
+pub extern "C" fn bcomm_start_get_page_v_request<'t>(
+    bs: &'t mut CommunicatorBackendStruct,
+    request: &NeonIORequest,
+    immediate_result_ptr: &mut CCachedGetPageVResult,
+) -> i32 {
+    let NeonIORequest::GetPageV(get_pagev_request) = request else {
+        panic!("invalid request passed to bcomm_start_get_page_v_request()");
+    };
+    assert!(matches!(request, NeonIORequest::GetPageV(_)));
+    assert!(bs.pending_cache_read_op.is_none());
+
+    // Check if the request can be satisfied from the cache first
+    let mut all_cached = true;
+    let read_op = bs.integrated_cache.start_read_op();
+    for i in 0..get_pagev_request.nblocks {
+        if let Some(cache_block) = read_op.get_page(
+            &get_pagev_request.reltag(),
+            get_pagev_request.block_number + i as u32,
+        ) {
+            (*immediate_result_ptr).cache_block_numbers[i as usize] = cache_block;
+        } else {
+            // not found in cache
+            all_cached = false;
+            break;
+        }
+    }
+    if all_cached {
+        bs.pending_cache_read_op = Some(read_op);
+        return -1;
+    }
+
+    // Create neon request and submit it
+    let request_idx = bs.start_neon_request(request);
+
+    // Tell the communicator about it
+    bs.submit_request(request_idx);
+
+    return request_idx;
+}
+
+/// Check if a request has completed. Returns:
+///
+/// -1 if the request is still being processed
+/// 0 on success
+#[unsafe(no_mangle)]
+pub extern "C" fn bcomm_poll_request_completion(
+    bs: &mut CommunicatorBackendStruct,
+    request_idx: u32,
+    result_p: &mut NeonIOResult,
+) -> i32 {
+    match bs.neon_request_slots[request_idx as usize].try_get_result() {
+        None => -1, // still processing
+        Some(result) => {
+            *result_p = result;
+            0
+        }
+    }
+}
+
+// LFC functions
+
+/// Finish a local file cache read
+#[unsafe(no_mangle)]
+pub extern "C" fn bcomm_finish_cache_read(bs: &mut CommunicatorBackendStruct) -> bool {
+    if let Some(op) = bs.pending_cache_read_op.take() {
+        op.finish()
+    } else {
+        panic!("bcomm_finish_cache_read() called with no cached read pending");
+    }
+}
+
+impl<'t> CommunicatorBackendStruct<'t> {
+    /// Send a wakeup to the communicator process
+    fn submit_request(self: &CommunicatorBackendStruct<'t>, request_idx: i32) {
+        // wake up communicator by writing the idx to the submission pipe
+        //
+        // This can block, if the pipe is full. That should be very rare,
+        // because the communicator tries hard to drain the pipe to prevent
+        // that. Also, there's a natural upper bound on how many wakeups can be
+        // queued up: there is only a limited number of request slots for each
+        // backend.
+        //
+        // If it does block very briefly, that's not too serious.
+        let idxbuf = request_idx.to_ne_bytes();
+        let _res = nix::unistd::write(self.submission_pipe_write_fd, &idxbuf);
+        // FIXME: check result, return any errors
+    }
+
+    /// Fill in the next request slot and mark it Submitted.
+    ///
+    /// Note: there's no guarantee on when the communicator picks it up. It might do so
+    /// immediately, but you should still ring the doorbell with submit_request().
+    pub(crate) fn start_neon_request(&mut self, request: &NeonIORequest) -> i32 {
+        let my_proc_number = self.my_proc_number;
+
+        // Grab next free slot
+        // FIXME: any guarantee that there will be any?
+        let idx = self.next_neon_request_idx;
+
+        let next_idx = idx + 1;
+        self.next_neon_request_idx = if next_idx == self.my_end_idx {
+            self.my_start_idx
+        } else {
+            next_idx
+        };
+
+        self.neon_request_slots[idx as usize].fill_request(request, my_proc_number);
+
+        return idx as i32;
+    }
+}
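Reviewer sketch (not part of the patch): the slot arithmetic in `rcommunicator_backend_init()` and `start_neon_request()` gives each backend a contiguous range of the global slot array and walks it round-robin. The standalone example below isolates that arithmetic. `SlotCursor` and `advance` are hypothetical names, and like the code above it does not check whether the chosen slot is actually free.

```rust
/// Hypothetical cursor over one backend's range of the global slot array.
struct SlotCursor {
    start: u32, // first slot owned by this backend
    end: u32,   // one past the last slot owned by this backend
    next: u32,  // slot that the next request will use
}

impl SlotCursor {
    fn new(proc_number: u32, slots_per_backend: u32) -> SlotCursor {
        let start = proc_number * slots_per_backend;
        SlotCursor {
            start,
            end: start + slots_per_backend,
            next: start,
        }
    }

    /// Return the slot to use now and move on, wrapping at the range end.
    fn advance(&mut self) -> u32 {
        let idx = self.next;
        self.next = if idx + 1 == self.end {
            self.start
        } else {
            idx + 1
        };
        idx
    }
}

fn main() {
    // Backend with proc number 2 and 5 slots per backend owns slots 10..15.
    let mut cursor = SlotCursor::new(2, 5);
    let picked: Vec<u32> = (0..7).map(|_| cursor.advance()).collect();
    assert_eq!(picked, vec![10, 11, 12, 13, 14, 10, 11]);
    println!("{picked:?}");
}
```

With this scheme, a backend that has more requests in flight than slots wraps around onto a slot that is still in use. In the patch, `fill_request()` would then panic on the unexpected state, which is what the FIXME in `start_neon_request()` points at.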
--- a/pgxn/neon/communicator/src/file_cache.rs
+++ b/pgxn/neon/communicator/src/file_cache.rs
@@ -0,0 +1,109 @@
+//! Implement the "low-level" parts of the file cache.
+//!
+//! This module just deals with reading and writing the file, and keeping track
+//! of which blocks in the cache file are in use and which are free. The "high
+//! level" parts of tracking which block in the cache file corresponds to which
+//! relation block are handled in 'integrated_cache' instead.
+//!
+//! This module is only used to access the file from the communicator
+//! process. The backend processes *also* read the file (and sometimes also
+//! write it?), but the backends use direct C library calls for that.
+use std::fs::File;
+use std::path::Path;
+use std::sync::Arc;
+use std::sync::atomic::{AtomicU64, Ordering};
+
+use tokio_epoll_uring;
+
+use crate::BLCKSZ;
+
+pub type CacheBlock = u64;
+
+pub struct FileCache {
+    uring_system: tokio_epoll_uring::SystemHandle,
+
+    file: Arc<File>,
+
+    // TODO: there's no reclamation mechanism, the cache grows
+    // indefinitely. This is the next free block, i.e. the current
+    // size of the file
+    next_free_block: AtomicU64,
+}
+
+impl FileCache {
+    pub fn new(
+        file_cache_path: &Path,
+        uring_system: tokio_epoll_uring::SystemHandle,
+    ) -> Result<FileCache, std::io::Error> {
+        let file = std::fs::OpenOptions::new()
+            .read(true)
+            .write(true)
+            .truncate(true)
+            .create(true)
+            .open(file_cache_path)?;
+
+        tracing::info!("Created cache file {file_cache_path:?}");
+
+        Ok(FileCache {
+            file: Arc::new(file),
+            uring_system,
+            next_free_block: AtomicU64::new(0),
+        })
+    }
+
+    // File cache management
+
+    pub async fn read_block(
+        &self,
+        cache_block: CacheBlock,
+        dst: impl uring_common::buf::IoBufMut + Send + Sync,
+    ) -> Result<(), std::io::Error> {
+        assert!(dst.bytes_total() == BLCKSZ);
+        let file = self.file.clone();
+
+        let ((_file, _buf), res) = self
+            .uring_system
+            .read(file, cache_block as u64 * BLCKSZ as u64, dst)
+            .await;
+
+        let res = res.map_err(map_io_uring_error)?;
+        if res != BLCKSZ {
+            panic!("unexpected read result");
+        }
+
+        Ok(())
+    }
+
+    pub async fn write_block(
+        &self,
+        cache_block: CacheBlock,
+        src: impl uring_common::buf::IoBuf + Send + Sync,
+    ) -> Result<(), std::io::Error> {
+        assert!(src.bytes_init() == BLCKSZ);
+        let file = self.file.clone();
+
+        let ((_file, _buf), res) = self
+            .uring_system
+            .write(file, cache_block as u64 * BLCKSZ as u64, src)
+            .await;
+        let res = res.map_err(map_io_uring_error)?;
+        if res != BLCKSZ {
+            panic!("unexpected write result");
+        }
+
+        Ok(())
+    }
+
+    pub fn alloc_block(&self) -> CacheBlock {
+        self.next_free_block.fetch_add(1, Ordering::Relaxed)
+    }
+}
+
+fn map_io_uring_error(err: tokio_epoll_uring::Error<std::io::Error>) -> std::io::Error {
+    match err {
+        tokio_epoll_uring::Error::Op(err) => err,
+        tokio_epoll_uring::Error::System(err) => {
+            std::io::Error::new(std::io::ErrorKind::Other, err)
+        }
+    }
+}
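Reviewer sketch (not part of the patch): `FileCache` allocates cache blocks with a bump counter and addresses them by multiplying the block number by the block size. The example below shows just that bookkeeping without any file I/O. `BlockAllocator` and `block_offset` are hypothetical names, and the `BLCKSZ` value of 8192 is an assumption (the PostgreSQL default), since `crate::BLCKSZ` is defined outside this file.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Assumption: PostgreSQL's default block size.
const BLCKSZ: u64 = 8192;

/// Hypothetical stand-in for the allocation part of FileCache.
struct BlockAllocator {
    // Next free block, which is also the current file size in blocks.
    next_free_block: AtomicU64,
}

impl BlockAllocator {
    fn new() -> BlockAllocator {
        BlockAllocator {
            next_free_block: AtomicU64::new(0),
        }
    }

    /// Hand out the next block number. Blocks are never reclaimed.
    fn alloc_block(&self) -> u64 {
        self.next_free_block.fetch_add(1, Ordering::Relaxed)
    }

    /// Size in bytes the cache file needs to hold all allocated blocks.
    fn file_size(&self) -> u64 {
        self.next_free_block.load(Ordering::Relaxed) * BLCKSZ
    }
}

/// Byte offset of a cache block within the cache file.
fn block_offset(cache_block: u64) -> u64 {
    cache_block * BLCKSZ
}

fn main() {
    let alloc = BlockAllocator::new();
    let a = alloc.alloc_block();
    let b = alloc.alloc_block();
    assert_eq!((a, b), (0, 1));
    assert_eq!(block_offset(b), 8192);
    assert_eq!(alloc.file_size(), 16384);
    println!("blocks {a} and {b}, file size {}", alloc.file_size());
}
```

`fetch_add` keeps concurrent allocations from handing out the same block, but as the TODO in `FileCache` notes, nothing returns blocks to the pool, so the file only grows.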
--- a/pgxn/neon/communicator/src/init.rs
+++ b/pgxn/neon/communicator/src/init.rs
@@ -0,0 +1,130 @@
+//! Initialization functions. These are executed in the postmaster process,
+//! at different stages of server startup.
+//!
+//!
+//! Communicator initialization steps:
+//!
+//! 1. At postmaster startup, before shared memory is allocated,
+//!    rcommunicator_shmem_size() is called to get the amount of
+//!    shared memory that this module needs.
+//!
+//! 2. Later, after the shared memory has been allocated,
+//!    rcommunicator_shmem_init() is called to initialize the shmem
+//!    area.
+//!
+//! Per process initialization:
+//!
+//! When a backend process starts up, it calls rcommunicator_backend_init().
+//! In the communicator worker process, other functions are called, see
+//! `worker_process` module.
+
+use std::ffi::c_int;
+use std::mem;
+
+use crate::backend_comms::NeonIOHandle;
+use crate::integrated_cache::IntegratedCacheInitStruct;
+
+const NUM_NEON_REQUEST_SLOTS_PER_BACKEND: u32 = 5;
+
+/// This struct is created in the postmaster process, and inherited by
+/// the communicator process and all backend processes through fork().
+#[repr(C)]
+pub struct CommunicatorInitStruct {
+    #[allow(dead_code)]
+    pub max_procs: u32,
+
+    pub submission_pipe_read_fd: std::ffi::c_int,
+    pub submission_pipe_write_fd: std::ffi::c_int,
+
+    // Shared memory data structures
+    pub num_neon_request_slots_per_backend: u32,
+
+    pub neon_request_slots: &'static [NeonIOHandle],
+
+    pub integrated_cache_init_struct: IntegratedCacheInitStruct<'static>,
+}
+
+impl std::fmt::Debug for CommunicatorInitStruct {
+    fn fmt(&self, fmt: &mut std::fmt::Formatter<'_>) -> Result<(), std::fmt::Error> {
+        fmt.debug_struct("CommunicatorInitStruct")
+            .field("max_procs", &self.max_procs)
+            .field("submission_pipe_read_fd", &self.submission_pipe_read_fd)
+            .field("submission_pipe_write_fd", &self.submission_pipe_write_fd)
+            .field(
+                "num_neon_request_slots_per_backend",
+                &self.num_neon_request_slots_per_backend,
+            )
+            .field("neon_request_slots length", &self.neon_request_slots.len())
+            .finish()
+    }
+}
+
+#[unsafe(no_mangle)]
+pub extern "C" fn rcommunicator_shmem_size(max_procs: u32) -> u64 {
+    let mut size = 0;
+
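+    let num_neon_request_slots = max_procs * NUM_NEON_REQUEST_SLOTS_PER_BACKEND;
+    // Reserve room for the padding that rcommunicator_shmem_init() may insert
+    // to align the first slot
+    size += mem::align_of::<NeonIOHandle>() - 1;
+    size += mem::size_of::<NeonIOHandle>() * num_neon_request_slots as usize;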
+
+    // For integrated_cache's Allocator. TODO: make this adjustable
+    size += IntegratedCacheInitStruct::shmem_size(max_procs);
+
+    size as u64
+}
+
+/// Initialize the shared memory segment. Returns a process-private struct
+/// (allocated on the heap, not in shared memory), which backend processes
+/// inherit through fork().
+#[unsafe(no_mangle)]
+pub extern "C" fn rcommunicator_shmem_init(
+    submission_pipe_read_fd: c_int,
+    submission_pipe_write_fd: c_int,
+    max_procs: u32,
+    shmem_area_ptr: *mut u8,
+    shmem_area_len: u64,
+) -> &'static mut CommunicatorInitStruct {
+    let mut ptr = shmem_area_ptr;
+
+    // Carve out the request slots from the shmem area and initialize them
+    let num_neon_request_slots_per_backend = NUM_NEON_REQUEST_SLOTS_PER_BACKEND;
+    let num_neon_request_slots = max_procs * num_neon_request_slots_per_backend;
+
+    let len_used;
+    let neon_request_slots: &mut [NeonIOHandle] = unsafe {
+        ptr = ptr.add(ptr.align_offset(mem::align_of::<NeonIOHandle>()));
+        let neon_request_slots_ptr: *mut NeonIOHandle = ptr.cast();
+
+        // Check that the slots fit in the area before writing anything to it
+        len_used = ptr.byte_offset_from(shmem_area_ptr) as usize
+            + mem::size_of::<NeonIOHandle>() * num_neon_request_slots as usize;
+        assert!(len_used <= shmem_area_len as usize);
+
+        for i in 0..num_neon_request_slots as usize {
+            // The memory is uninitialized: use write() so that no old value is dropped
+            neon_request_slots_ptr.add(i).write(NeonIOHandle::default());
+        }
+        ptr = neon_request_slots_ptr.add(num_neon_request_slots as usize).cast();
+
+        std::slice::from_raw_parts_mut(neon_request_slots_ptr, num_neon_request_slots as usize)
+    };
+
+    let remaining_area =
+        unsafe { std::slice::from_raw_parts_mut(ptr, shmem_area_len as usize - len_used) };
+
+    // Give the rest of the area to the integrated cache
+    let integrated_cache_init_struct =
+        IntegratedCacheInitStruct::shmem_init(max_procs, remaining_area);
+
+    eprintln!(
+        "PIPE READ {} WRITE {}",
+        submission_pipe_read_fd, submission_pipe_write_fd
+    );
+
+    let cis: &'static mut CommunicatorInitStruct = Box::leak(Box::new(CommunicatorInitStruct {
+        max_procs,
+        submission_pipe_read_fd,
+        submission_pipe_write_fd,
+
+        num_neon_request_slots_per_backend: NUM_NEON_REQUEST_SLOTS_PER_BACKEND,
+        neon_request_slots,
+
+        integrated_cache_init_struct,
+    }));
+
+    cis
+}
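The carve-out pattern used by `rcommunicator_shmem_size()` and `rcommunicator_shmem_init()` above can be tried in isolation. The following is a minimal sketch, not the code from this change: `Slot` is a hypothetical stand-in for `NeonIOHandle`, and `slots_shmem_size` and `carve_slots` are illustrative names. It assumes the area may start at an unaligned address, so the size calculation reserves worst-case padding and the bounds check runs before any slot is written.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical stand-in for `NeonIOHandle` (the real type lives in `backend_comms`).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
#[repr(C)]
pub struct Slot {
    pub state: u32,
    pub payload: u64,
}

/// Worst-case number of bytes needed for `n` slots, including the padding
/// that may be required if the area does not start on a `Slot` boundary.
pub fn slots_shmem_size(n: usize) -> usize {
    size_of::<Slot>() * n + (align_of::<Slot>() - 1)
}

/// Carve `n` initialized slots out of the front of `area`.
/// Returns the slots and the bytes left over for other users.
pub fn carve_slots(area: &mut [u8], n: usize) -> (&mut [Slot], &mut [u8]) {
    // Bytes to skip so that the first slot is properly aligned.
    let pad = area.as_mut_ptr().align_offset(align_of::<Slot>());
    let needed = size_of::<Slot>()
        .checked_mul(n)
        .and_then(|bytes| bytes.checked_add(pad))
        .expect("slot area size overflows usize");
    // Verify the area is large enough before touching it.
    assert!(needed <= area.len(), "shared memory area too small");

    let (head, rest) = area.split_at_mut(needed);
    let base: *mut Slot = head[pad..].as_mut_ptr().cast();
    for i in 0..n {
        // SAFETY: `base` is aligned and `base + i` lies within `head`.
        // `write` does not read or drop whatever bytes were there before.
        unsafe { base.add(i).write(Slot::default()) };
    }
    // SAFETY: all `n` elements were just initialized, are in bounds and
    // aligned, and `head` is not used again, so the borrow is exclusive.
    let slots = unsafe { std::slice::from_raw_parts_mut(base, n) };
    (slots, rest)
}
```

Calling `carve_slots(&mut buf, n)` on a buffer of at least `slots_shmem_size(n)` bytes yields `n` default-initialized slots plus the remaining bytes, which plays the role of the area handed to the integrated cache in the diff above.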
Author	SHA1	Message	Date
Heikki Linnakangas	e58d0fece1	New communicator, with "integrated" cache accessible from all processes	2025-04-29 11:52:44 +03:00
Alex Chi Z.	11f6044338	fix(pageserver): report synthetic size = 1 if all tls offloaded (2) (#11731 ) ## Problem https://github.com/neondatabase/neon/pull/11648 did this for resident size instead of synthetic size. ## Summary of changes Report synthetic_size == 1 if all timelines are offloaded. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-04-28 13:45:45 +00:00
Konstantin Knizhnik	692c0f3fb8	Prepare to prewarm support (#11740 ) ## Problem See (original prewarm implementation) https://github.com/neondatabase/neon/pull/9197 (functions for storing/restoring LFC state) https://github.com/neondatabase/neon/pull/9587 (store prefetch results in LFC) https://github.com/neondatabase/neon/pull/10442 ## Summary of changes Preparation for prewarm implementation. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-04-28 13:24:18 +00:00
Alexander Bayandin	2b1d2a55d6	CI: fix typo oicd -> oidc (#11747 ) ## Problem It's OIDC (OpenID Connect), not OICD ## Summary of changes - Rename actions input `aws-oicd-role-arn` -> `aws-oidc-role-arn`	2025-04-28 12:44:28 +00:00
Konstantin Knizhnik	60b9fb1baf	Ignore unlogged LSNs in set last written LSN (#11743 ) ## Problem See https://github.com/neondatabase/neon/issues/11718 and https://neondb.slack.com/archives/C033RQ5SPDH/p1745122797538509 GIST other indexes performing "unlogged build" are using so called fake LSNs - not a real LSN, but something like 0/1. Been stored in lwlsn cache they cause incorrect lookup at PS. ## Summary of changes Do not store fake LSNs in LwLSN hash. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-04-28 12:16:29 +00:00
Erik Grinaker	606f14034e	pageserver: improve `pageserver_smgr_query_seconds` buckets (#11680 ) ## Problem The `pageserver_smgr_query_seconds` buckets are too coarse, using powers of 10: 1 µs, 10 µs, 100 µs, 1 ms, 10 ms, 100 ms, 1 s, 10 s, 100 s. This is one of our most crucial latency metrics, and needs better resolution. Touches #11594. ## Summary of changes This patch uses buckets with better resolution around 1 ms (the typical latency): * 0.6 ms * 1 ms * 3 ms * 6 ms * 10 ms * 30 ms * 100 ms * 1 s * 3 s These will be the same as the compute's `compute_getpage_wait_seconds`, to make them comparable across the compute and Pageserver: https://github.com/neondatabase/flux-fleet/pull/579. We sacrifice buckets above 3 s, since these can already be considered "too slow". This does not change the previously used `CRITICAL_OP_BUCKETS`, which is also used for other operations on different timescales (e.g. LSN waits). We should consider replacing this with more appropriate buckets for specific operations, since it covers a large span with low resolution.	2025-04-28 11:52:44 +00:00
Conrad Ludgate	32393b4393	pg-sni-router: support compute TLS on different port (#11732 ) ## Problem pg-sni-router isn't aware of compute TLS ## Summary of changes If connections come in on port 4433, we require TLS to compute from pg-sni-router	2025-04-28 11:29:44 +00:00
Alexander Bayandin	1a29f5672a	CI(check-macos-build): trigger workflow automatically for PRs (#11706 ) ## Problem - if-conditions for the `check-macos-build` workflow don't trigger it on PRs with relevant changes (in Rust code or Postgres submodules). - Jobs in the workflow depend on the presence of a cache, which is not guaranteed. ## Summary of changes - Fix if-conditions - Use artifacts on top of cache whenever the workflow depends on it — the cache might not be available	2025-04-28 09:03:10 +00:00
a-masterov	b8d47b5acf	Run the extensions' tests on staging (#11164 ) ## Problem We currently don't run end-to-end tests for PostgreSQL extensions on our cloud infrastructure, which means we might miss problems that only occur in a real cloud environment. ## Summary of changes - Added a workflow to run extension tests against a cloud staging instance - Set up proper project configuration for extension testing - Implemented test execution with appropriate environment settings - Added error handling and reporting for test failures --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2025-04-28 08:13:49 +00:00
Alexander Lakhin	97e01ae6fd	Add workflow to run particular test(s) N times (#11050 ) ## Problem Provide an easy way to run particular test(s) N times on CI. ## Summary of changes * Allow for passing the test selection and the number of test runs to the existing "Build and Test Locally" workflow * Allow for running multiple selected tests by the "Pytest regression tests" step * Introduce a new workflow to run specified test(s) several times * Store results in a separate database to distinguish between testing tests for stability and usual testing	2025-04-28 04:04:37 +00:00
Lokesh	459d51974c	doc: minor updates to consumption-metrics document (#7153 ) ## Problem Proposed minor changes to the `consumption_metrics` document. ## Summary of changes - Fixed minor typos in the document. - Minor formatting in the description of metrics `timeline_logical_size` and `synthetic_storage_size`. Makes this consistent as with description of other metrics in the document. ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Mikhail Kot <mikhail@neon.tech>	2025-04-25 19:46:40 +00:00