Add pyo3 prototype

Cleanup
2026-03-10 03:40:37 +00:00 · 2022-08-30 11:28:22 -04:00 · 2022-08-18 15:17:30 -04:00 · 2022-08-18 15:16:14 -04:00 · 2022-08-18 13:20:01 -04:00 · 2022-08-18 13:14:05 -04:00
494 changed files with 85266 additions and 22668 deletions
--- a/.cargo/config.toml
+++ b/.cargo/config.toml
@@ -0,0 +1,13 @@
+# The binaries are really slow if you compile them in 'dev' mode with the defaults.
+# Enable some optimizations even in 'dev' mode, to make tests faster. The basic
+# optimizations enabled by "opt-level=1" don't affect debuggability too much.
+#
+# See https://www.reddit.com/r/rust/comments/gvrgca/this_is_a_neat_trick_for_getting_good_runtime/
+#
+[profile.dev.package."*"]
+# Set the default for dependencies in Development mode.
+opt-level = 3
+
+[profile.dev]
+# Turn on a small amount of optimization in Development mode.
+opt-level = 1
--- a/.config/hakari.toml
+++ b/.config/hakari.toml
@@ -0,0 +1,26 @@
+# This file contains settings for `cargo hakari`.
+# See https://docs.rs/cargo-hakari/latest/cargo_hakari/config for a full list of options.
+
+hakari-package = "workspace_hack"
+
+# Format for `workspace-hack = ...` lines in other Cargo.tomls. Requires cargo-hakari 0.9.8 or above.
+dep-format-version = "2"
+
+# Setting workspace.resolver = "2" in the root Cargo.toml is HIGHLY recommended.
+# Hakari works much better with the new feature resolver.
+# For more about the new feature resolver, see:
+# https://blog.rust-lang.org/2021/03/25/Rust-1.51.0.html#cargos-new-feature-resolver
+# The resolver must still be set here because hakari requires this field,
+# even though it is now the default for the 2021 edition and cargo.
+resolver = "2"
+
+# Add triples corresponding to platforms commonly used by developers here.
+# https://doc.rust-lang.org/rustc/platform-support.html
+platforms = [
+    # "x86_64-unknown-linux-gnu",
+    # "x86_64-apple-darwin",
+    # "x86_64-pc-windows-msvc",
+]
+
+# Write out exact versions rather than a semver range. (Defaults to false.)
+# exact-versions = true
--- a/.dockerignore
+++ b/.dockerignore
@@ -0,0 +1,18 @@
+**/.git/
+**/__pycache__
+**/.pytest_cache
+
+.git
+target
+tmp_check
+tmp_install
+tmp_check_cli
+test_output
+.vscode
+.neon
+integration_tests/.neon
+.mypy_cache
+
+Dockerfile
+.dockerignore
+
--- a/.github/actions/download/action.yml
+++ b/.github/actions/download/action.yml
@@ -0,0 +1,56 @@
+name: "Download an artifact"
+description: "Custom download action"
+inputs:
+  name:
+    description: "Artifact name"
+    required: true
+  path:
+    description: "A directory to put artifact into"
+    default: "."
+    required: false
+  skip-if-does-not-exist:
+    description: "If true, skip when the file doesn't exist; fail otherwise"
+    default: false
+    required: false
+
+runs:
+  using: "composite"
+  steps:
+    - name: Download artifact
+      id: download-artifact
+      shell: bash -euxo pipefail {0}
+      env:
+        TARGET: ${{ inputs.path }}
+        ARCHIVE: /tmp/downloads/${{ inputs.name }}.tar.zst
+        SKIP_IF_DOES_NOT_EXIST: ${{ inputs.skip-if-does-not-exist }}
+      run: |
+        BUCKET=neon-github-public-dev
+        PREFIX=artifacts/${GITHUB_RUN_ID}
+        FILENAME=$(basename $ARCHIVE)
+
+        S3_KEY=$(aws s3api list-objects-v2 --bucket ${BUCKET} --prefix ${PREFIX} | jq -r '.Contents[].Key' | grep ${FILENAME} | sort --version-sort | tail -1 || true)
+        if [ -z "${S3_KEY}" ]; then
+          if [ "${SKIP_IF_DOES_NOT_EXIST}" = "true" ]; then
+            echo '::set-output name=SKIPPED::true'
+            exit 0
+          else
+            echo >&2 "Neither s3://${BUCKET}/${PREFIX}/${GITHUB_RUN_ATTEMPT}/${FILENAME} nor its version from previous attempts exists"
+            exit 1
+          fi
+        fi
+
+        echo '::set-output name=SKIPPED::false'
+
+        mkdir -p $(dirname $ARCHIVE)
+        time aws s3 cp --only-show-errors s3://${BUCKET}/${S3_KEY} ${ARCHIVE}
+
+    - name: Extract artifact
+      if: ${{ steps.download-artifact.outputs.SKIPPED == 'false' }}
+      shell: bash -euxo pipefail {0}
+      env:
+        TARGET: ${{ inputs.path }}
+        ARCHIVE: /tmp/downloads/${{ inputs.name }}.tar.zst
+      run: |
+        mkdir -p ${TARGET}
+        time tar -xf ${ARCHIVE} -C ${TARGET}
+        rm -f ${ARCHIVE}
--- a/.github/actions/run-python-test-set/action.yml
+++ b/.github/actions/run-python-test-set/action.yml
@@ -0,0 +1,162 @@
+name: 'Run python test'
+description: 'Runs a Neon python test set, performing all the required preparations beforehand'
+
+inputs:
+  build_type:
+    description: 'Type of Rust (neon) and C (postgres) builds. Must be "release" or "debug".'
+    required: true
+  rust_toolchain:
+    description: 'Rust toolchain version to fetch the caches'
+    required: true
+  test_selection:
+    description: 'A python test suite to run'
+    required: true
+  extra_params:
+    description: 'Arbitrary parameters to pytest. For example "-s" to prevent capturing stdout/stderr'
+    required: false
+    default: ''
+  needs_postgres_source:
+    description: 'Set to true if the test suite requires postgres source checked out'
+    required: false
+    default: 'false'
+  run_in_parallel:
+    description: 'Whether to run tests in parallel'
+    required: false
+    default: 'true'
+  save_perf_report:
+    description: 'Whether to upload the performance report'
+    required: false
+    default: 'false'
+  run_with_real_s3:
+    description: 'Whether to pass real s3 credentials to the test suite'
+    required: false
+    default: 'false'
+  real_s3_bucket:
+    description: 'Bucket name for real s3 tests'
+    required: false
+    default: ''
+  real_s3_region:
+    description: 'Region name for real s3 tests'
+    required: false
+    default: ''
+  real_s3_access_key_id:
+    description: 'Access key id'
+    required: false
+    default: ''
+  real_s3_secret_access_key:
+    description: 'Secret access key'
+    required: false
+    default: ''
+
+runs:
+  using: "composite"
+  steps:
+    - name: Get Neon artifact
+      uses: ./.github/actions/download
+      with:
+        name: neon-${{ runner.os }}-${{ inputs.build_type }}-${{ inputs.rust_toolchain }}-artifact
+        path: /tmp/neon
+
+    - name: Checkout
+      if: inputs.needs_postgres_source == 'true'
+      uses: actions/checkout@v3
+      with:
+        submodules: true
+        fetch-depth: 1
+
+    - name: Cache poetry deps
+      id: cache_poetry
+      uses: actions/cache@v3
+      with:
+        path: ~/.cache/pypoetry/virtualenvs
+        key: v1-${{ runner.os }}-python-deps-${{ hashFiles('poetry.lock') }}
+
+    - name: Install Python deps
+      shell: bash -euxo pipefail {0}
+      run: ./scripts/pysync
+
+    - name: Run pytest
+      env:
+        NEON_BIN: /tmp/neon/bin
+        POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
+        TEST_OUTPUT: /tmp/test_output
+        # this variable will be embedded in perf test report
+        # and is needed to distinguish different environments
+        PLATFORM: github-actions-selfhosted
+        BUILD_TYPE: ${{ inputs.build_type }}
+        AWS_ACCESS_KEY_ID: ${{ inputs.real_s3_access_key_id }}
+        AWS_SECRET_ACCESS_KEY: ${{ inputs.real_s3_secret_access_key }}
+      shell: bash -euxo pipefail {0}
+      run: |
+        PERF_REPORT_DIR="$(realpath test_runner/perf-report-local)"
+        rm -rf $PERF_REPORT_DIR
+
+        TEST_SELECTION="test_runner/${{ inputs.test_selection }}"
+        EXTRA_PARAMS="${{ inputs.extra_params }}"
+        if [ -z "${{ inputs.test_selection }}" ]; then
+          echo "test_selection must be set"
+          exit 1
+        fi
+        if [[ "${{ inputs.run_in_parallel }}" == "true" ]]; then
+          EXTRA_PARAMS="-n4 $EXTRA_PARAMS"
+        fi
+
+        if [[ "${{ inputs.run_with_real_s3 }}" == "true" ]]; then
+          echo "REAL S3 ENABLED"
+          export ENABLE_REAL_S3_REMOTE_STORAGE=nonempty
+          export REMOTE_STORAGE_S3_BUCKET=${{ inputs.real_s3_bucket }}
+          export REMOTE_STORAGE_S3_REGION=${{ inputs.real_s3_region }}
+        fi
+
+        if [[ "${{ inputs.save_perf_report }}" == "true" ]]; then
+          if [[ "$GITHUB_REF" == "refs/heads/main" ]]; then
+            mkdir -p "$PERF_REPORT_DIR"
+            EXTRA_PARAMS="--out-dir $PERF_REPORT_DIR $EXTRA_PARAMS"
+          fi
+        fi
+
+        if [[ "${{ inputs.build_type }}" == "debug" ]]; then
+          cov_prefix=(scripts/coverage "--profraw-prefix=$GITHUB_JOB" --dir=/tmp/coverage run)
+        elif [[ "${{ inputs.build_type }}" == "release" ]]; then
+          cov_prefix=()
+        fi
+
+        # Run the tests.
+        #
+        # The junit.xml file allows CI tools to display more fine-grained test information
+        # in their "Tests" tab in the results page.
+        # --verbose prints the name of each test (helpful when there are
+        # multiple tests in one file)
+        # -rA prints a summary at the end
+        # -n4 uses four processes to run tests via pytest-xdist
+        # -s (which would stop pytest from capturing output) is not used, because tests run
+        # in parallel and logs from different tests would be interleaved
+        "${cov_prefix[@]}" ./scripts/pytest \
+          --junitxml=$TEST_OUTPUT/junit.xml \
+          --tb=short \
+          --verbose \
+          -m "not remote_cluster" \
+          -rA $TEST_SELECTION $EXTRA_PARAMS
+
+        if [[ "${{ inputs.save_perf_report }}" == "true" ]]; then
+          if [[ "$GITHUB_REF" == "refs/heads/main" ]]; then
+            export REPORT_FROM="$PERF_REPORT_DIR"
+            export REPORT_TO=local
+            scripts/generate_and_push_perf_report.sh
+          fi
+        fi
+
+    - name: Delete all data but logs
+      shell: bash -euxo pipefail {0}
+      if: always()
+      run: |
+        du -sh /tmp/test_output/*
+        find /tmp/test_output -type f ! -name "*.log" ! -name "regression.diffs" ! -name "junit.xml" ! -name "*.filediff" ! -name "*.stdout" ! -name "*.stderr" ! -name "flamegraph.svg" ! -name "*.metrics" -delete
+        du -sh /tmp/test_output/*
+
+    - name: Upload python test logs
+      if: always()
+      uses: ./.github/actions/upload
+      with:
+        name: python-test-${{ inputs.test_selection }}-${{ runner.os }}-${{ inputs.build_type }}-${{ inputs.rust_toolchain }}-logs
+        path: /tmp/test_output/
--- a/.github/actions/save-coverage-data/action.yml
+++ b/.github/actions/save-coverage-data/action.yml
@@ -0,0 +1,22 @@
+name: 'Merge and upload coverage data'
+description: 'Compresses and uploads the coverage data as an artifact'
+
+runs:
+  using: "composite"
+  steps:
+    - name: Merge coverage data
+      shell: bash -euxo pipefail {0}
+      run: scripts/coverage "--profraw-prefix=$GITHUB_JOB" --dir=/tmp/coverage merge
+
+    - name: Download previous coverage data into the same directory
+      uses: ./.github/actions/download
+      with:
+        name: coverage-data-artifact
+        path: /tmp/coverage
+        skip-if-does-not-exist: true # skip if there's no previous coverage to download
+
+    - name: Upload coverage data
+      uses: ./.github/actions/upload
+      with:
+        name: coverage-data-artifact
+        path: /tmp/coverage
--- a/.github/actions/upload/action.yml
+++ b/.github/actions/upload/action.yml
@@ -0,0 +1,55 @@
+name: "Upload an artifact"
+description: "Custom upload action"
+inputs:
+  name:
+    description: "Artifact name"
+    required: true
+  path:
+    description: "A directory or file to upload"
+    required: true
+
+runs:
+  using: "composite"
+  steps:
+    - name: Prepare artifact
+      shell: bash -euxo pipefail {0}
+      env:
+        SOURCE: ${{ inputs.path }}
+        ARCHIVE: /tmp/uploads/${{ inputs.name }}.tar.zst
+      run: |
+        mkdir -p $(dirname $ARCHIVE)
+
+        if [ -f ${ARCHIVE} ]; then
+          echo >&2 "File ${ARCHIVE} already exists. Something went wrong earlier"
+          exit 1
+        fi
+
+        export ZSTD_NBTHREADS=0
+        if [ -d  ${SOURCE} ]; then
+          time tar -C ${SOURCE} -cf ${ARCHIVE} --zstd .
+        elif [ -f ${SOURCE} ]; then
+          time tar -cf ${ARCHIVE} --zstd ${SOURCE}
+        elif ! ls ${SOURCE} > /dev/null 2>&1; then
+          echo >&2 "${SOURCE} does not exist"
+          exit 2
+        else
+          echo >&2 "${SOURCE} is neither a directory nor a file, do not know how to handle it"
+          exit 3
+        fi
+
+    - name: Upload artifact
+      shell: bash -euxo pipefail {0}
+      env:
+        SOURCE: ${{ inputs.path }}
+        ARCHIVE: /tmp/uploads/${{ inputs.name }}.tar.zst
+      run: |
+        BUCKET=neon-github-public-dev
+        PREFIX=artifacts/${GITHUB_RUN_ID}
+        FILENAME=$(basename $ARCHIVE)
+
+        FILESIZE=$(du -sh ${ARCHIVE} | cut -f1)
+
+        time aws s3 mv --only-show-errors ${ARCHIVE} s3://${BUCKET}/${PREFIX}/${GITHUB_RUN_ATTEMPT}/${FILENAME}
+
+        # Ref https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#adding-a-job-summary
+        echo "[${FILENAME}](https://${BUCKET}.s3.amazonaws.com/${PREFIX}/${GITHUB_RUN_ATTEMPT}/${FILENAME}) ${FILESIZE}" >> ${GITHUB_STEP_SUMMARY}
--- a/.github/ansible/.gitignore
+++ b/.github/ansible/.gitignore
@@ -0,0 +1,4 @@
+zenith_install.tar.gz
+.zenith_current_version
+neon_install.tar.gz
+.neon_current_version
--- a/.github/ansible/ansible.cfg
+++ b/.github/ansible/ansible.cfg
@@ -0,0 +1,12 @@
+[defaults]
+
+localhost_warning = False
+host_key_checking = False
+timeout = 30
+
+[ssh_connection]
+ssh_args   = -F ./ansible.ssh.cfg
+# teleport doesn't support sftp yet https://github.com/gravitational/teleport/issues/7127
+# and scp did not work for me either
+transfer_method = piped
+pipelining = True
--- a/.github/ansible/ansible.ssh.cfg
+++ b/.github/ansible/ansible.ssh.cfg
@@ -0,0 +1,15 @@
+# Remove this once https://github.com/gravitational/teleport/issues/10918 is fixed
+# (use pre 8.5 option name to cope with old ssh in CI)
+PubkeyAcceptedKeyTypes +ssh-rsa-cert-v01@openssh.com
+
+Host tele.zenith.tech
+    User admin
+    Port 3023
+    StrictHostKeyChecking no
+    UserKnownHostsFile /dev/null
+
+Host * !tele.zenith.tech
+    User admin
+    StrictHostKeyChecking no
+    UserKnownHostsFile /dev/null
+    ProxyJump tele.zenith.tech
--- a/.github/ansible/deploy.yaml
+++ b/.github/ansible/deploy.yaml
@@ -0,0 +1,176 @@
+- name: Upload Neon binaries
+  hosts: storage
+  gather_facts: False
+  remote_user: admin
+
+  tasks:
+
+    - name: get latest version of Neon binaries
+      register: current_version_file
+      set_fact:
+        current_version: "{{ lookup('file', '.neon_current_version') | trim }}"
+      tags:
+      - pageserver
+      - safekeeper
+
+    - name: inform about versions
+      debug: msg="Version to deploy - {{ current_version }}"
+      tags:
+      - pageserver
+      - safekeeper
+
+    - name: upload and extract Neon binaries to /usr/local
+      ansible.builtin.unarchive:
+        owner: root
+        group: root
+        src: neon_install.tar.gz
+        dest: /usr/local
+      become: true
+      tags:
+      - pageserver
+      - safekeeper
+      - binaries
+      - putbinaries
+
+- name: Deploy pageserver
+  hosts: pageservers
+  gather_facts: False
+  remote_user: admin
+
+  tasks:
+
+    - name: upload init script
+      when: console_mgmt_base_url is defined
+      ansible.builtin.template:
+        src: scripts/init_pageserver.sh
+        dest: /tmp/init_pageserver.sh
+        owner: root
+        group: root
+        mode: '0755'
+      become: true
+      tags:
+      - pageserver
+
+    - name: init pageserver
+      shell:
+        cmd: /tmp/init_pageserver.sh
+      args:
+        creates: "/storage/pageserver/data/tenants"
+      environment:
+        NEON_REPO_DIR: "/storage/pageserver/data"
+        LD_LIBRARY_PATH: "/usr/local/lib"
+      become: true
+      tags:
+      - pageserver
+
+    - name: update remote storage (s3) config
+      lineinfile:
+        path: /storage/pageserver/data/pageserver.toml
+        line: "{{ item }}"
+      loop:
+        - "[remote_storage]"
+        - "bucket_name = '{{ bucket_name }}'"
+        - "bucket_region = '{{ bucket_region }}'"
+        - "prefix_in_bucket = '{{ inventory_hostname }}'"
+      become: true
+      tags:
+      - pageserver
+
+    - name: upload systemd service definition
+      ansible.builtin.template:
+        src: systemd/pageserver.service
+        dest: /etc/systemd/system/pageserver.service
+        owner: root
+        group: root
+        mode: '0644'
+      become: true
+      tags:
+      - pageserver
+
+    - name: start systemd service
+      ansible.builtin.systemd:
+        daemon_reload: yes
+        name: pageserver
+        enabled: yes
+        state: restarted
+      become: true
+      tags:
+      - pageserver
+
+    - name: post version to console
+      when: console_mgmt_base_url is defined
+      shell:
+        cmd: |
+          INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
+          curl -sfS -d '{"version": {{ current_version }} }' -X PATCH {{ console_mgmt_base_url }}/api/v1/pageservers/$INSTANCE_ID
+      tags:
+      - pageserver
+
+- name: Deploy safekeeper
+  hosts: safekeepers
+  gather_facts: False
+  remote_user: admin
+
+  tasks:
+
+    - name: upload init script
+      when: console_mgmt_base_url is defined
+      ansible.builtin.template:
+        src: scripts/init_safekeeper.sh
+        dest: /tmp/init_safekeeper.sh
+        owner: root
+        group: root
+        mode: '0755'
+      become: true
+      tags:
+      - safekeeper
+
+    - name: init safekeeper
+      shell:
+        cmd: /tmp/init_safekeeper.sh
+      args:
+        creates: "/storage/safekeeper/data/safekeeper.id"
+      environment:
+        NEON_REPO_DIR: "/storage/safekeeper/data"
+        LD_LIBRARY_PATH: "/usr/local/lib"
+      become: true
+      tags:
+      - safekeeper
+
+    # in the future safekeepers should discover pageservers by themselves,
+    # but currently they use the first pageserver that was discovered
+    - name: set first pageserver var for safekeepers
+      set_fact:
+        first_pageserver: "{{ hostvars[groups['pageservers'][0]]['inventory_hostname'] }}"
+      tags:
+      - safekeeper
+
+    - name: upload systemd service definition
+      ansible.builtin.template:
+        src: systemd/safekeeper.service
+        dest: /etc/systemd/system/safekeeper.service
+        owner: root
+        group: root
+        mode: '0644'
+      become: true
+      tags:
+      - safekeeper
+
+    - name: start systemd service
+      ansible.builtin.systemd:
+        daemon_reload: yes
+        name: safekeeper
+        enabled: yes
+        state: restarted
+      become: true
+      tags:
+      - safekeeper
+
+    - name: post version to console
+      when: console_mgmt_base_url is defined
+      shell:
+        cmd: |
+          INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
+          curl -sfS -d '{"version": {{ current_version }} }' -X PATCH {{ console_mgmt_base_url }}/api/v1/safekeepers/$INSTANCE_ID
+      tags:
+      - safekeeper
--- a/.github/ansible/get_binaries.sh
+++ b/.github/ansible/get_binaries.sh
@@ -0,0 +1,36 @@
+#!/bin/bash
+
+set -e
+
+if [ -n "${DOCKER_TAG}" ]; then
+  # Version is DOCKER_TAG but without the prefix
+  VERSION=$(echo $DOCKER_TAG | sed 's/^.*-//g')
+else
+  echo "Please set DOCKER_TAG environment variable"
+  exit 1
+fi
+
+
+# do initial cleanup
+rm -rf neon_install postgres_install.tar.gz neon_install.tar.gz .neon_current_version
+mkdir neon_install
+
+# retrieve binaries from docker image
+echo "getting binaries from docker image"
+docker pull --quiet neondatabase/neon:${DOCKER_TAG}
+ID=$(docker create neondatabase/neon:${DOCKER_TAG})
+docker cp ${ID}:/data/postgres_install.tar.gz .
+tar -xzf postgres_install.tar.gz -C neon_install
+docker cp ${ID}:/usr/local/bin/pageserver neon_install/bin/
+docker cp ${ID}:/usr/local/bin/safekeeper neon_install/bin/
+docker cp ${ID}:/usr/local/bin/proxy neon_install/bin/
+docker cp ${ID}:/usr/local/bin/postgres neon_install/bin/
+docker rm -vf ${ID}
+
+# store version to file (for ansible playbooks) and create binaries tarball
+echo ${VERSION} > neon_install/.neon_current_version
+echo ${VERSION} > .neon_current_version
+tar -czf neon_install.tar.gz -C neon_install .
+
+# do final cleanup
+rm -rf neon_install postgres_install.tar.gz
--- a/.github/ansible/neon-stress.hosts
+++ b/.github/ansible/neon-stress.hosts
@@ -0,0 +1,20 @@
+[pageservers]
+neon-stress-ps-1 console_region_id=1
+neon-stress-ps-2 console_region_id=1
+
+[safekeepers]
+neon-stress-sk-1 console_region_id=1
+neon-stress-sk-2 console_region_id=1
+neon-stress-sk-3 console_region_id=1
+
+[storage:children]
+pageservers
+safekeepers
+
+[storage:vars]
+env_name = neon-stress
+console_mgmt_base_url = http://neon-stress-console.local
+bucket_name           = neon-storage-ireland
+bucket_region         = eu-west-1
+etcd_endpoints        = etcd-stress.local:2379
+safekeeper_enable_s3_offload = false
--- a/.github/ansible/production.hosts
+++ b/.github/ansible/production.hosts
@@ -0,0 +1,20 @@
+[pageservers]
+#zenith-1-ps-1 console_region_id=1
+zenith-1-ps-2 console_region_id=1
+zenith-1-ps-3 console_region_id=1
+
+[safekeepers]
+zenith-1-sk-1 console_region_id=1
+zenith-1-sk-2 console_region_id=1
+zenith-1-sk-3 console_region_id=1
+
+[storage:children]
+pageservers
+safekeepers
+
+[storage:vars]
+env_name = prod-1
+console_mgmt_base_url = http://console-release.local
+bucket_name           = zenith-storage-oregon
+bucket_region         = us-west-2
+etcd_endpoints        = zenith-1-etcd.local:2379
--- a/.github/ansible/scripts/init_pageserver.sh
+++ b/.github/ansible/scripts/init_pageserver.sh
@@ -0,0 +1,30 @@
+#!/bin/sh
+
+# get instance id from meta-data service
+INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
+
+# store fqdn hostname in var
+HOST=$(hostname -f)
+
+
+cat <<EOF | tee /tmp/payload
+{
+  "version": 1,
+  "host": "${HOST}",
+  "port": 6400,
+  "region_id": {{ console_region_id }},
+  "instance_id": "${INSTANCE_ID}",
+  "http_host": "${HOST}",
+  "http_port": 9898
+}
+EOF
+
+# check if pageserver already registered or not
+if ! curl -sf -X PATCH -d '{}' {{ console_mgmt_base_url }}/api/v1/pageservers/${INSTANCE_ID} -o /dev/null; then
+
+    # not registered, so register it now
+    ID=$(curl -sf -X POST {{ console_mgmt_base_url }}/api/v1/pageservers -d@/tmp/payload | jq -r '.ID')
+
+    # init pageserver
+    sudo -u pageserver /usr/local/bin/pageserver -c "id=${ID}" -c "pg_distrib_dir='/usr/local'" --init -D /storage/pageserver/data
+fi
--- a/.github/ansible/scripts/init_safekeeper.sh
+++ b/.github/ansible/scripts/init_safekeeper.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+
+# fetch params from meta-data service
+INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
+AZ_ID=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
+
+# store fqdn hostname in var
+HOST=$(hostname -f)
+
+
+cat <<EOF | tee /tmp/payload
+{
+  "version": 1,
+  "host": "${HOST}",
+  "port": 6500,
+  "http_port": 7676,
+  "region_id": {{ console_region_id }},
+  "instance_id": "${INSTANCE_ID}",
+  "availability_zone_id": "${AZ_ID}"
+}
+EOF
+
+# check if safekeeper already registered or not
+if ! curl -sf -X PATCH -d '{}' {{ console_mgmt_base_url }}/api/v1/safekeepers/${INSTANCE_ID} -o /dev/null; then
+
+    # not registered, so register it now
+    ID=$(curl -sf -X POST {{ console_mgmt_base_url }}/api/v1/safekeepers -d@/tmp/payload | jq -r '.ID')
+
+    # init safekeeper
+    sudo -u safekeeper /usr/local/bin/safekeeper --id ${ID} --init -D /storage/safekeeper/data
+fi
--- a/.github/ansible/staging.hosts
+++ b/.github/ansible/staging.hosts
@@ -0,0 +1,20 @@
+[pageservers]
+#zenith-us-stage-ps-1 console_region_id=27
+zenith-us-stage-ps-2 console_region_id=27
+zenith-us-stage-ps-3 console_region_id=27
+
+[safekeepers]
+zenith-us-stage-sk-4 console_region_id=27
+zenith-us-stage-sk-5 console_region_id=27
+zenith-us-stage-sk-6 console_region_id=27
+
+[storage:children]
+pageservers
+safekeepers
+
+[storage:vars]
+env_name = us-stage
+console_mgmt_base_url = http://console-staging.local
+bucket_name           = zenith-staging-storage-us-east-1
+bucket_region         = us-east-1
+etcd_endpoints        = zenith-us-stage-etcd.local:2379
--- a/.github/ansible/systemd/pageserver.service
+++ b/.github/ansible/systemd/pageserver.service
@@ -0,0 +1,18 @@
+[Unit]
+Description=Zenith pageserver
+After=network.target auditd.service
+
+[Service]
+Type=simple
+User=pageserver
+Environment=RUST_BACKTRACE=1 NEON_REPO_DIR=/storage/pageserver LD_LIBRARY_PATH=/usr/local/lib
+ExecStart=/usr/local/bin/pageserver -c "pg_distrib_dir='/usr/local'" -c "listen_pg_addr='0.0.0.0:6400'" -c "listen_http_addr='0.0.0.0:9898'" -c "broker_endpoints=['{{ etcd_endpoints }}']" -D /storage/pageserver/data
+ExecReload=/bin/kill -HUP $MAINPID
+KillMode=mixed
+KillSignal=SIGINT
+Restart=on-failure
+TimeoutSec=10
+LimitNOFILE=30000000
+
+[Install]
+WantedBy=multi-user.target
--- a/.github/ansible/systemd/safekeeper.service
+++ b/.github/ansible/systemd/safekeeper.service
@@ -0,0 +1,18 @@
+[Unit]
+Description=Zenith safekeeper
+After=network.target auditd.service
+
+[Service]
+Type=simple
+User=safekeeper
+Environment=RUST_BACKTRACE=1 NEON_REPO_DIR=/storage/safekeeper/data LD_LIBRARY_PATH=/usr/local/lib
+ExecStart=/usr/local/bin/safekeeper -l {{ inventory_hostname }}.local:6500 --listen-http {{ inventory_hostname }}.local:7676 -D /storage/safekeeper/data --broker-endpoints={{ etcd_endpoints }} --remote-storage='{bucket_name="{{bucket_name}}", bucket_region="{{bucket_region}}", prefix_in_bucket="{{ env_name }}/wal"}'
+ExecReload=/bin/kill -HUP $MAINPID
+KillMode=mixed
+KillSignal=SIGINT
+Restart=on-failure
+TimeoutSec=10
+LimitNOFILE=30000000
+
+[Install]
+WantedBy=multi-user.target
--- a/.github/helm-values/neon-stress.proxy-scram.yaml
+++ b/.github/helm-values/neon-stress.proxy-scram.yaml
@@ -0,0 +1,26 @@
+fullnameOverride: "neon-stress-proxy-scram"
+
+settings:
+  authBackend: "console"
+  authEndpoint: "http://neon-stress-console.local/management/api/v2"
+  domain: "*.stress.neon.tech"
+
+podLabels:
+  zenith_service: proxy-scram
+  zenith_env: staging
+  zenith_region: eu-west-1
+  zenith_region_slug: ireland
+
+exposedService:
+  annotations:
+    service.beta.kubernetes.io/aws-load-balancer-type: external
+    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
+    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
+    external-dns.alpha.kubernetes.io/hostname: '*.stress.neon.tech'
+
+metrics:
+  enabled: true
+  serviceMonitor:
+    enabled: true
+    selector:
+      release: kube-prometheus-stack
--- a/.github/helm-values/neon-stress.proxy.yaml
+++ b/.github/helm-values/neon-stress.proxy.yaml
@@ -0,0 +1,34 @@
+fullnameOverride: "neon-stress-proxy"
+
+settings:
+  authEndpoint: "https://console.dev.neon.tech/authenticate_proxy_request/"
+  uri: "https://console.dev.neon.tech/psql_session/"
+
+# -- Additional labels for zenith-proxy pods
+podLabels:
+  zenith_service: proxy
+  zenith_env: staging
+  zenith_region: eu-west-1
+  zenith_region_slug: ireland
+
+service:
+  annotations:
+    service.beta.kubernetes.io/aws-load-balancer-type: external
+    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
+    service.beta.kubernetes.io/aws-load-balancer-scheme: internal
+    external-dns.alpha.kubernetes.io/hostname: neon-stress-proxy.local
+  type: LoadBalancer
+
+exposedService:
+  annotations:
+    service.beta.kubernetes.io/aws-load-balancer-type: external
+    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
+    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
+    external-dns.alpha.kubernetes.io/hostname: connect.dev.neon.tech
+
+metrics:
+  enabled: true
+  serviceMonitor:
+    enabled: true
+    selector:
+      release: kube-prometheus-stack
--- a/.github/helm-values/production.proxy-scram.yaml
+++ b/.github/helm-values/production.proxy-scram.yaml
@@ -0,0 +1,24 @@
+settings:
+  authBackend: "console"
+  authEndpoint: "http://console-release.local/management/api/v2"
+  domain: "*.cloud.neon.tech"
+
+podLabels:
+  zenith_service: proxy-scram
+  zenith_env: production
+  zenith_region: us-west-2
+  zenith_region_slug: oregon
+
+exposedService:
+  annotations:
+    service.beta.kubernetes.io/aws-load-balancer-type: external
+    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
+    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
+    external-dns.alpha.kubernetes.io/hostname: '*.cloud.neon.tech'
+
+metrics:
+  enabled: true
+  serviceMonitor:
+    enabled: true
+    selector:
+      release: kube-prometheus-stack
--- a/.github/helm-values/production.proxy.yaml
+++ b/.github/helm-values/production.proxy.yaml
@@ -0,0 +1,32 @@
+settings:
+  authEndpoint: "https://console.neon.tech/authenticate_proxy_request/"
+  uri: "https://console.neon.tech/psql_session/"
+
+# -- Additional labels for zenith-proxy pods
+podLabels:
+  zenith_service: proxy
+  zenith_env: production
+  zenith_region: us-west-2
+  zenith_region_slug: oregon
+
+service:
+  annotations:
+    service.beta.kubernetes.io/aws-load-balancer-type: external
+    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
+    service.beta.kubernetes.io/aws-load-balancer-scheme: internal
+    external-dns.alpha.kubernetes.io/hostname: proxy-release.local
+  type: LoadBalancer
+
+exposedService:
+  annotations:
+    service.beta.kubernetes.io/aws-load-balancer-type: external
+    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
+    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
+    external-dns.alpha.kubernetes.io/hostname: connect.neon.tech,pg.neon.tech
+
+metrics:
+  enabled: true
+  serviceMonitor:
+    enabled: true
+    selector:
+      release: kube-prometheus-stack
--- a/.github/helm-values/staging.proxy-scram.yaml
+++ b/.github/helm-values/staging.proxy-scram.yaml
@@ -0,0 +1,31 @@
+# Helm chart values for zenith-proxy.
+# This is a YAML-formatted file.
+
+image:
+  repository: neondatabase/neon
+
+settings:
+  authBackend: "console"
+  authEndpoint: "http://console-staging.local/management/api/v2"
+  domain: "*.cloud.stage.neon.tech"
+
+# -- Additional labels for zenith-proxy pods
+podLabels:
+  zenith_service: proxy-scram
+  zenith_env: staging
+  zenith_region: us-east-1
+  zenith_region_slug: virginia
+
+exposedService:
+  annotations:
+    service.beta.kubernetes.io/aws-load-balancer-type: external
+    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
+    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
+    external-dns.alpha.kubernetes.io/hostname: cloud.stage.neon.tech
+
+metrics:
+  enabled: true
+  serviceMonitor:
+    enabled: true
+    selector:
+      release: kube-prometheus-stack
--- a/.github/helm-values/staging.proxy.yaml
+++ b/.github/helm-values/staging.proxy.yaml
@@ -0,0 +1,30 @@
+# Helm chart values for zenith-proxy.
+# This is a YAML-formatted file.
+
+image:
+  repository: neondatabase/neon
+
+settings:
+  authEndpoint: "https://console.stage.neon.tech/authenticate_proxy_request/"
+  uri: "https://console.stage.neon.tech/psql_session/"
+
+# -- Additional labels for zenith-proxy pods
+podLabels:
+  zenith_service: proxy
+  zenith_env: staging
+  zenith_region: us-east-1
+  zenith_region_slug: virginia
+
+exposedService:
+  annotations:
+    service.beta.kubernetes.io/aws-load-balancer-type: external
+    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
+    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
+    external-dns.alpha.kubernetes.io/hostname: connect.stage.neon.tech
+
+metrics:
+  enabled: true
+  serviceMonitor:
+    enabled: true
+    selector:
+      release: kube-prometheus-stack
--- a/.github/workflows/benchmarking.yml
+++ b/.github/workflows/benchmarking.yml
@@ -0,0 +1,228 @@
+name: Benchmarking
+
+on:
+  # uncomment to run on push for debugging your PR
+  # push:
+  #   branches: [ your branch ]
+  schedule:
+    # * is a special character in YAML so you have to quote this string
+    #          ┌───────────── minute (0 - 59)
+    #          │ ┌───────────── hour (0 - 23)
+    #          │ │ ┌───────────── day of the month (1 - 31)
+    #          │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
+    #          │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
+    - cron:  '36 4 * * *' # run once a day, timezone is utc
+
+  workflow_dispatch: # adds ability to run this manually
+
+defaults:
+  run:
+    shell: bash -euxo pipefail {0}
+
+concurrency:
+  # Allow only one running workflow per non-`main` branch.
+  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.ref == 'refs/heads/main' && github.sha || 'anysha' }}
+  cancel-in-progress: true
+
+jobs:
+  bench:
+    # This workflow runs on a self-hosted runner.
+    # Its environment is quite different from the usual GitHub runner.
+    # Probably the most important difference is that it doesn't start from a clean workspace each time.
+    # E.g. if you install system packages, they are not cleaned up, since you install them directly on the host machine,
+    # not in a container or something similar.
+    # See documentation for more info: https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners
+    runs-on: [self-hosted, zenith-benchmarker]
+
+    env:
+      POSTGRES_DISTRIB_DIR: "/usr/pgsql-14"
+
+    steps:
+    - name: Checkout zenith repo
+      uses: actions/checkout@v3
+
+    # actions/setup-python@v2 does not work correctly on self-hosted runners,
+    # see https://github.com/actions/setup-python/issues/162
+    # and probably https://github.com/actions/setup-python/issues/162#issuecomment-865387976 in particular.
+    # The simplest solution is to use the already installed system Python and spin up virtualenvs for job runs.
+    # Python 3.7.10 is already installed on the machine, so use it to install poetry and then use poetry's virtualenvs.
+    - name: Install poetry & deps
+      run: |
+        python3 -m pip install --upgrade poetry wheel
+        # pip/poetry caches are reused, so installing on every run should not cause trouble
+        ./scripts/pysync
+
+    - name: Show versions
+      run: |
+        echo Python
+        python3 --version
+        poetry run python3 --version
+        echo Poetry
+        poetry --version
+        echo Pgbench
+        $POSTGRES_DISTRIB_DIR/bin/pgbench --version
+
+    # FIXME: cluster setup is skipped due to various changes in the console API.
+    # For now a pre-created cluster is used. Once the API stabilizes
+    # after the massive changes, dynamic cluster setup will be revived.
+    # The pre-created cluster needs to be started manually, but it stops automatically after 5 minutes of inactivity.
+    - name: Setup cluster
+      env:
+        BENCHMARK_CONNSTR: "${{ secrets.BENCHMARK_STAGING_CONNSTR }}"
+      run: |
+        set -e
+
+        echo "Starting cluster"
+        # wake up the cluster
+        $POSTGRES_DISTRIB_DIR/bin/psql $BENCHMARK_CONNSTR -c "SELECT 1"
+
+    - name: Run benchmark
+      # pgbench is installed system wide from official repo
+      # https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-7-x86_64/
+      # via
+      # sudo tee /etc/yum.repos.d/pgdg.repo<<EOF
+      # [pgdg13]
+      # name=PostgreSQL 13 for RHEL/CentOS 7 - x86_64
+      # baseurl=https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-7-x86_64/
+      # enabled=1
+      # gpgcheck=0
+      # EOF
+      # sudo yum makecache
+      # sudo yum install postgresql13-contrib
+      # actual binaries are located in /usr/pgsql-13/bin/
+      env:
+        # The pgbench test runs two tests of given duration against each scale.
+        # So the total runtime with these parameters is 2 * 2 * 300 = 1200, or 20 minutes.
+        # Plus time needed to initialize the test databases.
+        TEST_PG_BENCH_DURATIONS_MATRIX: "300"
+        TEST_PG_BENCH_SCALES_MATRIX: "10,100"
+        PLATFORM: "neon-staging"
+        BENCHMARK_CONNSTR: "${{ secrets.BENCHMARK_STAGING_CONNSTR }}"
+        REMOTE_ENV: "1" # indicate to test harness that we do not have zenith binaries locally
+      run: |
+        # Make sure no data is left cached on the self-hosted runner,
+        # since stale data might generate duplicates when calling ingest_perf_test_result.py
+        rm -rf perf-report-staging
+        mkdir -p perf-report-staging
+        # Set the --sparse-ordering option of the pytest-order plugin to ensure tests run in the order they appear in the file;
+        # this is important for the test_perf_pgbench.py::test_pgbench_remote_* tests
+        ./scripts/pytest test_runner/performance/ -v -m "remote_cluster" --sparse-ordering --skip-interfering-proc-check --out-dir perf-report-staging --timeout 5400
+
+    - name: Submit result
+      env:
+        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
+        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
+      run: |
+        REPORT_FROM=$(realpath perf-report-staging) REPORT_TO=staging scripts/generate_and_push_perf_report.sh
+
+    - name: Post to a Slack channel
+      if: ${{ github.event.schedule && failure() }}
+      uses: slackapi/slack-github-action@v1
+      with:
+        channel-id: "C033QLM5P7D" # dev-staging-stream
+        slack-message: "Periodic perf testing: ${{ job.status }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
+      env:
+        SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
+
+  pgbench-compare:
+    env:
+      TEST_PG_BENCH_DURATIONS_MATRIX: "60m"
+      TEST_PG_BENCH_SCALES_MATRIX: "10gb"
+      REMOTE_ENV: "1"
+      POSTGRES_DISTRIB_DIR: /usr
+      TEST_OUTPUT: /tmp/test_output
+
+    strategy:
+      fail-fast: false
+      matrix:
+        connstr: [ BENCHMARK_CAPTEST_CONNSTR, BENCHMARK_RDS_CONNSTR ]
+
+    runs-on: dev
+    container: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rustlegacy:2817580636
+
+    timeout-minutes: 360 # 6h
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Cache poetry deps
+      id: cache_poetry
+      uses: actions/cache@v3
+      with:
+        path: ~/.cache/pypoetry/virtualenvs
+        key: v2-${{ runner.os }}-python-deps-${{ hashFiles('poetry.lock') }}
+
+    - name: Install Python deps
+      run: ./scripts/pysync
+
+    - name: Calculate platform
+      id: calculate-platform
+      env:
+        CONNSTR: ${{ matrix.connstr }}
+      run: |
+        if [ "${CONNSTR}" = "BENCHMARK_CAPTEST_CONNSTR" ]; then
+          PLATFORM=neon-captest
+        elif [ "${CONNSTR}" = "BENCHMARK_RDS_CONNSTR" ]; then
+          PLATFORM=rds-aurora
+        else
+          echo >&2 "Unknown CONNSTR=${CONNSTR}. Allowed values are BENCHMARK_CAPTEST_CONNSTR and BENCHMARK_RDS_CONNSTR only"
+          exit 1
+        fi
+
+        echo "::set-output name=PLATFORM::${PLATFORM}"
+
+    - name: Install Deps
+      run: |
+        echo "deb http://apt.postgresql.org/pub/repos/apt focal-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list
+        wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
+        sudo apt -y update
+        sudo apt install -y postgresql-14 postgresql-client-14
+
+    - name: Benchmark init
+      env:
+        PLATFORM: ${{ steps.calculate-platform.outputs.PLATFORM }}
+        BENCHMARK_CONNSTR: ${{ secrets[matrix.connstr] }}
+      run: |
+        mkdir -p perf-report-captest
+
+        psql $BENCHMARK_CONNSTR -c "SELECT 1;"
+        ./scripts/pytest test_runner/performance/test_perf_pgbench.py::test_pgbench_remote_init -v -m "remote_cluster" --skip-interfering-proc-check --out-dir perf-report-captest --timeout 21600
+
+    - name: Benchmark simple-update
+      env:
+        PLATFORM: ${{ steps.calculate-platform.outputs.PLATFORM }}
+        BENCHMARK_CONNSTR: ${{ secrets[matrix.connstr] }}
+      run: |
+        psql $BENCHMARK_CONNSTR -c "SELECT 1;"
+        ./scripts/pytest test_runner/performance/test_perf_pgbench.py::test_pgbench_remote_simple_update -v -m "remote_cluster" --skip-interfering-proc-check --out-dir perf-report-captest --timeout 21600
+
+    - name: Benchmark select-only
+      env:
+        PLATFORM: ${{ steps.calculate-platform.outputs.PLATFORM }}
+        BENCHMARK_CONNSTR: ${{ secrets[matrix.connstr] }}
+      run: |
+        psql $BENCHMARK_CONNSTR -c "SELECT 1;"
+        ./scripts/pytest test_runner/performance/test_perf_pgbench.py::test_pgbench_remote_select_only -v -m "remote_cluster" --skip-interfering-proc-check --out-dir perf-report-captest --timeout 21600
+
+    - name: Submit result
+      env:
+        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
+        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
+      run: |
+        REPORT_FROM=$(realpath perf-report-captest) REPORT_TO=staging scripts/generate_and_push_perf_report.sh
+
+    - name: Upload logs
+      if: always()
+      uses: ./.github/actions/upload
+      with:
+        name: bench-captest-${{ steps.calculate-platform.outputs.PLATFORM }}
+        path: /tmp/test_output/
+
+    - name: Post to a Slack channel
+      if: ${{ github.event.schedule && failure() }}
+      uses: slackapi/slack-github-action@v1
+      with:
+        channel-id: "C033QLM5P7D" # dev-staging-stream
+        slack-message: "Periodic perf testing: ${{ job.status }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
+      env:
+        SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -0,0 +1,661 @@
+name: Test and Deploy
+
+on:
+  push:
+    branches:
+      - main
+      - release
+  pull_request:
+
+concurrency:
+  # Allow only one running workflow per non-`main` branch.
+  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.ref == 'refs/heads/main' && github.sha || 'anysha' }}
+  cancel-in-progress: true
+
+env:
+  RUST_BACKTRACE: 1
+  COPT: '-Werror'
+
+jobs:
+  tag:
+    runs-on: dev
+    container: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/base:latest
+    outputs:
+      build-tag: ${{steps.build-tag.outputs.tag}}
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+
+      - name: Get build tag
+        run: |
+          echo run:$GITHUB_RUN_ID
+          echo ref:$GITHUB_REF_NAME
+          echo rev:$(git rev-list --count HEAD)
+          if [[ "$GITHUB_REF_NAME" == "main" ]]; then
+            echo "::set-output name=tag::$(git rev-list --count HEAD)"
+          elif [[ "$GITHUB_REF_NAME" == "release" ]]; then
+            echo "::set-output name=tag::release-$(git rev-list --count HEAD)"
+          else
+            echo "GITHUB_REF_NAME (value '$GITHUB_REF_NAME') is not set to either 'main' or 'release'"
+            echo "::set-output name=tag::$GITHUB_RUN_ID"
+          fi
+        shell: bash
+        id: build-tag
+
+  build-neon:
+    runs-on: dev
+    container:
+      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
+      options: --init
+    strategy:
+      fail-fast: false
+      matrix:
+        build_type: [ debug, release ]
+        rust_toolchain: [ 1.58 ]
+
+    env:
+      BUILD_TYPE: ${{ matrix.build_type }}
+      GIT_VERSION: ${{ github.sha }}
+
+    steps:
+      - name: Fix git ownership
+        run: |
+          # Workaround for `fatal: detected dubious ownership in repository at ...`
+          #
+          # Use both ${{ github.workspace }} and ${GITHUB_WORKSPACE} because they're different on host and in containers
+          #   Ref https://github.com/actions/checkout/issues/785
+          #
+          git config --global --add safe.directory ${{ github.workspace }}
+          git config --global --add safe.directory ${GITHUB_WORKSPACE}
+
+      - name: Checkout
+        uses: actions/checkout@v3
+        with:
+          submodules: true
+          fetch-depth: 1
+
+      - name: Set pg revision for caching
+        id: pg_ver
+        run: echo ::set-output name=pg_rev::$(git rev-parse HEAD:vendor/postgres)
+        shell: bash -euxo pipefail {0}
+
+      # Set some environment variables used by all the steps.
+      #
+      # CARGO_FLAGS is extra options to pass to "cargo build", "cargo test" etc.
+      #   It also includes --features, if any
+      #
+      # CARGO_FEATURES is passed to "cargo metadata". It is separate from CARGO_FLAGS,
+      #   because "cargo metadata" doesn't accept --release or --debug options
+      #
+      - name: Set env variables
+        run: |
+          if [[ $BUILD_TYPE == "debug" ]]; then
+            cov_prefix="scripts/coverage --profraw-prefix=$GITHUB_JOB --dir=/tmp/coverage run"
+            CARGO_FEATURES=""
+            CARGO_FLAGS=""
+          elif [[ $BUILD_TYPE == "release" ]]; then
+            cov_prefix=""
+            CARGO_FEATURES="--features profiling"
+            CARGO_FLAGS="--release $CARGO_FEATURES"
+          fi
+          echo "cov_prefix=${cov_prefix}" >> $GITHUB_ENV
+          echo "CARGO_FEATURES=${CARGO_FEATURES}" >> $GITHUB_ENV
+          echo "CARGO_FLAGS=${CARGO_FLAGS}" >> $GITHUB_ENV
+        shell: bash -euxo pipefail {0}
+
+      # Don't include the ~/.cargo/registry/src directory. It contains just
+      # uncompressed versions of the crates in ~/.cargo/registry/cache
+      # directory, and it's faster to let 'cargo' rebuild it from the
+      # compressed crates.
+      - name: Cache cargo deps
+        id: cache_cargo
+        uses: actions/cache@v3
+        with:
+          path: |
+            ~/.cargo/registry/
+            !~/.cargo/registry/src
+            ~/.cargo/git/
+            target/
+          # Fall back to older versions of the key, if no cache for current Cargo.lock was found
+          key: v6-${{ runner.os }}-${{ matrix.build_type }}-cargo-${{ matrix.rust_toolchain }}-${{ hashFiles('Cargo.lock') }}
+          restore-keys: |
+            v6-${{ runner.os }}-${{ matrix.build_type }}-cargo-${{ matrix.rust_toolchain }}-
+
+      - name: Cache postgres build
+        id: cache_pg
+        uses: actions/cache@v3
+        with:
+          path: tmp_install/
+          key: v1-${{ runner.os }}-${{ matrix.build_type }}-pg-${{ steps.pg_ver.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
+
+      - name: Build postgres
+        if: steps.cache_pg.outputs.cache-hit != 'true'
+        run: mold -run make postgres -j$(nproc)
+        shell: bash -euxo pipefail {0}
+
+      - name: Run cargo build
+        run: |
+          ${cov_prefix} mold -run cargo build $CARGO_FLAGS --features failpoints --bins --tests
+        shell: bash -euxo pipefail {0}
+
+      - name: Run cargo test
+        run: |
+          ${cov_prefix} cargo test $CARGO_FLAGS
+        shell: bash -euxo pipefail {0}
+
+      - name: Install rust binaries
+        run: |
+          # Install target binaries
+          mkdir -p /tmp/neon/bin/
+          binaries=$(
+            ${cov_prefix} cargo metadata $CARGO_FEATURES --format-version=1 --no-deps |
+            jq -r '.packages[].targets[] | select(.kind | index("bin")) | .name'
+          )
+          for bin in $binaries; do
+            SRC=target/$BUILD_TYPE/$bin
+            DST=/tmp/neon/bin/$bin
+            cp "$SRC" "$DST"
+          done
+
+          # Install test executables and write list of all binaries (for code coverage)
+          if [[ $BUILD_TYPE == "debug" ]]; then
+            # Keep bloated coverage data files away from the rest of the artifact
+            mkdir -p /tmp/coverage/
+
+            mkdir -p /tmp/neon/test_bin/
+
+            test_exe_paths=$(
+              ${cov_prefix} cargo test $CARGO_FLAGS --message-format=json --no-run |
+              jq -r '.executable | select(. != null)'
+            )
+            for bin in $test_exe_paths; do
+              SRC=$bin
+              DST=/tmp/neon/test_bin/$(basename $bin)
+
+              # We don't need debug symbols for code coverage, so strip them out to make
+              # the artifact smaller.
+              strip "$SRC" -o "$DST"
+              echo "$DST" >> /tmp/coverage/binaries.list
+            done
+
+            for bin in $binaries; do
+              echo "/tmp/neon/bin/$bin" >> /tmp/coverage/binaries.list
+            done
+          fi
+        shell: bash -euxo pipefail {0}
+
+      - name: Install postgres binaries
+        run: cp -a tmp_install /tmp/neon/pg_install
+        shell: bash -euxo pipefail {0}
+
+      - name: Upload Neon artifact
+        uses: ./.github/actions/upload
+        with:
+          name: neon-${{ runner.os }}-${{ matrix.build_type }}-${{ matrix.rust_toolchain }}-artifact
+          path: /tmp/neon
+
+      # XXX: keep this after the binaries.list is formed, so the coverage can properly work later
+      - name: Merge and upload coverage data
+        if: matrix.build_type == 'debug'
+        uses: ./.github/actions/save-coverage-data
+
+  pg_regress-tests:
+    runs-on: dev
+    container:
+      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
+      options: --init
+    needs: [ build-neon ]
+    strategy:
+      fail-fast: false
+      matrix:
+        build_type: [ debug, release ]
+        rust_toolchain: [ 1.58 ]
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+        with:
+          submodules: true
+          fetch-depth: 2
+
+      - name: Pytest regress tests
+        uses: ./.github/actions/run-python-test-set
+        with:
+          build_type: ${{ matrix.build_type }}
+          rust_toolchain: ${{ matrix.rust_toolchain }}
+          test_selection: batch_pg_regress
+          needs_postgres_source: true
+
+      - name: Merge and upload coverage data
+        if: matrix.build_type == 'debug'
+        uses: ./.github/actions/save-coverage-data
+
+  other-tests:
+    runs-on: dev
+    container:
+      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
+      options: --init
+    needs: [ build-neon ]
+    strategy:
+      fail-fast: false
+      matrix:
+        build_type: [ debug, release ]
+        rust_toolchain: [ 1.58 ]
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+        with:
+          submodules: true
+          fetch-depth: 2
+
+      - name: Pytest other tests
+        uses: ./.github/actions/run-python-test-set
+        with:
+          build_type: ${{ matrix.build_type }}
+          rust_toolchain: ${{ matrix.rust_toolchain }}
+          test_selection: batch_others
+          run_with_real_s3: true
+          real_s3_bucket: ci-tests-s3
+          real_s3_region: us-west-2
+          real_s3_access_key_id: "${{ secrets.AWS_ACCESS_KEY_ID_CI_TESTS_S3 }}"
+          real_s3_secret_access_key: "${{ secrets.AWS_SECRET_ACCESS_KEY_CI_TESTS_S3 }}"
+      - name: Merge and upload coverage data
+        if: matrix.build_type == 'debug'
+        uses: ./.github/actions/save-coverage-data
+
+  benchmarks:
+    runs-on: dev
+    container:
+      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
+      options: --init
+    needs: [ build-neon ]
+    if: github.ref_name == 'main' || contains(github.event.pull_request.labels.*.name, 'run-benchmarks')
+    strategy:
+      fail-fast: false
+      matrix:
+        build_type: [ release ]
+        rust_toolchain: [ 1.58 ]
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+        with:
+          submodules: true
+          fetch-depth: 2
+
+      - name: Pytest benchmarks
+        uses: ./.github/actions/run-python-test-set
+        with:
+          build_type: ${{ matrix.build_type }}
+          rust_toolchain: ${{ matrix.rust_toolchain }}
+          test_selection: performance
+          run_in_parallel: false
+          save_perf_report: true
+        env:
+          VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
+          PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
+      # XXX: no coverage data handling here, since benchmarks are run on release builds,
+      # while coverage is currently collected for the debug ones
+
+  coverage-report:
+    runs-on: dev
+    container:
+      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
+      options: --init
+    needs: [ other-tests, pg_regress-tests ]
+    strategy:
+      fail-fast: false
+      matrix:
+        build_type: [ debug ]
+        rust_toolchain: [ 1.58 ]
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+        with:
+          submodules: true
+          fetch-depth: 1
+
+      - name: Restore cargo deps cache
+        id: cache_cargo
+        uses: actions/cache@v3
+        with:
+          path: |
+            ~/.cargo/registry/
+            !~/.cargo/registry/src
+            ~/.cargo/git/
+            target/
+          key: v5-${{ runner.os }}-${{ matrix.build_type }}-cargo-${{ matrix.rust_toolchain }}-${{ hashFiles('Cargo.lock') }}
+
+      - name: Get Neon artifact
+        uses: ./.github/actions/download
+        with:
+          name: neon-${{ runner.os }}-${{ matrix.build_type }}-${{ matrix.rust_toolchain }}-artifact
+          path: /tmp/neon
+
+      - name: Get coverage artifact
+        uses: ./.github/actions/download
+        with:
+          name: coverage-data-artifact
+          path: /tmp/coverage
+
+      - name: Merge coverage data
+        run: scripts/coverage "--profraw-prefix=$GITHUB_JOB" --dir=/tmp/coverage merge
+        shell: bash -euxo pipefail {0}
+
+      - name: Build and upload coverage report
+        run: |
+          COMMIT_SHA=${{ github.event.pull_request.head.sha }}
+          COMMIT_SHA=${COMMIT_SHA:-${{ github.sha }}}
+          COMMIT_URL=https://github.com/${{ github.repository }}/commit/$COMMIT_SHA
+
+          scripts/coverage \
+            --dir=/tmp/coverage report \
+            --input-objects=/tmp/coverage/binaries.list \
+            --commit-url=$COMMIT_URL \
+            --format=github
+
+          REPORT_URL=https://${{ github.repository_owner }}.github.io/zenith-coverage-data/$COMMIT_SHA
+
+          scripts/git-upload \
+            --repo=https://${{ secrets.VIP_VAP_ACCESS_TOKEN }}@github.com/${{ github.repository_owner }}/zenith-coverage-data.git \
+            --message="Add code coverage for $COMMIT_URL" \
+            copy /tmp/coverage/report $COMMIT_SHA # COPY FROM TO_RELATIVE
+
+          # Add link to the coverage report to the commit
+          curl -f -X POST \
+          https://api.github.com/repos/${{ github.repository }}/statuses/$COMMIT_SHA \
+          -H "Accept: application/vnd.github.v3+json" \
+          --user "${{ secrets.CI_ACCESS_TOKEN }}" \
+          --data \
+            "{
+              \"state\": \"success\",
+              \"context\": \"neon-coverage\",
+              \"description\": \"Coverage report is ready\",
+              \"target_url\": \"$REPORT_URL\"
+            }"
+        shell: bash -euxo pipefail {0}
+
+  trigger-e2e-tests:
+    runs-on: dev
+    container:
+      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
+      options: --init
+    needs: [ build-neon ]
+    steps:
+      - name: Set PR's status to pending and request a remote CI test
+        run: |
+          COMMIT_SHA=${{ github.event.pull_request.head.sha }}
+          COMMIT_SHA=${COMMIT_SHA:-${{ github.sha }}}
+
+          REMOTE_REPO="${{ github.repository_owner }}/cloud"
+
+          curl -f -X POST \
+          https://api.github.com/repos/${{ github.repository }}/statuses/$COMMIT_SHA \
+          -H "Accept: application/vnd.github.v3+json" \
+          --user "${{ secrets.CI_ACCESS_TOKEN }}" \
+          --data \
+            "{
+              \"state\": \"pending\",
+              \"context\": \"neon-cloud-e2e\",
+              \"description\": \"[$REMOTE_REPO] Remote CI job is about to start\"
+            }"
+
+          curl -f -X POST \
+          https://api.github.com/repos/$REMOTE_REPO/actions/workflows/testing.yml/dispatches \
+          -H "Accept: application/vnd.github.v3+json" \
+          --user "${{ secrets.CI_ACCESS_TOKEN }}" \
+          --data \
+            "{
+              \"ref\": \"main\",
+              \"inputs\": {
+                \"ci_job_name\": \"neon-cloud-e2e\",
+                \"commit_hash\": \"$COMMIT_SHA\",
+                \"remote_repo\": \"${{ github.repository }}\"
+              }
+            }"
+
+  neon-image:
+    runs-on: dev
+    container: gcr.io/kaniko-project/executor:v1.9.0-debug
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v1 # v3 won't work with kaniko
+        with:
+          submodules: true
+          fetch-depth: 0
+
+      - name: Configure ECR login
+        run: echo "{\"credsStore\":\"ecr-login\"}" > /kaniko/.docker/config.json
+
+      - name: Kaniko build neon
+        run: /kaniko/executor --snapshotMode=redo --cache=true --cache-repo 369495373322.dkr.ecr.eu-central-1.amazonaws.com/cache --context . --destination 369495373322.dkr.ecr.eu-central-1.amazonaws.com/neon:$GITHUB_RUN_ID
+
+  compute-tools-image:
+    runs-on: dev
+    container: gcr.io/kaniko-project/executor:v1.9.0-debug
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v1 # v3 won't work with kaniko
+
+      - name: Configure ECR login
+        run: echo "{\"credsStore\":\"ecr-login\"}" > /kaniko/.docker/config.json
+
+      - name: Kaniko build compute tools
+        run: /kaniko/executor --snapshotMode=redo --cache=true --cache-repo 369495373322.dkr.ecr.eu-central-1.amazonaws.com/cache --context . --dockerfile Dockerfile.compute-tools --destination 369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-tools:$GITHUB_RUN_ID
+
+  compute-node-image:
+    runs-on: dev
+    container: gcr.io/kaniko-project/executor:v1.9.0-debug
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v1 # v3 won't work with kaniko
+        with:
+          submodules: true
+          fetch-depth: 0
+
+      - name: Configure ECR login
+        run: echo "{\"credsStore\":\"ecr-login\"}" > /kaniko/.docker/config.json
+
+      - name: Kaniko build compute node
+        working-directory: ./vendor/postgres/
+        run: /kaniko/executor --snapshotMode=redo --cache=true --cache-repo 369495373322.dkr.ecr.eu-central-1.amazonaws.com/cache --context . --destination 369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-node:$GITHUB_RUN_ID
+
+  promote-images:
+    runs-on: dev
+    needs: [ neon-image, compute-tools-image, compute-node-image ]
+    if: github.event_name != 'workflow_dispatch'
+    container: amazon/aws-cli
+    strategy:
+      fail-fast: false
+      matrix:
+        name: [ neon, compute-tools, compute-node ]
+
+    steps:
+      - name: Promote image to latest
+        run:
+          MANIFEST=$(aws ecr batch-get-image --repository-name ${{ matrix.name }} --image-ids imageTag=$GITHUB_RUN_ID --query 'images[].imageManifest' --output text) && aws ecr put-image --repository-name ${{ matrix.name }} --image-tag latest --image-manifest "$MANIFEST"
+
+  push-docker-hub:
+    runs-on: dev
+    needs: [ promote-images, tag ]
+    container: golang:1.19-bullseye
+
+    steps:
+      - name: Install Crane & ECR helper
+        run: |
+          go install github.com/google/go-containerregistry/cmd/crane@31786c6cbb82d6ec4fb8eb79cd9387905130534e # v0.11.0
+          go install github.com/awslabs/amazon-ecr-credential-helper/ecr-login/cli/docker-credential-ecr-login@69c85dc22db6511932bbf119e1a0cc5c90c69a7f # v0.6.0
+
+#      - name: Get build tag
+#        run: |
+#          if [[ "$GITHUB_REF_NAME" == "main" ]]; then
+#            echo "::set-output name=tag::$(git rev-list --count HEAD)"
+#          elif [[ "$GITHUB_REF_NAME" == "release" ]]; then
+#            echo "::set-output name=tag::release-$(git rev-list --count HEAD)"
+#          else
+#            echo "GITHUB_REF_NAME (value '$GITHUB_REF_NAME') is not set to either 'main' or 'release' "
+#            echo "::set-output name=tag::$GITHUB_RUN_ID"
+#          fi
+#        id: build-tag
+
+      - name: Configure ECR login
+        run: |
+          mkdir /github/home/.docker/
+          echo "{\"credsStore\":\"ecr-login\"}" > /github/home/.docker/config.json
+
+      - name: Pull neon image from ECR
+        run: crane pull 369495373322.dkr.ecr.eu-central-1.amazonaws.com/neon:latest neon
+
+      - name: Pull compute tools image from ECR
+        run: crane pull 369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-tools:latest compute-tools
+
+      - name: Pull compute node image from ECR
+        run: crane pull 369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-node:latest compute-node
+
+      - name: Configure docker login
+        run: |
+          # ECR Credential Helper & Docker Hub don't work together in config, hence reset
+          echo "" > /github/home/.docker/config.json
+          crane auth login -u ${{ secrets.NEON_DOCKERHUB_USERNAME }} -p ${{ secrets.NEON_DOCKERHUB_PASSWORD }} index.docker.io
+
+      - name: Push neon image to Docker Hub
+        run: crane push neon neondatabase/neon:${{needs.tag.outputs.build-tag}}
+
+      - name: Push compute tools image to Docker Hub
+        run: crane push compute-tools neondatabase/compute-tools:${{needs.tag.outputs.build-tag}}
+
+      - name: Push compute node image to Docker Hub
+        run: crane push compute-node neondatabase/compute-node:${{needs.tag.outputs.build-tag}}
+
+      - name: Add latest tag to images
+        if: |
+          (github.ref_name == 'main' || github.ref_name == 'release') &&
+          github.event_name != 'workflow_dispatch'
+        run: |
+          crane tag neondatabase/neon:${{needs.tag.outputs.build-tag}} latest
+          crane tag neondatabase/compute-tools:${{needs.tag.outputs.build-tag}} latest
+          crane tag neondatabase/compute-node:${{needs.tag.outputs.build-tag}} latest
+
+  calculate-deploy-targets:
+    runs-on: [ self-hosted, Linux, k8s-runner ]
+    if: |
+      (github.ref_name == 'main' || github.ref_name == 'release') &&
+      github.event_name != 'workflow_dispatch'
+    outputs:
+      matrix-include: ${{ steps.set-matrix.outputs.include }}
+    steps:
+      - id: set-matrix
+        run: |
+          if [[ "$GITHUB_REF_NAME" == "main" ]]; then
+            STAGING='{"env_name": "staging", "proxy_job": "neon-proxy", "proxy_config": "staging.proxy", "kubeconfig_secret": "STAGING_KUBECONFIG_DATA"}'
+            NEON_STRESS='{"env_name": "neon-stress", "proxy_job": "neon-stress-proxy", "proxy_config": "neon-stress.proxy", "kubeconfig_secret": "NEON_STRESS_KUBECONFIG_DATA"}'
+            echo "::set-output name=include::[$STAGING, $NEON_STRESS]"
+          elif [[ "$GITHUB_REF_NAME" == "release" ]]; then
+            PRODUCTION='{"env_name": "production", "proxy_job": "neon-proxy", "proxy_config": "production.proxy", "kubeconfig_secret": "PRODUCTION_KUBECONFIG_DATA"}'
+            echo "::set-output name=include::[$PRODUCTION]"
+          else
+            echo "GITHUB_REF_NAME (value '$GITHUB_REF_NAME') is not set to either 'main' or 'release'"
+            exit 1
+          fi
+
+  deploy:
+    runs-on: [ self-hosted, Linux, k8s-runner ]
+    #container: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/base:latest
+    # We need both storage **and** compute images for deploy, because the control plane picks the compute version based on the storage version.
+    # If it notices a fresh storage version, it may bump the compute version, and if the compute image failed to build, that may break things badly.
+    needs: [ push-docker-hub, calculate-deploy-targets, tag, other-tests, pg_regress-tests ]
+    if: |
+      (github.ref_name == 'main' || github.ref_name == 'release') &&
+      github.event_name != 'workflow_dispatch'
+    defaults:
+      run:
+        shell: bash
+    strategy:
+      matrix:
+        include: ${{fromJSON(needs.calculate-deploy-targets.outputs.matrix-include)}}
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+        with:
+          submodules: true
+          fetch-depth: 0
+
+      - name: Setup python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.10'
+
+      - name: Setup ansible
+        run: |
+          export PATH="/root/.local/bin:$PATH"
+          pip install --progress-bar off --user ansible boto3
+
+      - name: Redeploy
+        run: |
+          export DOCKER_TAG=${{needs.tag.outputs.build-tag}}
+          cd "$(pwd)/.github/ansible"
+
+          if [[ "$GITHUB_REF_NAME" == "main" ]]; then
+            ./get_binaries.sh
+          elif [[ "$GITHUB_REF_NAME" == "release" ]]; then
+            RELEASE=true ./get_binaries.sh
+          else
+            echo "GITHUB_REF_NAME (value '$GITHUB_REF_NAME') is not set to either 'main' or 'release'"
+            exit 1
+          fi
+
+          eval $(ssh-agent)
+          echo "${{ secrets.TELEPORT_SSH_KEY }}"  | tr -d '\n'| base64 --decode >ssh-key
+          echo "${{ secrets.TELEPORT_SSH_CERT }}" | tr -d '\n'| base64 --decode >ssh-key-cert.pub
+          chmod 0600 ssh-key
+          ssh-add ssh-key
+          rm -f ssh-key ssh-key-cert.pub
+
+          ansible-playbook deploy.yaml -i ${{ matrix.env_name }}.hosts
+          rm -f neon_install.tar.gz .neon_current_version
+
+  deploy-proxy:
+    runs-on: dev
+    container: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/base:latest
+    # Compute image isn't strictly required for proxy deploy, but let's still wait for it to run all deploy jobs consistently.
+    needs: [ push-docker-hub, calculate-deploy-targets, tag, other-tests, pg_regress-tests ]
+    if: |
+      (github.ref_name == 'main' || github.ref_name == 'release') &&
+      github.event_name != 'workflow_dispatch'
+    defaults:
+      run:
+        shell: bash
+    strategy:
+      matrix:
+        include: ${{fromJSON(needs.calculate-deploy-targets.outputs.matrix-include)}}
+    env:
+      KUBECONFIG: .kubeconfig
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+        with:
+          submodules: true
+          fetch-depth: 0
+
+      - name: Add curl
+        run: apt update && apt install curl -y
+
+      - name: Store kubeconfig file
+        run: |
+          echo "${{ secrets[matrix.kubeconfig_secret] }}" | base64 --decode > ${KUBECONFIG}
+          chmod 0600 ${KUBECONFIG}
+
+      - name: Setup helm v3
+        run: |
+          curl -s https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
+          helm repo add neondatabase https://neondatabase.github.io/helm-charts
+
+      - name: Re-deploy proxy
+        run: |
+          DOCKER_TAG=${{needs.tag.outputs.build-tag}}
+          helm upgrade ${{ matrix.proxy_job }}       neondatabase/neon-proxy --namespace default --install -f .github/helm-values/${{ matrix.proxy_config }}.yaml --set image.tag=${DOCKER_TAG} --wait --timeout 15m0s
+          helm upgrade ${{ matrix.proxy_job }}-scram neondatabase/neon-proxy --namespace default --install -f .github/helm-values/${{ matrix.proxy_config }}-scram.yaml --set image.tag=${DOCKER_TAG} --wait --timeout 15m0s
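Note on `calculate-deploy-targets` above: the step maps the branch name to a JSON list that `deploy` and `deploy-proxy` consume through `fromJSON(...)`. The following is a minimal Python sketch of that mapping, meant only for checking the JSON locally. The names `TARGETS` and `matrix_include` are hypothetical and not part of the workflow.

```python
import json

# Target definitions copied from the `set-matrix` step of the workflow.
TARGETS = {
    "main": [
        {
            "env_name": "staging",
            "proxy_job": "neon-proxy",
            "proxy_config": "staging.proxy",
            "kubeconfig_secret": "STAGING_KUBECONFIG_DATA",
        },
        {
            "env_name": "neon-stress",
            "proxy_job": "neon-stress-proxy",
            "proxy_config": "neon-stress.proxy",
            "kubeconfig_secret": "NEON_STRESS_KUBECONFIG_DATA",
        },
    ],
    "release": [
        {
            "env_name": "production",
            "proxy_job": "neon-proxy",
            "proxy_config": "production.proxy",
            "kubeconfig_secret": "PRODUCTION_KUBECONFIG_DATA",
        },
    ],
}


def matrix_include(ref_name: str) -> str:
    """Return the JSON list the step would publish as its `include` output."""
    if ref_name not in TARGETS:
        # Mirrors the `else` branch of the shell script, which exits with 1.
        raise ValueError(
            f"GITHUB_REF_NAME (value '{ref_name}') is not set to either 'main' or 'release'"
        )
    return json.dumps(TARGETS[ref_name])


if __name__ == "__main__":
    print(matrix_include("release"))
```

Each element of the list becomes one matrix entry, so `main` fans out into two deploy jobs and `release` into one.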
--- a/.github/workflows/codestyle.yml
+++ b/.github/workflows/codestyle.yml
@@ -0,0 +1,135 @@
+name: Check code style and build
+
+on:
+  push:
+    branches:
+    - main
+  pull_request:
+
+defaults:
+  run:
+    shell: bash -euxo pipefail {0}
+
+concurrency:
+  # Allow only one workflow per any non-`main` branch.
+  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.ref == 'refs/heads/main' && github.sha || 'anysha' }}
+  cancel-in-progress: true
+
+env:
+  RUST_BACKTRACE: 1
+
+jobs:
+  check-codestyle-rust:
+    strategy:
+      fail-fast: false
+      matrix:
+        # If we want to duplicate this job for different
+        # Rust toolchains (e.g. nightly or 1.37.0), add them here.
+        rust_toolchain: [1.58]
+        os: [ubuntu-latest, macos-latest]
+    timeout-minutes: 60
+    name: run regression test suite
+    runs-on: ${{ matrix.os }}
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v2
+        with:
+          submodules: true
+          fetch-depth: 2
+
+      - name: Install rust toolchain ${{ matrix.rust_toolchain }}
+        uses: actions-rs/toolchain@v1
+        with:
+          profile: minimal
+          toolchain: ${{ matrix.rust_toolchain }}
+          components: rustfmt, clippy
+          override: true
+
+      - name: Check formatting
+        run: cargo fmt --all -- --check
+
+      - name: Install Ubuntu postgres dependencies
+        if: matrix.os == 'ubuntu-latest'
+        run: |
+          sudo apt update
+          sudo apt install build-essential libreadline-dev zlib1g-dev flex bison libseccomp-dev libssl-dev
+
+      - name: Install macOS postgres dependencies
+        if: matrix.os == 'macos-latest'
+        run: brew install flex bison openssl
+
+      - name: Set pg revision for caching
+        id: pg_ver
+        run: echo ::set-output name=pg_rev::$(git rev-parse HEAD:vendor/postgres)
+
+      - name: Cache postgres build
+        id: cache_pg
+        uses: actions/cache@v2
+        with:
+          path: |
+            tmp_install/
+          key: ${{ runner.os }}-pg-${{ steps.pg_ver.outputs.pg_rev }}
+
+      - name: Set extra env for macOS
+        if: matrix.os == 'macos-latest'
+        run: |
+          echo 'LDFLAGS=-L/usr/local/opt/openssl@3/lib' >> $GITHUB_ENV
+          echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV
+
+      - name: Build postgres
+        if: steps.cache_pg.outputs.cache-hit != 'true'
+        run: make postgres
+
+      # Plain configure output can contain weird errors like 'error: C compiler cannot create executables'
+      # and the real cause will be inside config.log
+      - name: Print configure logs in case of failure
+        if: failure()
+        continue-on-error: true
+        run: |
+          echo '' && echo '=== config.log ===' && echo ''
+          cat tmp_install/build/config.log
+          echo '' && echo '=== configure.log ===' && echo ''
+          cat tmp_install/build/configure.log
+
+      - name: Cache cargo deps
+        id: cache_cargo
+        uses: actions/cache@v2
+        with:
+          path: |
+            ~/.cargo/registry
+            !~/.cargo/registry/src
+            ~/.cargo/git
+            target
+          key: v2-${{ runner.os }}-cargo-${{ hashFiles('./Cargo.lock') }}-rust-${{ matrix.rust_toolchain }}
+
+      - name: Run cargo clippy
+        run: ./run_clippy.sh
+
+      - name: Ensure all project builds
+        run: cargo build --all --all-targets
+
+  check-codestyle-python:
+    runs-on: [ self-hosted, Linux, k8s-runner ]
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+        with:
+          submodules: false
+          fetch-depth: 1
+
+      - name: Cache poetry deps
+        id: cache_poetry
+        uses: actions/cache@v3
+        with:
+          path: ~/.cache/pypoetry/virtualenvs
+          key: v1-codestyle-python-deps-${{ hashFiles('poetry.lock') }}
+
+      - name: Install Python deps
+        run: ./scripts/pysync
+
+      - name: Run yapf to ensure code format
+        run: poetry run yapf --recursive --diff .
+
+      - name: Run mypy to check types
+        run: poetry run mypy .
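Note on the `concurrency.group` expression above: `cond && a || b` behaves like a ternary in GitHub Actions expressions. On `main` the group therefore contains the commit SHA, so each commit gets its own group, while any other ref shares one group per workflow and ref, and `cancel-in-progress` stops the older run. Here is a rough Python model of that expression; `concurrency_group` is a hypothetical helper, not something the workflow calls.

```python
def concurrency_group(workflow: str, ref: str, sha: str) -> str:
    """Model of: workflow-ref-(ref == 'refs/heads/main' && sha || 'anysha')."""
    # On main the SHA makes the group unique per commit;
    # elsewhere the constant suffix makes runs on the same ref collide.
    suffix = sha if ref == "refs/heads/main" else "anysha"
    return f"{workflow}-{ref}-{suffix}"


if __name__ == "__main__":
    print(concurrency_group("Check code style and build", "refs/heads/main", "abc123"))
    print(concurrency_group("Check code style and build", "refs/pull/1/merge", "abc123"))
```

Two pushes to the same pull request produce the same group string, which is what lets the newer run cancel the older one.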
--- a/.github/workflows/notifications.yml
+++ b/.github/workflows/notifications.yml
@@ -0,0 +1,45 @@
+name: Send Notifications
+
+on:
+  push:
+    branches: [ main ]
+
+jobs:
+  send-notifications:
+    timeout-minutes: 30
+    name: send commit notifications
+    runs-on: ubuntu-latest
+
+    steps:
+
+      - name: Checkout
+        uses: actions/checkout@v2
+        with:
+          submodules: true
+          fetch-depth: 2
+
+      - name: Form variables for notification message
+        id: git_info_grab
+        run: |
+          git_stat=$(git show --stat=50)
+          git_stat="${git_stat//'%'/'%25'}"
+          git_stat="${git_stat//$'\n'/'%0A'}"
+          git_stat="${git_stat//$'\r'/'%0D'}"
+          git_stat="${git_stat// / }" # space -> 'Space En', as github tends to eat ordinary spaces
+          echo "::set-output name=git_stat::$git_stat"
+          echo "::set-output name=sha_short::$(git rev-parse --short HEAD)"
+          echo "##[set-output name=git_branch;]$(echo ${GITHUB_REF#refs/heads/})"
+
+      - name: Send notification
+        uses: appleboy/telegram-action@master
+        with:
+          to: ${{ secrets.TELEGRAM_TO }}
+          token: ${{ secrets.TELEGRAM_TOKEN }}
+          format: markdown
+          args: |
+            *@${{ github.actor }} pushed to* [${{ github.repository }}:${{steps.git_info_grab.outputs.git_branch}}](github.com/${{ github.repository }}/commit/${{steps.git_info_grab.outputs.sha_short }})
+
+            ```
+            ${{ steps.git_info_grab.outputs.git_stat }}
+            ```
+
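Note on the `git_info_grab` step above: a multi-line value has to be percent-escaped before it is passed through the `::set-output` workflow command, and `%` has to be escaped first so that the later `%0A` and `%0D` sequences are not escaped a second time. Here is a minimal Python sketch of the same three substitutions. `escape_for_set_output` is a hypothetical name, and the space replacement from the shell script is left out.

```python
def escape_for_set_output(value: str) -> str:
    """Percent-escape '%', LF and CR, in the same order as the shell step."""
    return (
        value.replace("%", "%25")   # must come first
             .replace("\n", "%0A")
             .replace("\r", "%0D")
    )


if __name__ == "__main__":
    print(escape_for_set_output("1 file changed, 100% new\nREADME.md | 2 +-"))
```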
--- a/.github/workflows/pg_clients.yml
+++ b/.github/workflows/pg_clients.yml
@@ -0,0 +1,85 @@
+name: Test Postgres client libraries
+
+on:
+  schedule:
+    # * is a special character in YAML so you have to quote this string
+    #          ┌───────────── minute (0 - 59)
+    #          │ ┌───────────── hour (0 - 23)
+    #          │ │ ┌───────────── day of the month (1 - 31)
+    #          │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
+    #          │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
+    - cron:  '23 02 * * *' # run once a day, timezone is utc
+
+  workflow_dispatch:
+
+concurrency:
+  # Allow only one workflow per any non-`main` branch.
+  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.ref == 'refs/heads/main' && github.sha || 'anysha' }}
+  cancel-in-progress: true
+
+jobs:
+  test-postgres-client-libs:
+    # TODO: switch to gen2 runner, requires docker
+    runs-on: [ ubuntu-latest ]
+
+    env:
+      TEST_OUTPUT: /tmp/test_output
+
+    steps:
+    - name: Checkout
+      uses: actions/checkout@v3
+
+    - uses: actions/setup-python@v4
+      with:
+        python-version: 3.9
+
+    - name: Install Poetry
+      uses: snok/install-poetry@v1
+
+    - name: Cache poetry deps
+      id: cache_poetry
+      uses: actions/cache@v3
+      with:
+        path: ~/.cache/pypoetry/virtualenvs
+        key: v1-${{ runner.os }}-python-deps-${{ hashFiles('poetry.lock') }}
+
+    - name: Install Python deps
+      shell: bash -euxo pipefail {0}
+      run: ./scripts/pysync
+
+    - name: Run pytest
+      env:
+        REMOTE_ENV: 1
+        BENCHMARK_CONNSTR: "${{ secrets.BENCHMARK_STAGING_CONNSTR }}"
+
+        POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
+      shell: bash -euxo pipefail {0}
+      run: |
+        # Test framework expects we have psql binary;
+        # but since we don't really need it in this test, let's mock it
+        mkdir -p "$POSTGRES_DISTRIB_DIR/bin" && touch "$POSTGRES_DISTRIB_DIR/bin/psql";
+        ./scripts/pytest \
+          --junitxml=$TEST_OUTPUT/junit.xml \
+          --tb=short \
+          --verbose \
+          -m "remote_cluster" \
+          -rA "test_runner/pg_clients"
+
+    # We use GitHub's upload-artifact action because `ubuntu-latest` doesn't have the AWS CLI configured.
+    # This will be fixed after switching to the gen2 runner.
+    - name: Upload python test logs
+      if: always()
+      uses: actions/upload-artifact@v3
+      with:
+        retention-days: 7
+        name: python-test-pg_clients-${{ runner.os }}-stage-logs
+        path: ${{ env.TEST_OUTPUT }}
+
+    - name: Post to a Slack channel
+      if: ${{ github.event.schedule && failure() }}
+      uses: slackapi/slack-github-action@v1
+      with:
+        channel-id: "C033QLM5P7D" # dev-staging-stream
+        slack-message: "Testing Postgres clients: ${{ job.status }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
+      env:
+        SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
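Note on the schedule above: `cron: '23 02 * * *'` fires once a day at 02:23 UTC. Below is a small sketch that computes the next firing time for such a fixed daily schedule. `next_daily_run` is a hypothetical helper; it handles only this "minute hour * * *" form, not general cron syntax, and it does not model any delay the scheduler may add.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 2, minute: int = 23) -> datetime:
    """Next time a daily `minute hour * * *` schedule fires strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_daily_run(datetime.now(timezone.utc)))
```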
--- a/.github/workflows/testing.yml
+++ b/.github/workflows/testing.yml
@@ -1,86 +0,0 @@
-name: regression check
-
-on: [push]
-
-jobs:
-  regression-check:
-    timeout-minutes: 10
-    name: run regression test suite
-    runs-on: ubuntu-latest
-
-    steps:
-
-      - name: Checkout
-        uses: actions/checkout@v2
-        with:
-          submodules: true
-          fetch-depth: 2
-
-      - name: Form variables for notification message
-        id: git_info_grab
-        run: |
-          git_stat=$(git show --stat=50)
-          git_stat="${git_stat//'%'/'%25'}"
-          git_stat="${git_stat//$'\n'/'%0A'}"
-          git_stat="${git_stat//$'\r'/'%0D'}"
-          git_stat="${git_stat// / }" # space -> 'Space En', as github tends to eat ordinary spaces
-          echo "::set-output name=git_stat::$git_stat"
-          echo "::set-output name=sha_short::$(git rev-parse --short HEAD)"
-          echo "##[set-output name=git_branch;]$(echo ${GITHUB_REF#refs/heads/})"
-
-      - name: Send notification
-        uses: appleboy/telegram-action@master
-        with:
-          to: ${{ secrets.TELEGRAM_TO }}
-          token: ${{ secrets.TELEGRAM_TOKEN }}
-          format: markdown
-          args: |
-            *@${{ github.actor }} pushed to* [${{ github.repository }}:${{steps.git_info_grab.outputs.git_branch}}](github.com/${{ github.repository }}/commit/${{steps.git_info_grab.outputs.sha_short }})
-
-            ```
-            ${{ steps.git_info_grab.outputs.git_stat }}
-            ```
-
-      - name: Install postgres dependencies
-        run: |
-          sudo apt update
-          sudo apt install build-essential libreadline-dev zlib1g-dev flex bison libxml2-dev libcurl4-openssl-dev
-
-      - name: Set pg revision for caching
-        id: pg_ver
-        run: echo ::set-output name=pg_rev::$(git rev-parse HEAD:vendor/postgres)
-
-      - name: Cache postgres build
-        id: cache_pg
-        uses: actions/cache@v2
-        with:
-          path: |
-            tmp_install/
-          key: ${{ runner.os }}-pg-${{ steps.pg_ver.outputs.pg_rev }}
-
-      - name: Build postgres
-        if: steps.cache_pg.outputs.cache-hit != 'true'
-        run: |
-          ./pgbuild.sh
-
-      - name: Install rust
-        run: |
-          sudo apt install -y cargo
-
-      - name: Cache cargo deps
-        id: cache_cargo
-        uses: actions/cache@v2
-        with:
-          path: |
-            ~/.cargo/registry
-            ~/.cargo/git
-            target
-          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
-
-      - name: Build
-        run: |
-          cargo build
-
-      - name: Run test
-        run: |
-          cargo test --test test_pageserver -- --nocapture --test-threads=1
--- a/.gitignore
+++ b/.gitignore
@@ -1,5 +1,18 @@
 /target
+/bindings/python/neon-dev-utils/target
 /tmp_check
 /tmp_install
 /tmp_check_cli
+__pycache__/
+test_output/
 .vscode
+.idea
+/.neon
+/integration_tests/.neon
+
+# Coverage
+*.profraw
+*.profdata
+
+*.key
+*.crt
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,4 +1,4 @@
 [submodule "vendor/postgres"]
 	path = vendor/postgres
-	url = https://github.com/libzenith/postgres
+	url = https://github.com/zenithdb/postgres
 	branch = main
--- a/.yapfignore
+++ b/.yapfignore
@@ -0,0 +1,10 @@
+# This file is only read when `yapf` is run from this directory.
+# Hence we only list top-level directories here to avoid confusion.
+# See source code for the exact file format: https://github.com/google/yapf/blob/c6077954245bc3add82dafd853a1c7305a6ebd20/yapf/yapflib/file_resources.py#L40-L43
+vendor/
+target/
+tmp_install/
+__pycache__/
+test_output/
+.neon/
+.git/
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -0,0 +1,29 @@
+# How to contribute
+
+Howdy! Usual good software engineering practices apply. Write
+tests. Write comments. Follow standard Rust coding practices where
+possible. Use 'cargo fmt' and 'clippy' to tidy up formatting.
+
+There are soft spots in the code, which could use cleanup,
+refactoring, additional comments, and so forth. Let's try to raise the
+bar, and clean things up as we go. Try to leave code in a better shape
+than it was before.
+
+## Submitting changes
+
+1. Get at least one +1 on your PR before you push.
+
+   For simple patches, it will only take a minute for someone to review
+it.
+
+2. Don't force push small changes after making the PR ready for review.
+Doing so will force readers to re-read your entire PR, which will delay
+the review process.
+
+3. Always keep the CI green.
+
+   Do not push if the CI failed on your PR, even if you think it's not
+your patch's fault. If something else has broken the CI, help to fix
+the root cause before pushing.
+
+*Happy Hacking!*
--- a/Cargo.lock
+++ b/Cargo.lock
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -1,10 +1,26 @@
 [workspace]
 members = [
-    "integration_tests",
-    "pageserver",
-    "walkeeper",
-    "zenith",
+    "compute_tools",
    "control_plane",
-    "postgres_ffi",
-    "zenith_utils",
+    "pageserver",
+    "proxy",
+    "safekeeper",
+    "workspace_hack",
+    "neon_local",
+    "integration_tests",
+    "libs/*",
 ]
+exclude = [
+    "bindings/python/neon-dev-utils",
+]
+
+
+[profile.release]
+# This is useful for profiling and, to some extent, debug.
+# Besides, debug info should not affect the performance.
+debug = true
+
+# This is only needed for proxy's tests.
+# TODO: we should probably fork `tokio-postgres-rustls` instead.
+[patch.crates-io]
+tokio-postgres = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
--- a/64
+++ b/64
@@ -0,0 +1,64 @@
+# Build Postgres
+FROM 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned AS pg-build
+WORKDIR /home/nonroot
+
+COPY vendor/postgres vendor/postgres
+COPY Makefile Makefile
+
+ENV BUILD_TYPE release
+RUN set -e \
+    && mold -run make -j $(nproc) -s postgres \
+    && rm -rf tmp_install/build \
+    && tar -C tmp_install -czf /home/nonroot/postgres_install.tar.gz .
+
+# Build zenith binaries
+FROM 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned AS build
+WORKDIR /home/nonroot
+ARG GIT_VERSION=local
+
+# Enable https://github.com/paritytech/cachepot to cache Rust crates' compilation results in Docker builds.
+# Set up cachepot to use an AWS S3 bucket for cache results, to reuse it between `docker build` invocations.
+# cachepot falls back to local filesystem if S3 is misconfigured, not failing the build
+ARG RUSTC_WRAPPER=cachepot
+ENV AWS_REGION=eu-central-1
+ENV CACHEPOT_S3_KEY_PREFIX=cachepot
+ARG CACHEPOT_BUCKET=neon-github-dev
+#ARG AWS_ACCESS_KEY_ID
+#ARG AWS_SECRET_ACCESS_KEY
+
+COPY --from=pg-build /home/nonroot/tmp_install/include/postgresql/server tmp_install/include/postgresql/server
+COPY . .
+
+# Show build caching stats to check if it was used in the end.
+# Has to be part of the same RUN, since the cachepot daemon is killed at the end of this RUN, losing the compilation stats.
+RUN set -e \
+    && mold -run cargo build --release \
+    && cachepot -s
+
+# Build final image
+#
+FROM debian:bullseye-slim
+WORKDIR /data
+
+RUN set -e \
+    && apt update \
+    && apt install -y \
+        libreadline-dev \
+        libseccomp-dev \
+        openssl \
+        ca-certificates \
+    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* \
+    && useradd -d /data zenith \
+    && chown -R zenith:zenith /data
+
+COPY --from=build --chown=zenith:zenith /home/nonroot/target/release/pageserver /usr/local/bin
+COPY --from=build --chown=zenith:zenith /home/nonroot/target/release/safekeeper /usr/local/bin
+COPY --from=build --chown=zenith:zenith /home/nonroot/target/release/proxy      /usr/local/bin
+
+COPY --from=pg-build /home/nonroot/tmp_install/ /usr/local/
+COPY --from=pg-build /home/nonroot/postgres_install.tar.gz /data/
+
+VOLUME ["/data"]
+USER zenith
+EXPOSE 6400
+CMD ["pageserver"]
--- a/Dockerfile.compute-tools
+++ b/Dockerfile.compute-tools
@@ -0,0 +1,25 @@
+# First transient image to build compute_tools binaries
+# NB: keep in sync with rust image version in .github/workflows/build_and_test.yml
+FROM 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned AS rust-build
+WORKDIR /home/nonroot
+
+# Enable https://github.com/paritytech/cachepot to cache Rust crates' compilation results in Docker builds.
+# Set up cachepot to use an AWS S3 bucket for cache results, to reuse it between `docker build` invocations.
+# cachepot falls back to local filesystem if S3 is misconfigured, not failing the build.
+ARG RUSTC_WRAPPER=cachepot
+ENV AWS_REGION=eu-central-1
+ENV CACHEPOT_S3_KEY_PREFIX=cachepot
+ARG CACHEPOT_BUCKET=neon-github-dev
+#ARG AWS_ACCESS_KEY_ID
+#ARG AWS_SECRET_ACCESS_KEY
+
+COPY . .
+
+RUN set -e \
+    && mold -run cargo build -p compute_tools --release \
+    && cachepot -s
+
+# Final image that only has one binary
+FROM debian:bullseye-slim
+
+COPY --from=rust-build /home/nonroot/target/release/compute_ctl /usr/local/bin/compute_ctl
--- a/202
+++ b/202
@@ -0,0 +1,202 @@
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
--- a/122
+++ b/122
@@ -0,0 +1,122 @@
+ROOT_PROJECT_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
+
+# Where to install Postgres. The default is ./tmp_install; overriding it may be useful for package managers.
+POSTGRES_INSTALL_DIR ?= $(ROOT_PROJECT_DIR)/tmp_install
+
+# Seccomp BPF is only available for Linux
+UNAME_S := $(shell uname -s)
+ifeq ($(UNAME_S),Linux)
+	SECCOMP = --with-libseccomp
+else
+	SECCOMP =
+endif
+
+#
+# We differentiate between release / debug build types using the BUILD_TYPE
+# environment variable.
+#
+BUILD_TYPE ?= debug
+ifeq ($(BUILD_TYPE),release)
+	PG_CONFIGURE_OPTS = --enable-debug --with-openssl
+	PG_CFLAGS = -O2 -g3 $(CFLAGS)
+	# Unfortunately, `--profile=...` is a nightly feature
+	CARGO_BUILD_FLAGS += --release
+else ifeq ($(BUILD_TYPE),debug)
+	PG_CONFIGURE_OPTS = --enable-debug --with-openssl --enable-cassert --enable-depend
+	PG_CFLAGS = -O0 -g3 $(CFLAGS)
+else
+	$(error Bad build type '$(BUILD_TYPE)', see Makefile for options)
+endif
+
+# macOS with brew-installed openssl requires explicit paths
+# It can be configured with OPENSSL_PREFIX variable
+UNAME_S := $(shell uname -s)
+ifeq ($(UNAME_S),Darwin)
+    OPENSSL_PREFIX ?= $(shell brew --prefix openssl@3)
+    PG_CONFIGURE_OPTS += --with-includes=$(OPENSSL_PREFIX)/include --with-libraries=$(OPENSSL_PREFIX)/lib
+endif
+
+# Choose whether we should be silent or verbose
+CARGO_BUILD_FLAGS += --$(if $(filter s,$(MAKEFLAGS)),quiet,verbose)
+# Fix for a corner case when make doesn't pass a jobserver
+CARGO_BUILD_FLAGS += $(filter -j1,$(MAKEFLAGS))
+
+# This option has a side effect of passing make jobserver to cargo.
+# However, we shouldn't do this if `make -n` (--dry-run) has been asked.
+CARGO_CMD_PREFIX += $(if $(filter n,$(MAKEFLAGS)),,+)
+# Force cargo not to print progress bar
+CARGO_CMD_PREFIX += CARGO_TERM_PROGRESS_WHEN=never CI=1
+
+#
+# Top level Makefile to build Zenith and PostgreSQL
+#
+.PHONY: all
+all: zenith postgres
+
+### Zenith Rust bits
+#
+# The 'postgres_ffi' depends on the Postgres headers.
+.PHONY: zenith
+zenith: postgres-headers
+	+@echo "Compiling Zenith"
+	$(CARGO_CMD_PREFIX) cargo build $(CARGO_BUILD_FLAGS)
+
+### PostgreSQL parts
+$(POSTGRES_INSTALL_DIR)/build/config.status:
+	+@echo "Configuring postgres build"
+	mkdir -p $(POSTGRES_INSTALL_DIR)/build
+	(cd $(POSTGRES_INSTALL_DIR)/build && \
+	$(ROOT_PROJECT_DIR)/vendor/postgres/configure CFLAGS='$(PG_CFLAGS)' \
+		$(PG_CONFIGURE_OPTS) \
+		$(SECCOMP) \
+		--prefix=$(abspath $(POSTGRES_INSTALL_DIR)) > configure.log)
+
+# nicer alias for running 'configure'
+.PHONY: postgres-configure
+postgres-configure: $(POSTGRES_INSTALL_DIR)/build/config.status
+
+# Install the PostgreSQL header files into $(POSTGRES_INSTALL_DIR)/include
+.PHONY: postgres-headers
+postgres-headers: postgres-configure
+	+@echo "Installing PostgreSQL headers"
+	$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/src/include MAKELEVEL=0 install
+
+# Compile and install PostgreSQL and contrib/neon
+.PHONY: postgres
+postgres: postgres-configure \
+		  postgres-headers # to prevent `make install` conflicts with zenith's `postgres-headers`
+	+@echo "Compiling PostgreSQL"
+	$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build MAKELEVEL=0 install
+	+@echo "Compiling contrib/neon"
+	$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/contrib/neon install
+	+@echo "Compiling contrib/neon_test_utils"
+	$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/contrib/neon_test_utils install
+	+@echo "Compiling pg_buffercache"
+	$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/contrib/pg_buffercache install
+	+@echo "Compiling pageinspect"
+	$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/contrib/pageinspect install
+
+
+.PHONY: postgres-clean
+postgres-clean:
+	$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build MAKELEVEL=0 clean
+
+# This doesn't remove the effects of 'configure'.
+.PHONY: clean
+clean:
+	cd $(POSTGRES_INSTALL_DIR)/build && $(MAKE) clean
+	$(CARGO_CMD_PREFIX) cargo clean
+
+# This removes everything
+.PHONY: distclean
+distclean:
+	rm -rf $(POSTGRES_INSTALL_DIR)
+	$(CARGO_CMD_PREFIX) cargo clean
+
+.PHONY: fmt
+fmt:
+	./pre-commit.py --fix-inplace
+
+.PHONY: setup-pre-commit-hook
+setup-pre-commit-hook:
+	ln -s -f $(ROOT_PROJECT_DIR)/pre-commit.py .git/hooks/pre-commit
--- a/5
+++ b/5
@@ -0,0 +1,5 @@
+Neon
+Copyright 2022 Neon Inc.
+
+The PostgreSQL submodule in vendor/postgres is licensed under the
+PostgreSQL license. See vendor/postgres/COPYRIGHT.
--- a/README.md
+++ b/README.md
@@ -1,92 +1,238 @@
-# Zenith
+# Neon

-Zenith substitutes PostgreSQL storage layer and redistributes data across a cluster of nodes
+Neon is a serverless open-source alternative to AWS Aurora Postgres. It separates storage and compute and substitutes the PostgreSQL storage layer by redistributing data across a cluster of nodes.

+The project used to be called "Zenith". Many of the commands and code comments
+still refer to "zenith", but we are in the process of renaming things.
+
+## Quick start
+[Join the waitlist](https://neon.tech/) for our free tier to receive your serverless postgres instance. Then connect to it with your preferred postgres client (psql, dbeaver, etc.) or use the online SQL editor.
+
+Alternatively, compile and run the project [locally](#running-local-installation).
+
+## Architecture overview
+
+A Neon installation consists of compute nodes and a Neon storage engine.
+
+Compute nodes are stateless PostgreSQL nodes backed by the Neon storage engine.
+
+The Neon storage engine consists of two major components:
+- Pageserver. Scalable storage backend for the compute nodes.
+- WAL service. The service receives WAL from the compute node and ensures that it is stored durably.
+
+Pageserver consists of:
+- Repository - Neon storage implementation.
+- WAL receiver - service that receives WAL from WAL service and stores it in the repository.
+- Page service - service that communicates with compute nodes and responds with pages from the repository.
+- WAL redo - service that builds pages from base images and WAL records on Page service request.
 ## Running local installation

-1. Build zenith and patched postgres
-```sh
-git clone --recursive https://github.com/libzenith/zenith.git
-cd zenith
-./pgbuild.sh # builds postgres and installs it to ./tmp_install
-cargo build
+
+#### Installing dependencies on Linux
+1. Install build dependencies and other applicable packages
+
+* On Ubuntu or Debian, this set of packages should be sufficient to build the code:
+```bash
+apt install build-essential libtool libreadline-dev zlib1g-dev flex bison libseccomp-dev \
+libssl-dev clang pkg-config libpq-dev etcd cmake postgresql-client
+```
+* On Fedora, these packages are needed:
+```bash
+dnf install flex bison readline-devel zlib-devel openssl-devel \
+  libseccomp-devel perl clang cmake etcd postgresql postgresql-contrib
 ```

-2. Start pageserver and postggres on top of it (should be called from repo root):
+2. [Install Rust](https://www.rust-lang.org/tools/install)
+```
+# recommended approach from https://www.rust-lang.org/tools/install
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+```
+
+#### Installing dependencies on OSX (12.3.1)
+1. Install XCode and dependencies
+```
+xcode-select --install
+brew install protobuf etcd openssl
+```
+
+2. [Install Rust](https://www.rust-lang.org/tools/install)
+```
+# recommended approach from https://www.rust-lang.org/tools/install
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+```
+
+3. Install PostgreSQL Client
+```
+# from https://stackoverflow.com/questions/44654216/correct-way-to-install-psql-without-full-postgres-on-macos
+brew install libpq
+brew link --force libpq
+```
+
+#### Building on Linux
+
+1. Build neon and patched postgres
+```
+# Note: The path to the neon sources cannot contain a space.
+
+git clone --recursive https://github.com/neondatabase/neon.git
+cd neon
+
+# A debug build is the preferred default. It runs noticeably slower than a
+# release build. If you want a release build instead, run
+# "BUILD_TYPE=release make -j`nproc`"
+
+make -j`nproc`
+```
+
+#### Building on OSX
+
+1. Build neon and patched postgres
+```
+# Note: The path to the neon sources cannot contain a space.
+
+git clone --recursive https://github.com/neondatabase/neon.git
+cd neon
+
+# A debug build is the preferred default. It runs noticeably slower than a
+# release build. If you want a release build instead, run
+# "BUILD_TYPE=release make -j`sysctl -n hw.logicalcpu`"
+
+make -j`sysctl -n hw.logicalcpu`
+```
+
+#### Dependency installation notes
+To run the `psql` client, install the `postgresql-client` package or modify `PATH` and `LD_LIBRARY_PATH` to include `tmp_install/bin` and `tmp_install/lib`, respectively.
+
+To run the integration tests or Python scripts (not required to use the code), install
+Python (3.9 or higher), and install python3 packages using `./scripts/pysync` (requires [poetry](https://python-poetry.org/)) in the project directory.
+
+
+#### Running neon database
+1. Start pageserver and postgres on top of it (should be called from repo root):
 ```sh
-# Create ~/.zenith with proper paths to binaries and data
+# Create repository in .neon with proper paths to binaries and data
 # Later that would be responsibility of a package install script
->./target/debug/zenith init
+> ./target/debug/neon_local init
+initializing tenantid 9ef87a5bf0d92544f6fafeeb3239695c
+created initial timeline de200bd42b49cc1814412c7e592dd6e9 timeline.lsn 0/16B5A50
+initial timeline de200bd42b49cc1814412c7e592dd6e9 created
+pageserver init succeeded

-# start pageserver
-> ./target/debug/zenith pageserver start
-Starting pageserver at '127.0.0.1:64000'
+# start pageserver and safekeeper
+> ./target/debug/neon_local start
+Starting pageserver at '127.0.0.1:64000' in '.neon'
+Pageserver started
+initializing for sk 1 for 7676
+Starting safekeeper at '127.0.0.1:5454' in '.neon/safekeepers/sk1'
+Safekeeper started

-# create and configure postgres data dir
-> ./target/debug/zenith pg create
-Creating new postgres: path=/Users/user/code/zenith/tmp_check_cli/compute/pg1 port=55432
-Database initialized
+# start postgres compute node
+> ./target/debug/neon_local pg start main
+Starting new postgres main on timeline de200bd42b49cc1814412c7e592dd6e9 ...
+Extracting base backup to create postgres instance: path=.neon/pgdatadirs/tenants/9ef87a5bf0d92544f6fafeeb3239695c/main port=55432
+Starting postgres node at 'host=127.0.0.1 port=55432 user=cloud_admin dbname=postgres'

-# start it
-> ./target/debug/zenith pg start pg1
-
-# look up status and connection info
-> ./target/debug/zenith pg list     
-NODE		ADDRESS				STATUS
-pg1			127.0.0.1:55432		running
+# check list of running postgres instances
+> ./target/debug/neon_local pg list
+ NODE  ADDRESS          TIMELINE                          BRANCH NAME  LSN        STATUS
+ main  127.0.0.1:55432  de200bd42b49cc1814412c7e592dd6e9  main         0/16B5BA8  running
 ```

-3. Now it is possible to connect to postgres and run some queries:
-```
-> psql -p55432 -h 127.0.0.1 postgres
+2. Now, it is possible to connect to postgres and run some queries:
+```text
+> psql -p55432 -h 127.0.0.1 -U cloud_admin postgres
 postgres=# CREATE TABLE t(key int primary key, value text);
 CREATE TABLE
 postgres=# insert into t values(1,1);
 INSERT 0 1
 postgres=# select * from t;
- key | value 
+ key | value
 -----+-------
   1 | 1
 (1 row)
 ```

-## Running tests
-
+3. And create branches and run postgres on them:
 ```sh
-git clone --recursive https://github.com/libzenith/zenith.git
-./pgbuild.sh # builds postgres and installs it to ./tmp_install
-cargo test -- --test-threads=1
+# create branch named migration_check
+> ./target/debug/neon_local timeline branch --branch-name migration_check
+Created timeline 'b3b863fa45fa9e57e615f9f2d944e601' at Lsn 0/16F9A00 for tenant: 9ef87a5bf0d92544f6fafeeb3239695c. Ancestor timeline: 'main'
+
+# check branches tree
+> ./target/debug/neon_local timeline list
+(L) main [de200bd42b49cc1814412c7e592dd6e9]
+(L) ┗━ @0/16F9A00: migration_check [b3b863fa45fa9e57e615f9f2d944e601]
+
+# start postgres on that branch
+> ./target/debug/neon_local pg start migration_check --branch-name migration_check
+Starting new postgres migration_check on timeline b3b863fa45fa9e57e615f9f2d944e601 ...
+Extracting base backup to create postgres instance: path=.neon/pgdatadirs/tenants/9ef87a5bf0d92544f6fafeeb3239695c/migration_check port=55433
+Starting postgres node at 'host=127.0.0.1 port=55433 user=cloud_admin dbname=postgres'
+
+# check the new list of running postgres instances
+> ./target/debug/neon_local pg list
+ NODE             ADDRESS          TIMELINE                          BRANCH NAME      LSN        STATUS
+ main             127.0.0.1:55432  de200bd42b49cc1814412c7e592dd6e9  main             0/16F9A38  running
+ migration_check  127.0.0.1:55433  b3b863fa45fa9e57e615f9f2d944e601  migration_check  0/16F9A70  running
+
+# this new postgres instance has all the data from the 'main' postgres,
+# but modifications made here do not affect the data in the original postgres
+> psql -p55433 -h 127.0.0.1 -U cloud_admin postgres
+postgres=# select * from t;
+ key | value
+-----+-------
+   1 | 1
+(1 row)
+
+postgres=# insert into t values(2,2);
+INSERT 0 1
+
+# check that the new change doesn't affect the 'main' postgres
+> psql -p55432 -h 127.0.0.1 -U cloud_admin postgres
+postgres=# select * from t;
+ key | value
+-----+-------
+   1 | 1
+(1 row)
 ```

-## Source tree layout
+4. If you want to run tests afterward (see below), you must first stop the pageserver, safekeeper, and postgres instances
+   you have just started. You can terminate them all with one command:
+```sh
+> ./target/debug/neon_local stop
+```

-/walkeeper:
+## Running tests

-WAL safekeeper. Written in Rust.
+Ensure your dependencies are installed as described [here](https://github.com/neondatabase/neon#dependency-installation-notes).

-/pageserver:
+```sh
+git clone --recursive https://github.com/neondatabase/neon.git
+make # also builds postgres and installs it to ./tmp_install
+./scripts/pytest
+```

-Page Server. Written in Rust.
+## Documentation

-Depends on the modified 'postgres' binary for WAL redo.
+We currently use README files to cover design ideas and the overall architecture of each module, along with `rustdoc`-style documentation comments. See also [/docs/](/docs/) for a top-level overview of all available markdown documentation.

-/integration_tests:
+- [/docs/sourcetree.md](/docs/sourcetree.md) contains an overview of the source tree layout.

-Tests with different combinations of a Postgres compute node, WAL safekeeper and Page Server.
+To view your `rustdoc` documentation in a browser, try running `cargo doc --no-deps --open`.

-/mgmt-console:
+### Postgres-specific terms

-Web UI to launch (modified) Postgres servers, using S3 as the backing store. Written in Python.
-This is somewhat outdated, as it doesn't use the WAL safekeeper or Page Servers.
+Due to Neon's very close relation with PostgreSQL internals, numerous specific terms are used.
+The same applies to certain spellings: e.g., we use MB to denote 1024 * 1024 bytes. MiB would be technically more correct, but it is inconsistent with what the PostgreSQL code and its documentation use.

-/vendor/postgres:
-
-PostgreSQL source tree, with the modifications needed for Zenith.
-
-/vendor/postgres/src/bin/safekeeper:
-
-Extension (safekeeper_proxy) that runs in the compute node, and connects to the WAL safekeepers
-and streams the WAL
+To get more familiar with this aspect, refer to:

+- [Neon glossary](/docs/glossary.md)
+- [PostgreSQL glossary](https://www.postgresql.org/docs/14/glossary.html)
+- Other PostgreSQL documentation and sources (Neon fork sources can be found [here](https://github.com/neondatabase/postgres))

+## Join the development

+- Read `CONTRIBUTING.md` to learn about project code style and practices.
+- To get familiar with the source tree layout, use [/docs/sourcetree.md](/docs/sourcetree.md).
+- To learn more about PostgreSQL internals, check http://www.interdb.jp/pg/index.html
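Review note on the README's branching walkthrough: the behaviour it demonstrates (a branch sees everything its ancestor had at the branch point, and later writes on either side stay separate) can be sketched as a toy model. This is only an illustration, not how the pageserver is implemented. `Repo`, `Timeline`, `put`, and `get` are hypothetical names, and LSNs are simplified to plain `u64` counters.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy timeline: versioned values keyed by (key, LSN).
struct Timeline {
    /// (parent timeline id, branch-point LSN); `None` for the root timeline.
    ancestor: Option<(usize, u64)>,
    /// Writes made on this timeline only.
    writes: BTreeMap<(i32, u64), String>,
}

/// Hypothetical container that owns all timelines; ids are vector indexes.
#[derive(Default)]
struct Repo {
    timelines: Vec<Timeline>,
}

impl Repo {
    /// Creates a timeline without an ancestor and returns its id.
    fn create_root(&mut self) -> usize {
        self.timelines.push(Timeline { ancestor: None, writes: BTreeMap::new() });
        self.timelines.len() - 1
    }

    /// Creates a child timeline that sees `parent` as of `at_lsn`.
    fn branch(&mut self, parent: usize, at_lsn: u64) -> usize {
        self.timelines.push(Timeline {
            ancestor: Some((parent, at_lsn)),
            writes: BTreeMap::new(),
        });
        self.timelines.len() - 1
    }

    /// Records a new version of `key` at `lsn` on timeline `tl`.
    fn put(&mut self, tl: usize, key: i32, lsn: u64, value: &str) {
        self.timelines[tl].writes.insert((key, lsn), value.to_string());
    }

    /// Returns the newest version of `key` visible on `tl` at `lsn`.
    fn get(&self, tl: usize, key: i32, lsn: u64) -> Option<&str> {
        let t = &self.timelines[tl];
        // Newest write on this timeline with an LSN not greater than `lsn`.
        if let Some((_, v)) = t.writes.range((key, 0)..=(key, lsn)).next_back() {
            return Some(v.as_str());
        }
        // Otherwise fall back to the ancestor, capped at the branch point.
        match t.ancestor {
            Some((parent, branch_lsn)) => self.get(parent, key, lsn.min(branch_lsn)),
            None => None,
        }
    }
}

fn main() {
    let mut repo = Repo::default();
    let main_tl = repo.create_root();
    repo.put(main_tl, 1, 10, "1");

    // Branch off main at LSN 20, then write only on the branch.
    let branch = repo.branch(main_tl, 20);
    repo.put(branch, 2, 30, "2");

    assert_eq!(repo.get(branch, 1, 40), Some("1")); // inherited from main
    assert_eq!(repo.get(branch, 2, 40), Some("2")); // written on the branch
    assert_eq!(repo.get(main_tl, 2, 40), None); // main is unaffected

    // A write on main after the branch point is not visible on the branch.
    repo.put(main_tl, 1, 25, "changed");
    assert_eq!(repo.get(branch, 1, 40), Some("1"));
    assert_eq!(repo.get(main_tl, 1, 40), Some("changed"));
}
```

The lookup caps the LSN at the branch point when it recurses into the ancestor, which is what keeps later writes on `main` invisible to `migration_check` in the walkthrough.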
--- a/bindings/python/neon-dev-utils/Cargo.lock
+++ b/bindings/python/neon-dev-utils/Cargo.lock
@@ -0,0 +1,264 @@
+# This file is automatically @generated by Cargo.
+# It is not intended for manual editing.
+version = 3
+
+[[package]]
+name = "autocfg"
+version = "1.1.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d468802bab17cbc0cc575e9b053f41e72aa36bfa6b7f55e3529ffa43161b97fa"
+
+[[package]]
+name = "bitflags"
+version = "1.3.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "bef38d45163c2f1dde094a7dfd33ccf595c92905c8f8f4fdc18d06fb1037718a"
+
+[[package]]
+name = "cfg-if"
+version = "1.0.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd"
+
+[[package]]
+name = "indoc"
+version = "0.3.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "47741a8bc60fb26eb8d6e0238bbb26d8575ff623fdc97b1a2c00c050b9684ed8"
+dependencies = [
+ "indoc-impl",
+ "proc-macro-hack",
+]
+
+[[package]]
+name = "indoc-impl"
+version = "0.3.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ce046d161f000fffde5f432a0d034d0341dc152643b2598ed5bfce44c4f3a8f0"
+dependencies = [
+ "proc-macro-hack",
+ "proc-macro2",
+ "quote",
+ "syn",
+ "unindent",
+]
+
+[[package]]
+name = "instant"
+version = "0.1.12"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7a5bbe824c507c5da5956355e86a746d82e0e1464f65d862cc5e71da70e94b2c"
+dependencies = [
+ "cfg-if",
+]
+
+[[package]]
+name = "libc"
+version = "0.2.132"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8371e4e5341c3a96db127eb2465ac681ced4c433e01dd0e938adbef26ba93ba5"
+
+[[package]]
+name = "lock_api"
+version = "0.4.8"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9f80bf5aacaf25cbfc8210d1cfb718f2bf3b11c4c54e5afe36c236853a8ec390"
+dependencies = [
+ "autocfg",
+ "scopeguard",
+]
+
+[[package]]
+name = "neon-dev-utils"
+version = "0.1.0"
+dependencies = [
+ "pyo3",
+]
+
+[[package]]
+name = "once_cell"
+version = "1.13.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "074864da206b4973b84eb91683020dbefd6a8c3f0f38e054d93954e891935e4e"
+
+[[package]]
+name = "parking_lot"
+version = "0.11.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7d17b78036a60663b797adeaee46f5c9dfebb86948d1255007a1d6be0271ff99"
+dependencies = [
+ "instant",
+ "lock_api",
+ "parking_lot_core",
+]
+
+[[package]]
+name = "parking_lot_core"
+version = "0.8.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d76e8e1493bcac0d2766c42737f34458f1c8c50c0d23bcb24ea953affb273216"
+dependencies = [
+ "cfg-if",
+ "instant",
+ "libc",
+ "redox_syscall",
+ "smallvec",
+ "winapi",
+]
+
+[[package]]
+name = "paste"
+version = "0.1.18"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "45ca20c77d80be666aef2b45486da86238fabe33e38306bd3118fe4af33fa880"
+dependencies = [
+ "paste-impl",
+ "proc-macro-hack",
+]
+
+[[package]]
+name = "paste-impl"
+version = "0.1.18"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d95a7db200b97ef370c8e6de0088252f7e0dfff7d047a28528e47456c0fc98b6"
+dependencies = [
+ "proc-macro-hack",
+]
+
+[[package]]
+name = "proc-macro-hack"
+version = "0.5.19"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "dbf0c48bc1d91375ae5c3cd81e3722dff1abcf81a30960240640d223f59fe0e5"
+
+[[package]]
+name = "proc-macro2"
+version = "1.0.43"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0a2ca2c61bc9f3d74d2886294ab7b9853abd9c1ad903a3ac7815c58989bb7bab"
+dependencies = [
+ "unicode-ident",
+]
+
+[[package]]
+name = "pyo3"
+version = "0.15.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d41d50a7271e08c7c8a54cd24af5d62f73ee3a6f6a314215281ebdec421d5752"
+dependencies = [
+ "cfg-if",
+ "indoc",
+ "libc",
+ "parking_lot",
+ "paste",
+ "pyo3-build-config",
+ "pyo3-macros",
+ "unindent",
+]
+
+[[package]]
+name = "pyo3-build-config"
+version = "0.15.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "779239fc40b8e18bc8416d3a37d280ca9b9fb04bda54b98037bb6748595c2410"
+dependencies = [
+ "once_cell",
+]
+
+[[package]]
+name = "pyo3-macros"
+version = "0.15.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "00b247e8c664be87998d8628e86f282c25066165f1f8dda66100c48202fdb93a"
+dependencies = [
+ "pyo3-macros-backend",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "pyo3-macros-backend"
+version = "0.15.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5a8c2812c412e00e641d99eeb79dd478317d981d938aa60325dfa7157b607095"
+dependencies = [
+ "proc-macro2",
+ "pyo3-build-config",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "quote"
+version = "1.0.21"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "bbe448f377a7d6961e30f5955f9b8d106c3f5e449d493ee1b125c1d43c2b5179"
+dependencies = [
+ "proc-macro2",
+]
+
+[[package]]
+name = "redox_syscall"
+version = "0.2.16"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "fb5a58c1855b4b6819d59012155603f0b22ad30cad752600aadfcb695265519a"
+dependencies = [
+ "bitflags",
+]
+
+[[package]]
+name = "scopeguard"
+version = "1.1.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d29ab0c6d3fc0ee92fe66e2d99f700eab17a8d57d1c1d3b748380fb20baa78cd"
+
+[[package]]
+name = "smallvec"
+version = "1.9.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "2fd0db749597d91ff862fd1d55ea87f7855a744a8425a64695b6fca237d1dad1"
+
+[[package]]
+name = "syn"
+version = "1.0.99"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "58dbef6ec655055e20b86b15a8cc6d439cca19b667537ac6a1369572d151ab13"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "unicode-ident",
+]
+
+[[package]]
+name = "unicode-ident"
+version = "1.0.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "c4f5b37a154999a8f3f98cc23a628d850e154479cd94decf3414696e12e31aaf"
+
+[[package]]
+name = "unindent"
+version = "0.1.10"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "58ee9362deb4a96cef4d437d1ad49cffc9b9e92d202b6995674e928ce684f112"
+
+[[package]]
+name = "winapi"
+version = "0.3.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5c839a674fcd7a98952e593242ea400abe93992746761e38641405d28b00f419"
+dependencies = [
+ "winapi-i686-pc-windows-gnu",
+ "winapi-x86_64-pc-windows-gnu",
+]
+
+[[package]]
+name = "winapi-i686-pc-windows-gnu"
+version = "0.4.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ac3b87c63620426dd9b991e5ce0329eff545bccbbb34f3be09ff6fb6ab51b7b6"
+
+[[package]]
+name = "winapi-x86_64-pc-windows-gnu"
+version = "0.4.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f"
--- a/bindings/python/neon-dev-utils/Cargo.toml
+++ b/bindings/python/neon-dev-utils/Cargo.toml
@@ -0,0 +1,16 @@
+[package]
+name = "neon-dev-utils"
+version = "0.1.0"
+edition = "2021"
+
+[lib]
+name = "neon_dev_utils"
+# "cdylib" is necessary to produce a shared library for Python to import from.
+#
+# Downstream Rust code (including code in `bin/`, `examples/`, and `tests/`) will not be able
+# to `use neon_dev_utils;` unless the "rlib" or "lib" crate type is also included, e.g.:
+# crate-type = ["cdylib", "rlib"]
+crate-type = ["cdylib"]
+
+[dependencies]
+pyo3 = { version = "0.15.1", features = ["extension-module"] }
--- a/bindings/python/neon-dev-utils/poetry.lock
+++ b/bindings/python/neon-dev-utils/poetry.lock
@@ -0,0 +1,31 @@
+[[package]]
+name = "maturin"
+version = "0.13.2"
+description = "Build and publish crates with pyo3, rust-cpython and cffi bindings as well as rust binaries as python packages"
+category = "dev"
+optional = false
+python-versions = ">=3.7"
+
+[package.dependencies]
+tomli = {version = ">=1.1.0", markers = "python_version < \"3.11\""}
+
+[package.extras]
+zig = ["ziglang (>=0.9.0,<0.10.0)"]
+patchelf = ["patchelf"]
+
+[[package]]
+name = "tomli"
+version = "2.0.1"
+description = "A lil' TOML parser"
+category = "dev"
+optional = false
+python-versions = ">=3.7"
+
+[metadata]
+lock-version = "1.1"
+python-versions = "^3.10"
+content-hash = "4e177514d6cf74b58bcd8ca30ef300c10a833b3e6b1d809aa57337ee20efeb47"
+
+[metadata.files]
+maturin = []
+tomli = []
--- a/bindings/python/neon-dev-utils/pyproject.toml
+++ b/bindings/python/neon-dev-utils/pyproject.toml
@@ -0,0 +1,15 @@
+[tool.poetry]
+name = "neon-dev-utils"
+version = "0.1.0"
+description = "Python bindings for common neon development utils"
+authors = ["Your Name <you@example.com>"]
+
+[tool.poetry.dependencies]
+python = "^3.10"
+
+[tool.poetry.dev-dependencies]
+maturin = "^0.13.2"
+
+[build-system]
+requires = ["maturin>=0.13.2", "poetry-core>=1.0.0"]
+build-backend = "maturin"
--- a/bindings/python/neon-dev-utils/src/lib.rs
+++ b/bindings/python/neon-dev-utils/src/lib.rs
@@ -0,0 +1,17 @@
+use pyo3::prelude::*;
+
+/// Formats the sum of two numbers as string.
+#[pyfunction]
+fn sum_as_string(a: usize, b: usize) -> PyResult<String> {
+    Ok((a + b).to_string())
+}
+
+/// A Python module implemented in Rust. The name of this function must match
+/// the `lib.name` setting in the `Cargo.toml`, else Python will not be able to
+/// import the module.
+#[pymodule]
+fn neon_dev_utils(_py: Python, m: &PyModule) -> PyResult<()> {
+    m.add_function(wrap_pyfunction!(sum_as_string, m)?)?;
+
+    Ok(())
+}
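Review note on `sum_as_string`: it adds two `usize` values with `+`, which by default panics on overflow in debug builds and wraps around in release builds. A minimal sketch of an overflow-aware core follows, kept free of `pyo3` so that it stands alone. `checked_sum_as_string` is a hypothetical helper and not part of this PR; how the `None` case should surface on the Python side is left open.

```rust
/// Hypothetical helper: formats `a + b` as a string, or returns `None`
/// if the sum does not fit in `usize`.
fn checked_sum_as_string(a: usize, b: usize) -> Option<String> {
    a.checked_add(b).map(|sum| sum.to_string())
}

fn main() {
    // The ordinary case behaves like the prototype.
    assert_eq!(checked_sum_as_string(2, 3), Some("5".to_string()));
    // Overflow is reported instead of panicking or wrapping.
    assert_eq!(checked_sum_as_string(usize::MAX, 1), None);
}
```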
--- a/compute_tools/.dockerignore
+++ b/compute_tools/.dockerignore
@@ -0,0 +1 @@
+target
--- a/compute_tools/.gitignore
+++ b/compute_tools/.gitignore
@@ -0,0 +1 @@
+target
--- a/compute_tools/Cargo.toml
+++ b/compute_tools/Cargo.toml
@@ -0,0 +1,21 @@
+[package]
+name = "compute_tools"
+version = "0.1.0"
+edition = "2021"
+
+[dependencies]
+anyhow = "1.0"
+chrono = "0.4"
+clap = "3.0"
+env_logger = "0.9"
+hyper = { version = "0.14", features = ["full"] }
+log = { version = "0.4", features = ["std", "serde"] }
+postgres = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
+regex = "1"
+serde = { version = "1.0", features = ["derive"] }
+serde_json = "1"
+tar = "0.4"
+tokio = { version = "1.17", features = ["macros", "rt", "rt-multi-thread"] }
+tokio-postgres = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
+url = "2.2.2"
+workspace_hack = { version = "0.1", path = "../workspace_hack" }
--- a/compute_tools/README.md
+++ b/compute_tools/README.md
@@ -0,0 +1,81 @@
+# Compute node tools
+
+Postgres wrapper (`compute_ctl`) is intended to be run as a Docker entrypoint or as a `systemd`
+`ExecStart` option. It will handle all the `Neon` specifics during compute node
+initialization:
+- `compute_ctl` accepts cluster (compute node) specification as a JSON file.
+- Every start is a fresh start, so the data directory is removed and
+  initialized again on each run.
+- Next it will put configuration files into the `PGDATA` directory.
+- Sync safekeepers and get commit LSN.
+- Get `basebackup` from the pageserver using the LSN returned by the previous step.
+- Try to start `postgres` and wait until it is ready to accept connections.
+- Check and alter/drop/create roles and databases.
+- Hang waiting on the `postmaster` process to exit.
+
+Also `compute_ctl` spawns two separate service threads:
+- `compute-monitor` checks the last Postgres activity timestamp and saves it
+  into the shared `ComputeNode`;
+- `http-endpoint` runs a Hyper HTTP API server, which serves readiness and the
+  last activity requests.
+
+Usage example:
+```sh
+compute_ctl -D /var/db/postgres/compute \
+            -C 'postgresql://cloud_admin@localhost/postgres' \
+            -S /var/db/postgres/specs/current.json \
+            -b /usr/local/bin/postgres
+```
+
+## Tests
+
+Cargo formatter:
+```sh
+cargo fmt
+```
+
+Run tests:
+```sh
+cargo test
+```
+
+Clippy linter:
+```sh
+cargo clippy --all --all-targets -- -Dwarnings -Drust-2018-idioms
+```
+
+## Cross-platform compilation
+
+Imagine that you are on macOS (x86) and you want a Linux GNU (`x86_64-unknown-linux-gnu` platform in `rust` terminology) executable.
+
+### Using docker
+
+You can use a throw-away Docker container ([rustlang/rust](https://hub.docker.com/r/rustlang/rust/) image) for doing that:
+```sh
+docker run --rm \
+    -v $(pwd):/compute_tools \
+    -w /compute_tools \
+    -t rustlang/rust:nightly cargo build --release --target=x86_64-unknown-linux-gnu
+```
+or as a one-liner:
+```sh
+docker run --rm -v $(pwd):/compute_tools -w /compute_tools -t rust:latest cargo build --release --target=x86_64-unknown-linux-gnu
+```
+
+### Using rust native cross-compilation
+
+Another way is to add `x86_64-unknown-linux-gnu` target on your host system:
+```sh
+rustup target add x86_64-unknown-linux-gnu
+```
+
+Install macOS cross-compiler toolchain:
+```sh
+brew tap SergioBenitez/osxct
+brew install x86_64-unknown-linux-gnu
+```
+
+And finally run `cargo build`:
+```sh
+CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=x86_64-unknown-linux-gnu-gcc cargo build --target=x86_64-unknown-linux-gnu --release
+```
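Review note on the service threads described in this README: a monitor thread saves the last activity timestamp into state that other threads read. A minimal, std-only sketch of that sharing pattern follows. `SharedState`, `last_active_secs`, `record_activity`, and `read_activity` are hypothetical stand-ins (the real `ComputeNode` type is not shown in this part of the diff), and the monitor here writes a fixed timestamp instead of querying Postgres.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for the state shared between threads.
#[derive(Default)]
struct SharedState {
    /// Last observed activity, in seconds since the Unix epoch.
    last_active_secs: Option<u64>,
}

/// Stores `ts` unless a newer timestamp is already recorded.
fn record_activity(state: &Arc<RwLock<SharedState>>, ts: u64) {
    let mut guard = state.write().unwrap();
    guard.last_active_secs = Some(guard.last_active_secs.map_or(ts, |old| old.max(ts)));
}

/// Reads the last recorded timestamp, as an HTTP handler might.
fn read_activity(state: &Arc<RwLock<SharedState>>) -> Option<u64> {
    let guard = state.read().unwrap();
    guard.last_active_secs
}

fn main() {
    let state = Arc::new(RwLock::new(SharedState::default()));

    // The monitor thread gets its own handle to the same state.
    let monitor_state = Arc::clone(&state);
    let monitor = thread::spawn(move || {
        record_activity(&monitor_state, 1_660_000_000);
    });
    monitor.join().unwrap();

    // Any other thread holding a handle now sees the update.
    assert_eq!(read_activity(&state), Some(1_660_000_000));
}
```

An `RwLock` suits this shape because reads (status requests) are expected to be far more frequent than writes (periodic monitor updates); that trade-off is an assumption of the sketch, not something stated in the README.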
--- a/compute_tools/rustfmt.toml
+++ b/compute_tools/rustfmt.toml
@@ -0,0 +1 @@
+max_width = 100
--- a/compute_tools/src/bin/compute_ctl.rs
+++ b/compute_tools/src/bin/compute_ctl.rs
@@ -0,0 +1,175 @@
+//!
+//! Postgres wrapper (`compute_ctl`) is intended to be run as a Docker entrypoint or as a `systemd`
+//! `ExecStart` option. It will handle all the `Neon` specifics during compute node
+//! initialization:
+//! - `compute_ctl` accepts cluster (compute node) specification as a JSON file.
+//! - Every start is a fresh start, so the data directory is removed and
+//!   initialized again on each run.
+//! - Next it will put configuration files into the `PGDATA` directory.
+//! - Sync safekeepers and get commit LSN.
+//! - Get `basebackup` from the pageserver using the LSN returned by the previous step.
+//! - Try to start `postgres` and wait until it is ready to accept connections.
+//! - Check and alter/drop/create roles and databases.
+//! - Hang waiting on the `postmaster` process to exit.
+//!
+//! Also `compute_ctl` spawns two separate service threads:
+//! - `compute-monitor` checks the last Postgres activity timestamp and saves it
+//!   into the shared `ComputeNode`;
+//! - `http-endpoint` runs a Hyper HTTP API server, which serves readiness and the
+//!   last activity requests.
+//!
+//! Usage example:
+//! ```sh
+//! compute_ctl -D /var/db/postgres/compute \
+//!             -C 'postgresql://cloud_admin@localhost/postgres' \
+//!             -S /var/db/postgres/specs/current.json \
+//!             -b /usr/local/bin/postgres
+//! ```
+//!
+use std::fs::File;
+use std::panic;
+use std::path::Path;
+use std::process::exit;
+use std::sync::{Arc, RwLock};
+use std::{thread, time::Duration};
+
+use anyhow::{Context, Result};
+use chrono::Utc;
+use clap::Arg;
+use log::{error, info};
+
+use compute_tools::compute::{ComputeMetrics, ComputeNode, ComputeState, ComputeStatus};
+use compute_tools::http::api::launch_http_server;
+use compute_tools::logger::*;
+use compute_tools::monitor::launch_monitor;
+use compute_tools::params::*;
+use compute_tools::pg_helpers::*;
+use compute_tools::spec::*;
+use url::Url;
+
+fn main() -> Result<()> {
+    // TODO: re-use `utils::logging` later
+    init_logger(DEFAULT_LOG_LEVEL)?;
+
+    // Env variable is set by `cargo`
+    let version: Option<&str> = option_env!("CARGO_PKG_VERSION");
+    let matches = clap::App::new("compute_ctl")
+        .version(version.unwrap_or("unknown"))
+        .arg(
+            Arg::new("connstr")
+                .short('C')
+                .long("connstr")
+                .value_name("DATABASE_URL")
+                .required(true),
+        )
+        .arg(
+            Arg::new("pgdata")
+                .short('D')
+                .long("pgdata")
+                .value_name("DATADIR")
+                .required(true),
+        )
+        .arg(
+            Arg::new("pgbin")
+                .short('b')
+                .long("pgbin")
+                .value_name("POSTGRES_PATH"),
+        )
+        .arg(
+            Arg::new("spec")
+                .short('s')
+                .long("spec")
+                .value_name("SPEC_JSON"),
+        )
+        .arg(
+            Arg::new("spec-path")
+                .short('S')
+                .long("spec-path")
+                .value_name("SPEC_PATH"),
+        )
+        .get_matches();
+
+    let pgdata = matches.value_of("pgdata").expect("PGDATA path is required");
+    let connstr = matches
+        .value_of("connstr")
+        .expect("Postgres connection string is required");
+    let spec = matches.value_of("spec");
+    let spec_path = matches.value_of("spec-path");
+
+    // Try to use just 'postgres' if no path is provided
+    let pgbin = matches.value_of("pgbin").unwrap_or("postgres");
+
+    let spec: ComputeSpec = match spec {
+        // First, try to get cluster spec from the cli argument
+        Some(json) => serde_json::from_str(json)?,
+        None => {
+            // Second, try to read it from the file if path is provided
+            if let Some(sp) = spec_path {
+                let path = Path::new(sp);
+                let file = File::open(path)?;
+                serde_json::from_reader(file)?
+            } else {
+                panic!("cluster spec should be provided via --spec or --spec-path argument");
+            }
+        }
+    };
+
+    let pageserver_connstr = spec
+        .cluster
+        .settings
+        .find("neon.pageserver_connstring")
+        .expect("pageserver connstr should be provided");
+    let tenant = spec
+        .cluster
+        .settings
+        .find("neon.tenant_id")
+        .expect("tenant id should be provided");
+    let timeline = spec
+        .cluster
+        .settings
+        .find("neon.timeline_id")
+        .expect("timeline id should be provided");
+
+    let compute_state = ComputeNode {
+        start_time: Utc::now(),
+        connstr: Url::parse(connstr).context("cannot parse connstr as a URL")?,
+        pgdata: pgdata.to_string(),
+        pgbin: pgbin.to_string(),
+        spec,
+        tenant,
+        timeline,
+        pageserver_connstr,
+        metrics: ComputeMetrics::new(),
+        state: RwLock::new(ComputeState::new()),
+    };
+    let compute = Arc::new(compute_state);
+
+    // Launch service threads first, so that we can serve availability
+    // requests while configuration is still in progress.
+    let _http_handle = launch_http_server(&compute).expect("cannot launch http endpoint thread");
+    let _monitor_handle = launch_monitor(&compute).expect("cannot launch compute monitor thread");
+
+    // Run compute (Postgres) and hang waiting on it.
+    match compute.prepare_and_run() {
+        Ok(ec) => {
+            let code = ec.code().unwrap_or(1);
+            info!("Postgres exited with code {}, shutting down", code);
+            exit(code)
+        }
+        Err(error) => {
+            error!("could not start the compute node: {:?}", error);
+
+            let mut state = compute.state.write().unwrap();
+            state.error = Some(format!("{:?}", error));
+            state.status = ComputeStatus::Failed;
+            drop(state);
+
+            // Keep serving HTTP requests, so that the cloud control plane can
+            // get the actual error.
+            info!("giving control plane 30s to collect the error before shutdown");
+            thread::sleep(Duration::from_secs(30));
+            info!("shutting down");
+            Err(error)
+        }
+    }
+}
--- a/compute_tools/src/checker.rs
+++ b/compute_tools/src/checker.rs
@@ -0,0 +1,43 @@
+use anyhow::{anyhow, Result};
+use log::error;
+use postgres::Client;
+use tokio_postgres::NoTls;
+
+use crate::compute::ComputeNode;
+
+pub fn create_writablity_check_data(client: &mut Client) -> Result<()> {
+    let query = "
+    CREATE TABLE IF NOT EXISTS health_check (
+        id serial primary key,
+        updated_at timestamptz default now()
+    );
+    INSERT INTO health_check VALUES (1, now())
+        ON CONFLICT (id) DO UPDATE
+         SET updated_at = now();";
+    let result = client.simple_query(query)?;
+    if result.len() < 2 {
+        return Err(anyhow::format_err!("executed {} queries", result.len()));
+    }
+    Ok(())
+}
+
+pub async fn check_writability(compute: &ComputeNode) -> Result<()> {
+    let (client, connection) = tokio_postgres::connect(compute.connstr.as_str(), NoTls).await?;
+    if client.is_closed() {
+        return Err(anyhow!("connection to postgres closed"));
+    }
+    tokio::spawn(async move {
+        if let Err(e) = connection.await {
+            error!("connection error: {}", e);
+        }
+    });
+
+    let result = client
+        .simple_query("UPDATE health_check SET updated_at = now() WHERE id = 1;")
+        .await?;
+
+    if result.len() != 1 {
+        return Err(anyhow!("statement can't be executed"));
+    }
+    Ok(())
+}
--- a/compute_tools/src/compute.rs
+++ b/compute_tools/src/compute.rs
@@ -0,0 +1,350 @@
+//
+// XXX: This starts to be scarily similar to the `PostgresNode` from `control_plane`,
+// but there are several things that make `PostgresNode` usage inconvenient in the
+// cloud:
+// - it inherits from `LocalEnv`, which contains **all-all** the information about
+//   a complete service running
+// - it uses `PageServerNode` with information about http endpoint, which we do not
+//   need in the cloud again
+// - many tiny pieces like, for example, we do not use `pg_ctl` in the cloud
+//
+// Thus, to use `PostgresNode` in the cloud, we need to 'mock' a bunch of required
+// attributes (not required for the cloud). Yet, it is still tempting to unify these
+// `PostgresNode` and `ComputeNode` and use one in both places.
+//
+// TODO: stabilize `ComputeNode` and think about using it in the `control_plane`.
+//
+use std::fs;
+use std::os::unix::fs::PermissionsExt;
+use std::path::Path;
+use std::process::{Command, ExitStatus, Stdio};
+use std::sync::atomic::{AtomicU64, Ordering};
+use std::sync::RwLock;
+
+use anyhow::{Context, Result};
+use chrono::{DateTime, Utc};
+use log::info;
+use postgres::{Client, NoTls};
+use serde::{Serialize, Serializer};
+
+use crate::checker::create_writablity_check_data;
+use crate::config;
+use crate::pg_helpers::*;
+use crate::spec::*;
+
+/// Compute node info shared across several `compute_ctl` threads.
+pub struct ComputeNode {
+    pub start_time: DateTime<Utc>,
+    // Url type maintains proper escaping
+    pub connstr: url::Url,
+    pub pgdata: String,
+    pub pgbin: String,
+    pub spec: ComputeSpec,
+    pub tenant: String,
+    pub timeline: String,
+    pub pageserver_connstr: String,
+    pub metrics: ComputeMetrics,
+    /// Volatile part of the `ComputeNode`, so it should be accessed under an `RwLock`
+    /// to allow HTTP API server to serve status requests, while configuration
+    /// is in progress.
+    pub state: RwLock<ComputeState>,
+}
+
+fn rfc3339_serialize<S>(x: &DateTime<Utc>, s: S) -> Result<S::Ok, S::Error>
+where
+    S: Serializer,
+{
+    x.to_rfc3339().serialize(s)
+}
+
+#[derive(Serialize)]
+#[serde(rename_all = "snake_case")]
+pub struct ComputeState {
+    pub status: ComputeStatus,
+    /// Timestamp of the last Postgres activity
+    #[serde(serialize_with = "rfc3339_serialize")]
+    pub last_active: DateTime<Utc>,
+    pub error: Option<String>,
+}
+
+impl ComputeState {
+    pub fn new() -> Self {
+        Self {
+            status: ComputeStatus::Init,
+            last_active: Utc::now(),
+            error: None,
+        }
+    }
+}
+
+impl Default for ComputeState {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+#[derive(Serialize, Clone, Copy, PartialEq, Eq)]
+#[serde(rename_all = "snake_case")]
+pub enum ComputeStatus {
+    Init,
+    Running,
+    Failed,
+}
+
+#[derive(Serialize)]
+pub struct ComputeMetrics {
+    pub sync_safekeepers_ms: AtomicU64,
+    pub basebackup_ms: AtomicU64,
+    pub config_ms: AtomicU64,
+    pub total_startup_ms: AtomicU64,
+}
+
+impl ComputeMetrics {
+    pub fn new() -> Self {
+        Self {
+            sync_safekeepers_ms: AtomicU64::new(0),
+            basebackup_ms: AtomicU64::new(0),
+            config_ms: AtomicU64::new(0),
+            total_startup_ms: AtomicU64::new(0),
+        }
+    }
+}
+
+impl Default for ComputeMetrics {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+impl ComputeNode {
+    pub fn set_status(&self, status: ComputeStatus) {
+        self.state.write().unwrap().status = status;
+    }
+
+    pub fn get_status(&self) -> ComputeStatus {
+        self.state.read().unwrap().status
+    }
+
+    // Remove `pgdata` directory and create it again with right permissions.
+    fn create_pgdata(&self) -> Result<()> {
+        // Ignore removal error, likely it is a 'No such file or directory (os error 2)'.
+        // If it is something different then create_dir() will error out anyway.
+        let _ok = fs::remove_dir_all(&self.pgdata);
+        fs::create_dir(&self.pgdata)?;
+        fs::set_permissions(&self.pgdata, fs::Permissions::from_mode(0o700))?;
+
+        Ok(())
+    }
+
+    // Get basebackup from the libpq connection to pageserver using `pageserver_connstr`
+    // and unarchive it to the `pgdata` directory, overwriting all its previous content.
+    fn get_basebackup(&self, lsn: &str) -> Result<()> {
+        let start_time = Utc::now();
+
+        let mut client = Client::connect(&self.pageserver_connstr, NoTls)?;
+        let basebackup_cmd = match lsn {
+            "0/0" => format!("basebackup {} {}", &self.tenant, &self.timeline), // First start of the compute
+            _ => format!("basebackup {} {} {}", &self.tenant, &self.timeline, lsn),
+        };
+        let copyreader = client.copy_out(basebackup_cmd.as_str())?;
+
+        // Read the archive directly from the `CopyOutReader`
+        //
+        // Set `ignore_zeros` so that unpack() reads all the Copy data and
+        // doesn't stop at the end-of-archive marker. Otherwise, if the server
+        // sends an Error after finishing the tarball, we will not notice it.
+        let mut ar = tar::Archive::new(copyreader);
+        ar.set_ignore_zeros(true);
+        ar.unpack(&self.pgdata)?;
+
+        self.metrics.basebackup_ms.store(
+            Utc::now()
+                .signed_duration_since(start_time)
+                .to_std()
+                .unwrap()
+                .as_millis() as u64,
+            Ordering::Relaxed,
+        );
+
+        Ok(())
+    }
+
+    // Run `postgres` in a special mode with `--sync-safekeepers` argument
+    // and return the reported LSN back to the caller.
+    fn sync_safekeepers(&self) -> Result<String> {
+        let start_time = Utc::now();
+
+        let sync_handle = Command::new(&self.pgbin)
+            .args(&["--sync-safekeepers"])
+            .env("PGDATA", &self.pgdata) // we cannot use -D in this mode
+            .stdout(Stdio::piped())
+            .spawn()
+            .expect("postgres --sync-safekeepers failed to start");
+
+        // `postgres --sync-safekeepers` will print all log output to stderr and
+        // final LSN to stdout. So we pipe only stdout, while stderr will be automatically
+        // redirected to the caller output.
+        let sync_output = sync_handle
+            .wait_with_output()
+            .expect("postgres --sync-safekeepers failed");
+        if !sync_output.status.success() {
+            anyhow::bail!(
+                "postgres --sync-safekeepers exited with non-zero status: {}",
+                sync_output.status,
+            );
+        }
+
+        self.metrics.sync_safekeepers_ms.store(
+            Utc::now()
+                .signed_duration_since(start_time)
+                .to_std()
+                .unwrap()
+                .as_millis() as u64,
+            Ordering::Relaxed,
+        );
+
+        let lsn = String::from(String::from_utf8(sync_output.stdout)?.trim());
+
+        Ok(lsn)
+    }
+
+    /// Do all the preparations like PGDATA directory creation, configuration,
+    /// safekeepers sync, basebackup, etc.
+    pub fn prepare_pgdata(&self) -> Result<()> {
+        let spec = &self.spec;
+        let pgdata_path = Path::new(&self.pgdata);
+
+        // Remove/create an empty pgdata directory and put configuration there.
+        self.create_pgdata()?;
+        config::write_postgres_conf(&pgdata_path.join("postgresql.conf"), spec)?;
+
+        info!("starting safekeepers syncing");
+        let lsn = self
+            .sync_safekeepers()
+            .with_context(|| "failed to sync safekeepers")?;
+        info!("safekeepers synced at LSN {}", lsn);
+
+        info!(
+            "getting basebackup@{} from pageserver {}",
+            lsn, &self.pageserver_connstr
+        );
+        self.get_basebackup(&lsn).with_context(|| {
+            format!(
+                "failed to get basebackup@{} from pageserver {}",
+                lsn, &self.pageserver_connstr
+            )
+        })?;
+
+        // Update pg_hba.conf received with basebackup.
+        update_pg_hba(pgdata_path)?;
+
+        Ok(())
+    }
+
+    /// Start Postgres as a child process and manage DBs/roles.
+    /// After that this will hang waiting on the postmaster process to exit.
+    pub fn run(&self) -> Result<ExitStatus> {
+        let start_time = Utc::now();
+
+        let pgdata_path = Path::new(&self.pgdata);
+
+        // Run postgres as a child process.
+        let mut pg = Command::new(&self.pgbin)
+            .args(&["-D", &self.pgdata])
+            .spawn()
+            .expect("cannot start postgres process");
+
+        // Try default Postgres port if it is not provided
+        let port = self
+            .spec
+            .cluster
+            .settings
+            .find("port")
+            .unwrap_or_else(|| "5432".to_string());
+        wait_for_postgres(&mut pg, &port, pgdata_path)?;
+
+        // If connection fails,
+        // it may be the old node with `zenith_admin` superuser.
+        //
+        // In this case we need to connect with the old `zenith_admin` name
+        // and create a new user. We cannot simply rename the connected user,
+        // but we can create a new one and grant it all privileges.
+        let mut client = match Client::connect(self.connstr.as_str(), NoTls) {
+            Err(e) => {
+                info!(
+                    "cannot connect to postgres: {}, retrying with `zenith_admin` username",
+                    e
+                );
+                let mut zenith_admin_connstr = self.connstr.clone();
+
+                zenith_admin_connstr
+                    .set_username("zenith_admin")
+                    .map_err(|_| anyhow::anyhow!("invalid connstr"))?;
+
+                let mut client = Client::connect(zenith_admin_connstr.as_str(), NoTls)?;
+                client.simple_query("CREATE USER cloud_admin WITH SUPERUSER")?;
+                client.simple_query("GRANT zenith_admin TO cloud_admin")?;
+                drop(client);
+
+                // Reconnect using the connection string with the expected username.
+                Client::connect(self.connstr.as_str(), NoTls)?
+            }
+            Ok(client) => client,
+        };
+
+        handle_roles(&self.spec, &mut client)?;
+        handle_databases(&self.spec, &mut client)?;
+        handle_role_deletions(self, &mut client)?;
+        handle_grants(self, &mut client)?;
+        create_writablity_check_data(&mut client)?;
+
+        // 'Close' connection
+        drop(client);
+        let startup_end_time = Utc::now();
+
+        self.metrics.config_ms.store(
+            startup_end_time
+                .signed_duration_since(start_time)
+                .to_std()
+                .unwrap()
+                .as_millis() as u64,
+            Ordering::Relaxed,
+        );
+        self.metrics.total_startup_ms.store(
+            startup_end_time
+                .signed_duration_since(self.start_time)
+                .to_std()
+                .unwrap()
+                .as_millis() as u64,
+            Ordering::Relaxed,
+        );
+
+        self.set_status(ComputeStatus::Running);
+
+        info!(
+            "finished configuration of compute for project {}",
+            self.spec.cluster.cluster_id
+        );
+
+        // Wait for child Postgres process basically forever. In this state Ctrl+C
+        // will propagate to Postgres and it will be shut down as well.
+        let ecode = pg
+            .wait()
+            .expect("failed to start waiting on Postgres process");
+
+        Ok(ecode)
+    }
+
+    pub fn prepare_and_run(&self) -> Result<ExitStatus> {
+        info!(
+            "starting compute for project {}, operation {}, tenant {}, timeline {}",
+            self.spec.cluster.cluster_id,
+            self.spec.operation_uuid.as_ref().unwrap(),
+            self.tenant,
+            self.timeline,
+        );
+
+        self.prepare_pgdata()?;
+        self.run()
+    }
+}
--- a/compute_tools/src/config.rs
+++ b/compute_tools/src/config.rs
@@ -0,0 +1,51 @@
+use std::fs::{File, OpenOptions};
+use std::io;
+use std::io::prelude::*;
+use std::path::Path;
+
+use anyhow::Result;
+
+use crate::pg_helpers::PgOptionsSerialize;
+use crate::spec::ComputeSpec;
+
+/// Check that `line` is inside a text file and put it there if it is not.
+/// Create file if it doesn't exist.
+pub fn line_in_file(path: &Path, line: &str) -> Result<bool> {
+    let mut file = OpenOptions::new()
+        .read(true)
+        .write(true)
+        .create(true)
+        .append(false)
+        .open(path)?;
+    let buf = io::BufReader::new(&file);
+    let mut count: usize = 0;
+
+    for l in buf.lines() {
+        if l? == line {
+            return Ok(false);
+        }
+        count = 1;
+    }
+
+    write!(file, "{}{}", "\n".repeat(count), line)?;
+    Ok(true)
+}
+
+/// Create or completely rewrite configuration file specified by `path`
+pub fn write_postgres_conf(path: &Path, spec: &ComputeSpec) -> Result<()> {
+    // File::create() destroys the file content if it exists.
+    let mut postgres_conf = File::create(path)?;
+
+    write_auto_managed_block(&mut postgres_conf, &spec.cluster.settings.as_pg_settings())?;
+
+    Ok(())
+}
+
+// Write Postgres config block wrapped with generated comment section
+fn write_auto_managed_block(file: &mut File, buf: &str) -> Result<()> {
+    writeln!(file, "# Managed by compute_ctl: begin")?;
+    writeln!(file, "{}", buf)?;
+    writeln!(file, "# Managed by compute_ctl: end")?;
+
+    Ok(())
+}
--- a/compute_tools/src/http/api.rs
+++ b/compute_tools/src/http/api.rs
@@ -0,0 +1,109 @@
+use std::convert::Infallible;
+use std::net::SocketAddr;
+use std::sync::Arc;
+use std::thread;
+
+use anyhow::Result;
+use hyper::service::{make_service_fn, service_fn};
+use hyper::{Body, Method, Request, Response, Server, StatusCode};
+use log::{error, info};
+use serde_json;
+
+use crate::compute::{ComputeNode, ComputeStatus};
+
+// Service function to handle all available routes.
+async fn routes(req: Request<Body>, compute: Arc<ComputeNode>) -> Response<Body> {
+    match (req.method(), req.uri().path()) {
+        // Timestamp of the last Postgres activity in the plain text.
+        // DEPRECATED in favour of /status
+        (&Method::GET, "/last_activity") => {
+            info!("serving /last_activity GET request");
+            let state = compute.state.read().unwrap();
+
+            // Use RFC3339 format for consistency.
+            Response::new(Body::from(state.last_active.to_rfc3339()))
+        }
+
+        // Has compute setup process finished? -> true/false.
+        // DEPRECATED in favour of /status
+        (&Method::GET, "/ready") => {
+            info!("serving /ready GET request");
+            let status = compute.get_status();
+            Response::new(Body::from(format!("{}", status == ComputeStatus::Running)))
+        }
+
+        // Serialized compute state.
+        (&Method::GET, "/status") => {
+            info!("serving /status GET request");
+            let state = compute.state.read().unwrap();
+            Response::new(Body::from(serde_json::to_string(&*state).unwrap()))
+        }
+
+        // Startup metrics in JSON format. Keep /metrics reserved for a possible
+        // future use for Prometheus metrics format.
+        (&Method::GET, "/metrics.json") => {
+            info!("serving /metrics.json GET request");
+            Response::new(Body::from(serde_json::to_string(&compute.metrics).unwrap()))
+        }
+
+        // DEPRECATED, use POST instead
+        (&Method::GET, "/check_writability") => {
+            info!("serving /check_writability GET request");
+            let res = crate::checker::check_writability(&compute).await;
+            match res {
+                Ok(_) => Response::new(Body::from("true")),
+                Err(e) => Response::new(Body::from(e.to_string())),
+            }
+        }
+
+        (&Method::POST, "/check_writability") => {
+            info!("serving /check_writability POST request");
+            let res = crate::checker::check_writability(&compute).await;
+            match res {
+                Ok(_) => Response::new(Body::from("true")),
+                Err(e) => Response::new(Body::from(e.to_string())),
+            }
+        }
+
+        // Return the `404 Not Found` for any other routes.
+        _ => {
+            let mut not_found = Response::new(Body::from("404 Not Found"));
+            *not_found.status_mut() = StatusCode::NOT_FOUND;
+            not_found
+        }
+    }
+}
+
+// Main Hyper HTTP server function that runs it and blocks waiting on it forever.
+#[tokio::main]
+async fn serve(state: Arc<ComputeNode>) {
+    let addr = SocketAddr::from(([0, 0, 0, 0], 3080));
+
+    let make_service = make_service_fn(move |_conn| {
+        let state = state.clone();
+        async move {
+            Ok::<_, Infallible>(service_fn(move |req: Request<Body>| {
+                let state = state.clone();
+                async move { Ok::<_, Infallible>(routes(req, state).await) }
+            }))
+        }
+    });
+
+    info!("starting HTTP server on {}", addr);
+
+    let server = Server::bind(&addr).serve(make_service);
+
+    // Run this server forever
+    if let Err(e) = server.await {
+        error!("server error: {}", e);
+    }
+}
+
+/// Launch a separate Hyper HTTP API server thread and return its `JoinHandle`.
+pub fn launch_http_server(state: &Arc<ComputeNode>) -> Result<thread::JoinHandle<()>> {
+    let state = Arc::clone(state);
+
+    Ok(thread::Builder::new()
+        .name("http-endpoint".into())
+        .spawn(move || serve(state))?)
+}
--- a/compute_tools/src/http/mod.rs
+++ b/compute_tools/src/http/mod.rs
@@ -0,0 +1 @@
+pub mod api;
--- a/compute_tools/src/http/openapi_spec.yaml
+++ b/compute_tools/src/http/openapi_spec.yaml
@@ -0,0 +1,158 @@
+openapi: "3.0.2"
+info:
+  title: Compute node control API
+  version: "1.0"
+
+servers:
+  - url: "http://localhost:3080"
+
+paths:
+  /status:
+    get:
+      tags:
+      - "info"
+      summary: Get compute node internal status
+      description: ""
+      operationId: getComputeStatus
+      responses:
+        "200":
+          description: ComputeState
+          content:
+            application/json:
+              schema:
+                $ref: "#/components/schemas/ComputeState"
+
+  /metrics.json:
+    get:
+      tags:
+      - "info"
+      summary: Get compute node startup metrics in JSON format
+      description: ""
+      operationId: getComputeMetricsJSON
+      responses:
+        "200":
+          description: ComputeMetrics
+          content:
+            application/json:
+              schema:
+                $ref: "#/components/schemas/ComputeMetrics"
+
+  /ready:
+    get:
+      deprecated: true
+      tags:
+      - "info"
+      summary: Check whether compute startup process finished successfully
+      description: ""
+      operationId: computeIsReady
+      responses:
+        "200":
+          description: Compute is ready ('true') or not ('false')
+          content:
+            text/plain:
+              schema:
+                type: string
+                example: "true"
+
+  /last_activity:
+    get:
+      deprecated: true
+      tags:
+      - "info"
+      summary: Get timestamp of the last compute activity
+      description: ""
+      operationId: getLastComputeActivityTS
+      responses:
+        "200":
+          description: Timestamp of the last compute activity
+          content:
+            text/plain:
+              schema:
+                type: string
+                example: "2022-10-12T07:20:50.52Z"
+
+  /check_writability:
+    get:
+      deprecated: true
+      tags:
+      - "check"
+      summary: Check that we can write new data on this compute
+      description: ""
+      operationId: checkComputeWritabilityDeprecated
+      responses:
+        "200":
+          description: Check result
+          content:
+            text/plain:
+              schema:
+                type: string
+                description: Error text or 'true' if check passed
+                example: "true"
+
+    post:
+      tags:
+      - "check"
+      summary: Check that we can write new data on this compute
+      description: ""
+      operationId: checkComputeWritability
+      responses:
+        "200":
+          description: Check result
+          content:
+            text/plain:
+              schema:
+                type: string
+                description: Error text or 'true' if check passed
+                example: "true"
+
+components:
+  securitySchemes:
+    JWT:
+      type: http
+      scheme: bearer
+      bearerFormat: JWT
+
+  schemas:
+    ComputeMetrics:
+      type: object
+      description: Compute startup metrics
+      required:
+        - sync_safekeepers_ms
+        - basebackup_ms
+        - config_ms
+        - total_startup_ms
+      properties:
+        sync_safekeepers_ms:
+          type: integer
+        basebackup_ms:
+          type: integer
+        config_ms:
+          type: integer
+        total_startup_ms:
+          type: integer
+
+    ComputeState:
+      type: object
+      required:
+        - status
+        - last_active
+      properties:
+        status:
+          $ref: '#/components/schemas/ComputeStatus'
+        last_active:
+          type: string
+          description: The last detected compute activity timestamp in UTC and RFC3339 format
+          example: "2022-10-12T07:20:50.52Z"
+        error:
+          type: string
+          description: Text of the error during compute startup, if any
+
+    ComputeStatus:
+      type: string
+      enum:
+        - init
+        - failed
+        - running
+
+security:
+  - JWT: []
--- a/compute_tools/src/lib.rs
+++ b/compute_tools/src/lib.rs
@@ -0,0 +1,14 @@
+//!
+//! Various tools and helpers to handle cluster / compute node (Postgres)
+//! configuration.
+//!
+pub mod checker;
+pub mod config;
+pub mod http;
+#[macro_use]
+pub mod logger;
+pub mod compute;
+pub mod monitor;
+pub mod params;
+pub mod pg_helpers;
+pub mod spec;
--- a/compute_tools/src/logger.rs
+++ b/compute_tools/src/logger.rs
@@ -0,0 +1,43 @@
+use std::io::Write;
+
+use anyhow::Result;
+use chrono::Utc;
+use env_logger::{Builder, Env};
+
+macro_rules! info_println {
+    ($($tts:tt)*) => {
+        if log_enabled!(Level::Info) {
+            println!($($tts)*);
+        }
+    }
+}
+
+macro_rules! info_print {
+    ($($tts:tt)*) => {
+        if log_enabled!(Level::Info) {
+            print!($($tts)*);
+        }
+    }
+}
+
+/// Initialize `env_logger` using either `default_level` or
+/// `RUST_LOG` environment variable as default log level.
+pub fn init_logger(default_level: &str) -> Result<()> {
+    let env = Env::default().filter_or("RUST_LOG", default_level);
+
+    Builder::from_env(env)
+        .format(|buf, record| {
+            let thread_handle = std::thread::current();
+            writeln!(
+                buf,
+                "{} [{}] {}: {}",
+                Utc::now().format("%Y-%m-%d %H:%M:%S%.3f %Z"),
+                thread_handle.name().unwrap_or("main"),
+                record.level(),
+                record.args()
+            )
+        })
+        .init();
+
+    Ok(())
+}
--- a/compute_tools/src/monitor.rs
+++ b/compute_tools/src/monitor.rs
@@ -0,0 +1,109 @@
+use std::sync::Arc;
+use std::{thread, time};
+
+use anyhow::Result;
+use chrono::{DateTime, Utc};
+use log::{debug, info};
+use postgres::{Client, NoTls};
+
+use crate::compute::ComputeNode;
+
+const MONITOR_CHECK_INTERVAL: u64 = 500; // milliseconds
+
+// Spin in a loop and figure out the last activity time in Postgres.
+// Then update it in the shared state. This function never errors out.
+// XXX: the only expected panic is at `RwLock` unwrap().
+fn watch_compute_activity(compute: &ComputeNode) {
+    // Suppose that `connstr` doesn't change
+    let connstr = compute.connstr.as_str();
+    // Define `client` outside of the loop to reuse existing connection if it's active.
+    let mut client = Client::connect(connstr, NoTls);
+    let timeout = time::Duration::from_millis(MONITOR_CHECK_INTERVAL);
+
+    info!("watching Postgres activity at {}", connstr);
+
+    loop {
+        // Should be outside of the write lock to allow others to read while we sleep.
+        thread::sleep(timeout);
+
+        match &mut client {
+            Ok(cli) => {
+                if cli.is_closed() {
+                    info!("connection to postgres closed, trying to reconnect");
+
+                    // Connection is closed, reconnect and try again.
+                    client = Client::connect(connstr, NoTls);
+                    continue;
+                }
+
+                // Get all running client backends except our own, use RFC3339 DateTime format.
+                let backends = cli
+                    .query(
+                        "SELECT state, to_char(state_change, 'YYYY-MM-DD\"T\"HH24:MI:SS.US\"Z\"') AS state_change
+                         FROM pg_stat_activity
+                         WHERE backend_type = 'client backend'
+                            AND pid != pg_backend_pid()
+                            AND usename != 'cloud_admin';", // XXX: find a better way to filter other monitors?
+                        &[],
+                    );
+                let mut last_active = compute.state.read().unwrap().last_active;
+
+                if let Ok(backs) = backends {
+                    let mut idle_backs: Vec<DateTime<Utc>> = vec![];
+
+                    for b in backs.into_iter() {
+                        let state: String = b.get("state");
+                        let change: String = b.get("state_change");
+
+                        if state == "idle" {
+                            let change = DateTime::parse_from_rfc3339(&change);
+                            match change {
+                                Ok(t) => idle_backs.push(t.with_timezone(&Utc)),
+                                Err(e) => {
+                                    info!("cannot parse backend state_change DateTime: {}", e);
+                                    continue;
+                                }
+                            }
+                        } else {
+                            // Found non-idle backend, so the last activity is NOW.
+                            // Save it and exit the for loop. Also clear the idle backend
+                            // `state_change` timestamps array as it doesn't matter now.
+                            last_active = Utc::now();
+                            idle_backs.clear();
+                            break;
+                        }
+                    }
+
+                    // Sort idle backend `state_change` timestamps. The last one corresponds
+                    // to the last activity.
+                    idle_backs.sort();
+                    if let Some(last) = idle_backs.last() {
+                        last_active = *last;
+                    }
+                }
+
+                // Update the last activity in the shared state if we got a more recent one.
+                let mut state = compute.state.write().unwrap();
+                if last_active > state.last_active {
+                    state.last_active = last_active;
+                    debug!("set the last compute activity time to: {}", last_active);
+                }
+            }
+            Err(e) => {
+                debug!("cannot connect to postgres: {}, retrying", e);
+
+                // Establish a new connection and try again.
+                client = Client::connect(connstr, NoTls);
+            }
+        }
+    }
+}
+
+/// Launch a separate compute monitor thread and return its `JoinHandle`.
+pub fn launch_monitor(state: &Arc<ComputeNode>) -> Result<thread::JoinHandle<()>> {
+    let state = Arc::clone(state);
+
+    Ok(thread::Builder::new()
+        .name("compute-monitor".into())
+        .spawn(move || watch_compute_activity(&state))?)
+}
--- a/compute_tools/src/params.rs
+++ b/compute_tools/src/params.rs
@@ -0,0 +1,3 @@
+pub const DEFAULT_LOG_LEVEL: &str = "info";
+pub const DEFAULT_CONNSTRING: &str = "host=localhost user=postgres";
+pub const PG_HBA_ALL_MD5: &str = "host\tall\t\tall\t\t0.0.0.0/0\t\tmd5";
--- a/compute_tools/src/pg_helpers.rs
+++ b/compute_tools/src/pg_helpers.rs
@@ -0,0 +1,284 @@
+use std::fmt::Write;
+use std::fs::File;
+use std::io::{BufRead, BufReader};
+use std::net::{SocketAddr, TcpStream};
+use std::os::unix::fs::PermissionsExt;
+use std::path::Path;
+use std::process::Child;
+use std::str::FromStr;
+use std::{fs, thread, time};
+
+use anyhow::{bail, Result};
+use postgres::{Client, Transaction};
+use serde::Deserialize;
+
+const POSTGRES_WAIT_TIMEOUT: u64 = 60 * 1000; // milliseconds
+
+/// Rust representation of Postgres role info with only those fields
+/// that matter for us.
+#[derive(Clone, Deserialize)]
+pub struct Role {
+    pub name: PgIdent,
+    pub encrypted_password: Option<String>,
+    pub options: GenericOptions,
+}
+
+/// Rust representation of Postgres database info with only those fields
+/// that matter for us.
+#[derive(Clone, Deserialize)]
+pub struct Database {
+    pub name: PgIdent,
+    pub owner: PgIdent,
+    pub options: GenericOptions,
+}
+
+/// Common type representing both SQL statement params with or without value,
+/// like `LOGIN` or `OWNER username` in the `CREATE/ALTER ROLE`, and config
+/// options like `wal_level = logical`.
+#[derive(Clone, Deserialize)]
+pub struct GenericOption {
+    pub name: String,
+    pub value: Option<String>,
+    pub vartype: String,
+}
+
+/// Optional collection of `GenericOption`s. The type alias gives us a
+/// single name to implement our own traits on.
+pub type GenericOptions = Option<Vec<GenericOption>>;
+
+impl GenericOption {
+    /// Represent `GenericOption` as SQL statement parameter.
+    pub fn to_pg_option(&self) -> String {
+        if let Some(val) = &self.value {
+            match self.vartype.as_ref() {
+                "string" => format!("{} '{}'", self.name, val.replace('\'', "''")),
+                _ => format!("{} {}", self.name, val),
+            }
+        } else {
+            self.name.to_owned()
+        }
+    }
+
+    /// Represent `GenericOption` as configuration option.
+    pub fn to_pg_setting(&self) -> String {
+        if let Some(val) = &self.value {
+            match self.vartype.as_ref() {
+                "string" => format!("{} = '{}'", self.name, val.replace('\'', "''")),
+                _ => format!("{} = {}", self.name, val),
+            }
+        } else {
+            self.name.to_owned()
+        }
+    }
+}
+
+pub trait PgOptionsSerialize {
+    fn as_pg_options(&self) -> String;
+    fn as_pg_settings(&self) -> String;
+}
+
+impl PgOptionsSerialize for GenericOptions {
+    /// Serialize an optional collection of `GenericOption`s to
+    /// Postgres SQL statement arguments.
+    fn as_pg_options(&self) -> String {
+        if let Some(ops) = &self {
+            ops.iter()
+                .map(|op| op.to_pg_option())
+                .collect::<Vec<String>>()
+                .join(" ")
+        } else {
+            String::new()
+        }
+    }
+
+    /// Serialize an optional collection of `GenericOption`s to
+    /// `postgresql.conf` compatible format.
+    fn as_pg_settings(&self) -> String {
+        if let Some(ops) = &self {
+            ops.iter()
+                .map(|op| op.to_pg_setting())
+                .collect::<Vec<String>>()
+                .join("\n")
+        } else {
+            String::new()
+        }
+    }
+}
+
+pub trait GenericOptionsSearch {
+    fn find(&self, name: &str) -> Option<String>;
+}
+
+impl GenericOptionsSearch for GenericOptions {
+    /// Look up an option's value by name.
+    fn find(&self, name: &str) -> Option<String> {
+        match &self {
+            Some(ops) => {
+                let op = ops.iter().find(|s| s.name == name);
+                match op {
+                    Some(op) => op.value.clone(),
+                    None => None,
+                }
+            }
+            None => None,
+        }
+    }
+}
+
+impl Role {
+    /// Serialize a list of role parameters into a Postgres-acceptable
+    /// string of arguments.
+    pub fn to_pg_options(&self) -> String {
+        // XXX: consider putting LOGIN as a default option somewhere higher, e.g. in Rails.
+        // For now we do not use generic `options` for roles. Once used, add
+        // `self.options.as_pg_options()` somewhere here.
+        let mut params: String = "LOGIN".to_string();
+
+        if let Some(pass) = &self.encrypted_password {
+            // Some time ago we supported only md5 and treated every encrypted_password as md5.
+            // Now we also support SCRAM-SHA-256 and, to preserve compatibility,
+            // we treat every encrypted_password as md5 unless it starts with SCRAM-SHA-256.
+            if pass.starts_with("SCRAM-SHA-256") {
+                write!(params, " PASSWORD '{pass}'")
+                    .expect("String is documented not to error during write operations");
+            } else {
+                write!(params, " PASSWORD 'md5{pass}'")
+                    .expect("String is documented not to error during write operations");
+            }
+        } else {
+            params.push_str(" PASSWORD NULL");
+        }
+
+        params
+    }
+}
+
+impl Database {
+    /// Serialize a list of database parameters into a Postgres-acceptable
+    /// string of arguments.
+    /// NB: `TEMPLATE` is actually also an identifier, but so far we only need
+    /// to use `template0` and `template1`, so it is not a problem. In the future
+    /// it may require proper quoting too.
+    pub fn to_pg_options(&self) -> String {
+        let mut params: String = self.options.as_pg_options();
+        write!(params, " OWNER {}", &self.owner.quote())
+            .expect("String is documented not to error during write operations");
+
+        params
+    }
+}
+
+/// String type alias representing Postgres identifier and
+/// intended to be used for DB / role names.
+pub type PgIdent = String;
+
+/// Generic trait used to provide quoting for strings used in the
+/// Postgres SQL queries. Currently used only to implement quoting
+/// of identifiers, but could be used for literals in the future.
+pub trait PgQuote {
+    fn quote(&self) -> String;
+}
+
+impl PgQuote for PgIdent {
+    /// This is intended to mimic Postgres quote_ident(), but for simplicity it
+    /// always wraps the provided string in `""` and doubles every embedded `"`. Not
+    /// idempotent, i.e. if the string is already quoted it will be quoted again.
+    fn quote(&self) -> String {
+        // Double every embedded `"`, then wrap the whole identifier in double quotes.
+        format!("\"{}\"", self.replace('"', "\"\""))
+    }
+}
+
+/// Build a list of existing Postgres roles
+pub fn get_existing_roles(xact: &mut Transaction<'_>) -> Result<Vec<Role>> {
+    let postgres_roles = xact
+        .query("SELECT rolname, rolpassword FROM pg_catalog.pg_authid", &[])?
+        .iter()
+        .map(|row| Role {
+            name: row.get("rolname"),
+            encrypted_password: row.get("rolpassword"),
+            options: None,
+        })
+        .collect();
+
+    Ok(postgres_roles)
+}
+
+/// Build a list of existing Postgres databases
+pub fn get_existing_dbs(client: &mut Client) -> Result<Vec<Database>> {
+    let postgres_dbs = client
+        .query(
+            "SELECT datname, datdba::regrole::text as owner
+               FROM pg_catalog.pg_database;",
+            &[],
+        )?
+        .iter()
+        .map(|row| Database {
+            name: row.get("datname"),
+            owner: row.get("owner"),
+            options: None,
+        })
+        .collect();
+
+    Ok(postgres_dbs)
+}
+
+/// Wait for Postgres to become ready to accept connections:
+/// - state should be `ready` in the `pgdata/postmaster.pid`
+/// - and we should be able to connect to 127.0.0.1:`port`
+pub fn wait_for_postgres(pg: &mut Child, port: &str, pgdata: &Path) -> Result<()> {
+    let pid_path = pgdata.join("postmaster.pid");
+    let mut slept: u64 = 0; // ms
+    let pause = time::Duration::from_millis(100);
+
+    let timeout = time::Duration::from_millis(10);
+    let addr = SocketAddr::from_str(&format!("127.0.0.1:{}", port))?;
+
+    loop {
+        // Sleep for POSTGRES_WAIT_TIMEOUT at most (a bit longer, actually, once the TCP connect
+        // timeout is counted, but Postgres starts listening almost immediately, even if it is
+        // not really ready to accept connections).
+        if slept >= POSTGRES_WAIT_TIMEOUT {
+            bail!("timed out while waiting for Postgres to start");
+        }
+
+        if let Ok(Some(status)) = pg.try_wait() {
+            // Postgres exited, which is not what we expected, so bail out early.
+            let code = status.code().unwrap_or(-1);
+            bail!("Postgres exited unexpectedly with code {}", code);
+        }
+
+        // Check that we can open pid file first.
+        if let Ok(file) = File::open(&pid_path) {
+            let file = BufReader::new(file);
+            let last_line = file.lines().last();
+
+            // Pid file could be there and we could read it, but it could be empty, for example.
+            if let Some(Ok(line)) = last_line {
+                let status = line.trim();
+                let can_connect = TcpStream::connect_timeout(&addr, timeout).is_ok();
+
+                // Now Postgres is ready to accept connections
+                if status == "ready" && can_connect {
+                    break;
+                }
+            }
+        }
+
+        thread::sleep(pause);
+        slept += pause.as_millis() as u64;
+    }
+
+    Ok(())
+}
+
+/// Remove `pgdata` directory and create it again with right permissions.
+pub fn create_pgdata(pgdata: &str) -> Result<()> {
+    // Ignore removal error, likely it is a 'No such file or directory (os error 2)'.
+    // If it is something different then create_dir() will error out anyway.
+    let _ok = fs::remove_dir_all(pgdata);
+    fs::create_dir(pgdata)?;
+    fs::set_permissions(pgdata, fs::Permissions::from_mode(0o700))?;
+
+    Ok(())
+}
--- a/compute_tools/src/spec.rs
+++ b/compute_tools/src/spec.rs
@@ -0,0 +1,428 @@
+use std::path::Path;
+
+use anyhow::Result;
+use log::{info, log_enabled, warn, Level};
+use postgres::{Client, NoTls};
+use serde::Deserialize;
+
+use crate::compute::ComputeNode;
+use crate::config;
+use crate::params::PG_HBA_ALL_MD5;
+use crate::pg_helpers::*;
+
+/// Cluster spec or configuration represented as an optional number of
+/// delta operations + final cluster state description.
+#[derive(Clone, Deserialize)]
+pub struct ComputeSpec {
+    pub format_version: f32,
+    pub timestamp: String,
+    pub operation_uuid: Option<String>,
+    /// Expected cluster state at the end of transition process.
+    pub cluster: Cluster,
+    pub delta_operations: Option<Vec<DeltaOp>>,
+}
+
+/// Cluster state seen from the perspective of the external tools
+/// like Rails web console.
+#[derive(Clone, Deserialize)]
+pub struct Cluster {
+    pub cluster_id: String,
+    pub name: String,
+    pub state: Option<String>,
+    pub roles: Vec<Role>,
+    pub databases: Vec<Database>,
+    pub settings: GenericOptions,
+}
+
+/// Single cluster state changing operation that could not be represented as
+/// a static `Cluster` structure. For example:
+/// - DROP DATABASE
+/// - DROP ROLE
+/// - ALTER ROLE name RENAME TO new_name
+/// - ALTER DATABASE name RENAME TO new_name
+#[derive(Clone, Deserialize)]
+pub struct DeltaOp {
+    pub action: String,
+    pub name: PgIdent,
+    pub new_name: Option<PgIdent>,
+}
+
+/// It takes cluster specification and does the following:
+/// - Serialize cluster config and put it into `postgresql.conf` completely rewriting the file.
+/// - Update `pg_hba.conf` to allow external connections.
+pub fn handle_configuration(spec: &ComputeSpec, pgdata_path: &Path) -> Result<()> {
+    // File `postgresql.conf` is no longer included in `basebackup`, so just
+    // always write the whole config into it, creating a new file.
+    config::write_postgres_conf(&pgdata_path.join("postgresql.conf"), spec)?;
+
+    update_pg_hba(pgdata_path)?;
+
+    Ok(())
+}
+
+/// Check `pg_hba.conf` and update if needed to allow external connections.
+pub fn update_pg_hba(pgdata_path: &Path) -> Result<()> {
+    // XXX: consider making it a part of spec.json
+    info!("checking pg_hba.conf");
+    let pghba_path = pgdata_path.join("pg_hba.conf");
+
+    if config::line_in_file(&pghba_path, PG_HBA_ALL_MD5)? {
+        info!("updated pg_hba.conf to allow external connections");
+    } else {
+        info!("pg_hba.conf is up-to-date");
+    }
+
+    Ok(())
+}
+
+/// Given a cluster spec and a client, handle role renames, creation and updates in a
+/// single transaction. Role deletions are deferred to `handle_role_deletions()`.
+pub fn handle_roles(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
+    let mut xact = client.transaction()?;
+    let existing_roles: Vec<Role> = get_existing_roles(&mut xact)?;
+
+    // Print a list of existing Postgres roles (only when info-level logging is enabled)
+    info!("postgres roles:");
+    for r in &existing_roles {
+        info_println!(
+            "{} - {}:{}",
+            " ".repeat(27 + 5),
+            r.name,
+            if r.encrypted_password.is_some() {
+                "[FILTERED]"
+            } else {
+                "(null)"
+            }
+        );
+    }
+
+    // Process delta operations first
+    if let Some(ops) = &spec.delta_operations {
+        info!("processing role renames");
+        for op in ops {
+            match op.action.as_ref() {
+                "delete_role" => {
+                    // no-op now, roles will be deleted at the end of configuration
+                }
+                // Renaming role drops its password, since role name is
+                // used as a salt there.  It is important that this role
+                // is recorded with a new `name` in the `roles` list.
+                // Follow up roles update will set the new password.
+                "rename_role" => {
+                    let new_name = op.new_name.as_ref().ok_or_else(|| anyhow::anyhow!("rename_role requires `new_name`"))?;
+
+                    // XXX: with a limited number of roles it is fine, but consider making it a HashMap
+                    if existing_roles.iter().any(|r| r.name == op.name) {
+                        let query: String = format!(
+                            "ALTER ROLE {} RENAME TO {}",
+                            op.name.quote(),
+                            new_name.quote()
+                        );
+
+                        warn!("renaming role '{}' to '{}'", op.name, new_name);
+                        xact.execute(query.as_str(), &[])?;
+                    }
+                }
+                _ => {}
+            }
+        }
+    }
+
+    // Refresh Postgres roles info to handle possible roles renaming
+    let existing_roles: Vec<Role> = get_existing_roles(&mut xact)?;
+
+    info!("cluster spec roles:");
+    for role in &spec.cluster.roles {
+        let name = &role.name;
+
+        info_print!(
+            "{} - {}:{}",
+            " ".repeat(27 + 5),
+            name,
+            if role.encrypted_password.is_some() {
+                "[FILTERED]"
+            } else {
+                "(null)"
+            }
+        );
+
+        // XXX: with a limited number of roles it is fine, but consider making it a HashMap
+        let pg_role = existing_roles.iter().find(|r| r.name == *name);
+
+        if let Some(r) = pg_role {
+            let mut update_role = false;
+
+            if (r.encrypted_password.is_none() && role.encrypted_password.is_some())
+                || (r.encrypted_password.is_some() && role.encrypted_password.is_none())
+            {
+                update_role = true;
+            } else if let Some(pg_pwd) = &r.encrypted_password {
+                // Check whether the password changed (strip the 'md5' prefix, if any, first)
+                update_role = pg_pwd.strip_prefix("md5").or(Some(pg_pwd.as_str())) != role.encrypted_password.as_deref();
+            }
+
+            if update_role {
+                let mut query: String = format!("ALTER ROLE {} ", name.quote());
+                info_print!(" -> update");
+
+                query.push_str(&role.to_pg_options());
+                xact.execute(query.as_str(), &[])?;
+            }
+        } else {
+            info!("role name: '{}'", &name);
+            let mut query: String = format!("CREATE ROLE {} ", name.quote());
+            info!("role create query: '{}'", &query);
+            info_print!(" -> create");
+
+            query.push_str(&role.to_pg_options());
+            xact.execute(query.as_str(), &[])?;
+
+            let grant_query = format!(
+                "GRANT pg_read_all_data, pg_write_all_data TO {}",
+                name.quote()
+            );
+            xact.execute(grant_query.as_str(), &[])?;
+            info!("role grant query: '{}'", &grant_query);
+        }
+
+        info_print!("\n");
+    }
+
+    xact.commit()?;
+
+    Ok(())
+}
+
+/// Reassign all dependent objects and delete requested roles.
+pub fn handle_role_deletions(node: &ComputeNode, client: &mut Client) -> Result<()> {
+    let spec = &node.spec;
+
+    // First, reassign all dependent objects to db owners.
+    if let Some(ops) = &spec.delta_operations {
+        info!("reassigning dependent objects of to-be-deleted roles");
+        for op in ops {
+            if op.action == "delete_role" {
+                reassign_owned_objects(node, &op.name)?;
+            }
+        }
+    }
+
+    // Second, proceed with role deletions.
+    let mut xact = client.transaction()?;
+    if let Some(ops) = &spec.delta_operations {
+        info!("processing role deletions");
+        for op in ops {
+            // We do not check whether the role exists or not,
+            // Postgres will take care of it for us
+            if op.action == "delete_role" {
+                let query: String = format!("DROP ROLE IF EXISTS {}", &op.name.quote());
+
+                warn!("deleting role '{}'", &op.name);
+                xact.execute(query.as_str(), &[])?;
+            }
+        }
+    }
+    xact.commit()?; // an uncommitted `Transaction` rolls back when dropped
+    Ok(())
+}
+
+// Reassign all owned objects in all databases to the owner of the database.
+fn reassign_owned_objects(node: &ComputeNode, role_name: &PgIdent) -> Result<()> {
+    for db in &node.spec.cluster.databases {
+        if db.owner != *role_name {
+            let mut connstr = node.connstr.clone();
+            // database name is always the last and the only component of the path
+            connstr.set_path(&db.name);
+
+            let mut client = Client::connect(connstr.as_str(), NoTls)?;
+
+            // This will reassign all dependent objects to the db owner
+            let reassign_query = format!(
+                "REASSIGN OWNED BY {} TO {}",
+                role_name.quote(),
+                db.owner.quote()
+            );
+            info!(
+                "reassigning objects owned by '{}' in db '{}' to '{}'",
+                role_name, &db.name, &db.owner
+            );
+            client.simple_query(&reassign_query)?;
+
+            // This now will only drop privileges of the role
+            let drop_query = format!("DROP OWNED BY {}", role_name.quote());
+            client.simple_query(&drop_query)?;
+        }
+    }
+
+    Ok(())
+}
+
+/// It follows mostly the same logic as `handle_roles()`, except that we
+/// do not use an explicit transaction block, since major database operations
+/// like `CREATE DATABASE` and `DROP DATABASE` do not support it. Statement-level
+/// atomicity should be enough here due to the order of operations and various checks,
+/// which together provide us idempotency.
+pub fn handle_databases(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
+    let existing_dbs: Vec<Database> = get_existing_dbs(client)?;
+
+    // Print a list of existing Postgres databases (only when info-level logging is enabled)
+    info!("postgres databases:");
+    for r in &existing_dbs {
+        info_println!("{} - {}:{}", " ".repeat(27 + 5), r.name, r.owner);
+    }
+
+    // Process delta operations first
+    if let Some(ops) = &spec.delta_operations {
+        info!("processing delta operations on databases");
+        for op in ops {
+            match op.action.as_ref() {
+                // We do not check whether the DB exists or not,
+                // Postgres will take care of it for us
+                "delete_db" => {
+                    let query: String = format!("DROP DATABASE IF EXISTS {}", &op.name.quote());
+
+                    warn!("deleting database '{}'", &op.name);
+                    client.execute(query.as_str(), &[])?;
+                }
+                "rename_db" => {
+                    let new_name = op.new_name.as_ref().ok_or_else(|| anyhow::anyhow!("rename_db requires `new_name`"))?;
+
+                    // XXX: with a limited number of databases it is fine, but consider making it a HashMap
+                    if existing_dbs.iter().any(|r| r.name == op.name) {
+                        let query: String = format!(
+                            "ALTER DATABASE {} RENAME TO {}",
+                            op.name.quote(),
+                            new_name.quote()
+                        );
+
+                        warn!("renaming database '{}' to '{}'", op.name, new_name);
+                        client.execute(query.as_str(), &[])?;
+                    }
+                }
+                _ => {}
+            }
+        }
+    }
+
+    // Refresh Postgres databases info to handle possible renames
+    let existing_dbs: Vec<Database> = get_existing_dbs(client)?;
+
+    info!("cluster spec databases:");
+    for db in &spec.cluster.databases {
+        let name = &db.name;
+
+        info_print!("{} - {}:{}", " ".repeat(27 + 5), db.name, db.owner);
+
+        // XXX: with a limited number of databases it is fine, but consider making it a HashMap
+        let pg_db = existing_dbs.iter().find(|r| r.name == *name);
+
+        if let Some(r) = pg_db {
+            // XXX: db owner name is returned as quoted string from Postgres,
+            // when quoting is needed.
+            let new_owner = if r.owner.starts_with('"') {
+                db.owner.quote()
+            } else {
+                db.owner.clone()
+            };
+
+            if new_owner != r.owner {
+                let query: String = format!(
+                    "ALTER DATABASE {} OWNER TO {}",
+                    name.quote(),
+                    db.owner.quote()
+                );
+                info_print!(" -> update");
+
+                client.execute(query.as_str(), &[])?;
+            }
+        } else {
+            let mut query: String = format!("CREATE DATABASE {} ", name.quote());
+            info_print!(" -> create");
+
+            query.push_str(&db.to_pg_options());
+            client.execute(query.as_str(), &[])?;
+        }
+
+        info_print!("\n");
+    }
+
+    Ok(())
+}
+
+/// Grant CREATE ON DATABASE to every role in the spec and do some other alters and grants
+/// to allow users to create trusted extensions and re-create the `public` schema, for example.
+pub fn handle_grants(node: &ComputeNode, client: &mut Client) -> Result<()> {
+    let spec = &node.spec;
+
+    info!("cluster spec grants:");
+
+    // We now have a separate `web_access` role to connect to the database
+    // via the web interface and proxy link auth. And also we grant a
+    // read / write all data privilege to every role. So also grant
+    // create to everyone.
+    // XXX: later we should stop messing with Postgres ACL in such horrible
+    // ways.
+    let roles = spec
+        .cluster
+        .roles
+        .iter()
+        .map(|r| r.name.quote())
+        .collect::<Vec<_>>();
+
+    for db in &spec.cluster.databases {
+        let dbname = &db.name;
+
+        let query: String = format!(
+            "GRANT CREATE ON DATABASE {} TO {}",
+            dbname.quote(),
+            roles.join(", ")
+        );
+        info!("grant query {}", &query);
+
+        client.execute(query.as_str(), &[])?;
+    }
+
+    // Do some per-database access adjustments. We'd better do this at db creation time,
+    // but CREATE DATABASE isn't transactional. So we cannot create db + do some grants
+    // atomically.
+    let mut db_connstr = node.connstr.clone();
+    for db in &node.spec.cluster.databases {
+        // database name is always the last and the only component of the path
+        db_connstr.set_path(&db.name);
+
+        let mut db_client = Client::connect(db_connstr.as_str(), NoTls)?;
+
+        // This will only change ownership on the schema itself, not the objects
+        // inside it. Without it owner of the `public` schema will be `cloud_admin`
+        // and database owner cannot do anything with it. SQL procedure ensures
+        // that it won't error out if schema `public` doesn't exist.
+        let alter_query = format!(
+            "DO $$\n\
+                DECLARE\n\
+                    schema_owner TEXT;\n\
+                BEGIN\n\
+                    IF EXISTS(\n\
+                        SELECT nspname\n\
+                        FROM pg_catalog.pg_namespace\n\
+                        WHERE nspname = 'public'\n\
+                    )\n\
+                    THEN\n\
+                        SELECT nspowner::regrole::text\n\
+                            FROM pg_catalog.pg_namespace\n\
+                            WHERE nspname = 'public'\n\
+                            INTO schema_owner;\n\
+                \n\
+                        IF schema_owner = 'cloud_admin' OR schema_owner = 'zenith_admin'\n\
+                        THEN\n\
+                            ALTER SCHEMA public OWNER TO {};\n\
+                        END IF;\n\
+                    END IF;\n\
+                END\n\
+            $$;",
+            db.owner.quote()
+        );
+        db_client.simple_query(&alter_query)?;
+    }
+
+    Ok(())
+}
--- a/compute_tools/tests/cluster_spec.json
+++ b/compute_tools/tests/cluster_spec.json
@@ -0,0 +1,205 @@
+{
+    "format_version": 1.0,
+
+    "timestamp": "2021-05-23T18:25:43.511Z",
+    "operation_uuid": "0f657b36-4b0f-4a2d-9c2e-1dcd615e7d8b",
+
+    "cluster": {
+        "cluster_id": "test-cluster-42",
+        "name": "Zenith Test",
+        "state": "restarted",
+        "roles": [
+            {
+                "name": "postgres",
+                "encrypted_password": "6b1d16b78004bbd51fa06af9eda75972",
+                "options": null
+            },
+            {
+                "name": "alexk",
+                "encrypted_password": null,
+                "options": null
+            },
+            {
+                "name": "zenith \"new\"",
+                "encrypted_password": "5b1d16b78004bbd51fa06af9eda75972",
+                "options": null
+            },
+            {
+                "name": "zen",
+                "encrypted_password": "9b1d16b78004bbd51fa06af9eda75972"
+            },
+            {
+                "name": "\"name\";\\n select 1;",
+                "encrypted_password": "5b1d16b78004bbd51fa06af9eda75972"
+            },
+            {
+                "name": "MyRole",
+                "encrypted_password": "5b1d16b78004bbd51fa06af9eda75972"
+            }
+        ],
+        "databases": [
+            {
+                "name": "DB2",
+                "owner": "alexk",
+                "options": [
+                    {
+                        "name": "LC_COLLATE",
+                        "value": "C",
+                        "vartype": "string"
+                    },
+                    {
+                        "name": "LC_CTYPE",
+                        "value": "C",
+                        "vartype": "string"
+                    },
+                    {
+                        "name": "TEMPLATE",
+                        "value": "template0",
+                        "vartype": "enum"
+                    }
+                ]
+            },
+            {
+                "name": "zenith",
+                "owner": "MyRole"
+            },
+            {
+                "name": "zen",
+                "owner": "zen"
+            }
+        ],
+        "settings": [
+            {
+                "name": "fsync",
+                "value": "off",
+                "vartype": "bool"
+            },
+            {
+                "name": "wal_level",
+                "value": "replica",
+                "vartype": "enum"
+            },
+            {
+                "name": "hot_standby",
+                "value": "on",
+                "vartype": "bool"
+            },
+            {
+                "name": "safekeepers",
+                "value": "127.0.0.1:6502,127.0.0.1:6503,127.0.0.1:6501",
+                "vartype": "string"
+            },
+            {
+                "name": "wal_log_hints",
+                "value": "on",
+                "vartype": "bool"
+            },
+            {
+                "name": "log_connections",
+                "value": "on",
+                "vartype": "bool"
+            },
+            {
+                "name": "shared_buffers",
+                "value": "32768",
+                "vartype": "integer"
+            },
+            {
+                "name": "port",
+                "value": "55432",
+                "vartype": "integer"
+            },
+            {
+                "name": "max_connections",
+                "value": "100",
+                "vartype": "integer"
+            },
+            {
+                "name": "max_wal_senders",
+                "value": "10",
+                "vartype": "integer"
+            },
+            {
+                "name": "listen_addresses",
+                "value": "0.0.0.0",
+                "vartype": "string"
+            },
+            {
+                "name": "wal_sender_timeout",
+                "value": "0",
+                "vartype": "integer"
+            },
+            {
+                "name": "password_encryption",
+                "value": "md5",
+                "vartype": "enum"
+            },
+            {
+                "name": "maintenance_work_mem",
+                "value": "65536",
+                "vartype": "integer"
+            },
+            {
+                "name": "max_parallel_workers",
+                "value": "8",
+                "vartype": "integer"
+            },
+            {
+                "name": "max_worker_processes",
+                "value": "8",
+                "vartype": "integer"
+            },
+            {
+                "name": "neon.tenant_id",
+                "value": "b0554b632bd4d547a63b86c3630317e8",
+                "vartype": "string"
+            },
+            {
+                "name": "max_replication_slots",
+                "value": "10",
+                "vartype": "integer"
+            },
+            {
+                "name": "neon.timeline_id",
+                "value": "2414a61ffc94e428f14b5758fe308e13",
+                "vartype": "string"
+            },
+            {
+                "name": "shared_preload_libraries",
+                "value": "neon",
+                "vartype": "string"
+            },
+            {
+                "name": "synchronous_standby_names",
+                "value": "walproposer",
+                "vartype": "string"
+            },
+            {
+                "name": "neon.pageserver_connstring",
+                "value": "host=127.0.0.1 port=6400",
+                "vartype": "string"
+            }
+        ]
+    },
+
+    "delta_operations": [
+        {
+            "action": "delete_db",
+            "name": "zenith_test"
+        },
+        {
+            "action": "rename_db",
+            "name": "DB",
+            "new_name": "DB2"
+        },
+        {
+            "action": "delete_role",
+            "name": "zenith2"
+        },
+        {
+            "action": "rename_role",
+            "name": "zenith new",
+            "new_name": "zenith \"new\""
+        }
+    ]
+}
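
The `vartype` field in the `settings` entries above decides how each value is written to `postgresql.conf`. Judging by the expected string in `settings_serialize`, `string` values are single-quoted and all other types are written bare. The following is a minimal sketch of that rule, not the code in `compute_tools`: `Setting`, `to_conf_line`, and `render_settings` are hypothetical names, and doubling embedded single quotes is an assumption of this sketch that the patch does not confirm.

```rust
/// Hypothetical stand-in for one entry of the "settings" array.
struct Setting {
    name: String,
    value: String,
    vartype: String,
}

/// Render one setting as a postgresql.conf line.
/// String values are single-quoted; every other vartype is written bare.
fn to_conf_line(s: &Setting) -> String {
    match s.vartype.as_str() {
        // Assumption: embedded single quotes are escaped by doubling them.
        "string" => format!("{} = '{}'", s.name, s.value.replace('\'', "''")),
        _ => format!("{} = {}", s.name, s.value),
    }
}

/// Join all settings with newlines, with no trailing newline.
fn render_settings(settings: &[Setting]) -> String {
    settings
        .iter()
        .map(to_conf_line)
        .collect::<Vec<_>>()
        .join("\n")
}
```

Calling `render_settings` on the `fsync` and `listen_addresses` entries would produce two lines in the same shape as the expected output in `settings_serialize`.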
--- a/compute_tools/tests/config_test.rs
+++ b/compute_tools/tests/config_test.rs
@@ -0,0 +1,48 @@
+#[cfg(test)]
+mod config_tests {
+
+    use std::fs::{remove_file, File};
+    use std::io::{Read, Write};
+    use std::path::Path;
+
+    use compute_tools::config::*;
+
+    fn write_test_file(path: &Path, content: &str) {
+        let mut file = File::create(path).unwrap();
+        file.write_all(content.as_bytes()).unwrap();
+    }
+
+    fn check_file_content(path: &Path, expected_content: &str) {
+        let mut file = File::open(path).unwrap();
+        let mut content = String::new();
+
+        file.read_to_string(&mut content).unwrap();
+        assert_eq!(content, expected_content);
+    }
+
+    #[test]
+    fn test_line_in_file() {
+        let path = Path::new("./tests/tmp/config_test.txt");
+        write_test_file(path, "line1\nline2.1\t line2.2\nline3");
+
+        let line = "line2.1\t line2.2";
+        let result = line_in_file(path, line).unwrap();
+        assert!(!result);
+        check_file_content(path, "line1\nline2.1\t line2.2\nline3");
+
+        let line = "line4";
+        let result = line_in_file(path, line).unwrap();
+        assert!(result);
+        check_file_content(path, "line1\nline2.1\t line2.2\nline3\nline4");
+
+        remove_file(path).unwrap();
+
+        let path = Path::new("./tests/tmp/new_config_test.txt");
+        let line = "line4";
+        let result = line_in_file(path, line).unwrap();
+        assert!(result);
+        check_file_content(path, "line4");
+
+        remove_file(path).unwrap();
+    }
+}
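
The test above pins down the contract of `line_in_file`: it returns `false` and leaves the file alone when the line is already present, appends the line and returns `true` otherwise, and creates the file if it is missing. The following is a minimal sketch of a function with that contract, assuming nothing about the real implementation in `compute_tools::config`; `ensure_line` is a hypothetical name.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Append `line` to the file at `path` unless an identical line is already there.
/// Returns Ok(true) if the file was modified and Ok(false) otherwise.
fn ensure_line(path: &Path, line: &str) -> io::Result<bool> {
    // A missing file is treated as an empty one.
    let existing = match fs::read_to_string(path) {
        Ok(s) => s,
        Err(e) if e.kind() == io::ErrorKind::NotFound => String::new(),
        Err(e) => return Err(e),
    };

    if existing.lines().any(|l| l == line) {
        return Ok(false);
    }

    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    // Start a new line first if the file does not end with one.
    if !existing.is_empty() && !existing.ends_with('\n') {
        file.write_all(b"\n")?;
    }
    file.write_all(line.as_bytes())?;
    Ok(true)
}
```

Because it checks before it appends, calling it twice with the same line changes the file at most once.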
--- a/compute_tools/tests/pg_helpers_tests.rs
+++ b/compute_tools/tests/pg_helpers_tests.rs
@@ -0,0 +1,41 @@
+#[cfg(test)]
+mod pg_helpers_tests {
+
+    use std::fs::File;
+
+    use compute_tools::pg_helpers::*;
+    use compute_tools::spec::ComputeSpec;
+
+    #[test]
+    fn params_serialize() {
+        let file = File::open("tests/cluster_spec.json").unwrap();
+        let spec: ComputeSpec = serde_json::from_reader(file).unwrap();
+
+        assert_eq!(
+            spec.cluster.databases.first().unwrap().to_pg_options(),
+            "LC_COLLATE 'C' LC_CTYPE 'C' TEMPLATE template0 OWNER \"alexk\""
+        );
+        assert_eq!(
+            spec.cluster.roles.first().unwrap().to_pg_options(),
+            "LOGIN PASSWORD 'md56b1d16b78004bbd51fa06af9eda75972'"
+        );
+    }
+
+    #[test]
+    fn settings_serialize() {
+        let file = File::open("tests/cluster_spec.json").unwrap();
+        let spec: ComputeSpec = serde_json::from_reader(file).unwrap();
+
+        assert_eq!(
+            spec.cluster.settings.as_pg_settings(),
+            "fsync = off\nwal_level = replica\nhot_standby = on\nsafekeepers = '127.0.0.1:6502,127.0.0.1:6503,127.0.0.1:6501'\nwal_log_hints = on\nlog_connections = on\nshared_buffers = 32768\nport = 55432\nmax_connections = 100\nmax_wal_senders = 10\nlisten_addresses = '0.0.0.0'\nwal_sender_timeout = 0\npassword_encryption = md5\nmaintenance_work_mem = 65536\nmax_parallel_workers = 8\nmax_worker_processes = 8\nneon.tenant_id = 'b0554b632bd4d547a63b86c3630317e8'\nmax_replication_slots = 10\nneon.timeline_id = '2414a61ffc94e428f14b5758fe308e13'\nshared_preload_libraries = 'neon'\nsynchronous_standby_names = 'walproposer'\nneon.pageserver_connstring = 'host=127.0.0.1 port=6400'"
+        );
+    }
+
+    #[test]
+    fn quote_ident() {
+        let ident: PgIdent = PgIdent::from("\"name\";\\n select 1;");
+
+        assert_eq!(ident.quote(), "\"\"\"name\"\";\\n select 1;\"");
+    }
+}
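
The `quote_ident` test above expects standard PostgreSQL identifier quoting: wrap the name in double quotes and double every embedded double quote, leaving backslashes untouched. The following is a minimal sketch of that rule as a free function; the `quote_ident` function below is a hypothetical stand-in and is not the `PgIdent::quote` method from `compute_tools::pg_helpers`.

```rust
/// Quote a string as a PostgreSQL identifier.
/// Embedded double quotes are doubled; nothing else is escaped.
fn quote_ident(ident: &str) -> String {
    format!("\"{}\"", ident.replace('"', "\"\""))
}
```

With the input used in the test, the quotes around `name` are doubled and the rest of the string, including `;` and the literal backslash, is passed through inside the outer quotes.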
--- a/compute_tools/tests/tmp/.gitignore
+++ b/compute_tools/tests/tmp/.gitignore
@@ -0,0 +1 @@
+**/*
--- a/control_plane/Cargo.toml
+++ b/control_plane/Cargo.toml
@@ -1,27 +1,22 @@
 [package]
 name = "control_plane"
 version = "0.1.0"
-authors = ["Stas Kelvich <stas@zenith.tech>"]
-edition = "2018"
-
-# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
+edition = "2021"

 [dependencies]
-rand = "0.8.3"
-tar = "0.4.33"
-postgres = { git = "https://github.com/zenithdb/rust-postgres.git", rev="a0d067b66447951d1276a53fb09886539c3fa094" }
-tokio-postgres = { git = "https://github.com/zenithdb/rust-postgres.git", rev="a0d067b66447951d1276a53fb09886539c3fa094" }
-
-serde = ""
-serde_derive = ""
-toml = ""
-lazy_static = ""
+tar = "0.4.38"
+postgres = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
+serde = { version = "1.0", features = ["derive"] }
+serde_with = "1.12.0"
+toml = "0.5"
+once_cell = "1.13.0"
 regex = "1"
 anyhow = "1.0"
-hex = "0.4.3"
-bytes = "1.0.1"
-fs_extra = "1.2.0"
+thiserror = "1"
+nix = "0.23"
+reqwest = { version = "0.11", default-features = false, features = ["blocking", "json", "rustls-tls"] }

 pageserver = { path = "../pageserver" }
-walkeeper = { path = "../walkeeper" }
-postgres_ffi = { path = "../postgres_ffi" }
+safekeeper = { path = "../safekeeper" }
+utils = { path = "../libs/utils" }
+workspace_hack = { version = "0.1", path = "../workspace_hack" }
--- a/control_plane/safekeepers.conf
+++ b/control_plane/safekeepers.conf
@@ -0,0 +1,20 @@
+# Page server and three safekeepers.
+[pageserver]
+listen_pg_addr = '127.0.0.1:64000'
+listen_http_addr = '127.0.0.1:9898'
+auth_type = 'Trust'
+
+[[safekeepers]]
+id = 1
+pg_port = 5454
+http_port = 7676
+
+[[safekeepers]]
+id = 2
+pg_port = 5455
+http_port = 7677
+
+[[safekeepers]]
+id = 3
+pg_port = 5456
+http_port = 7678
--- a/control_plane/simple.conf
+++ b/control_plane/simple.conf
@@ -0,0 +1,14 @@
+# Minimal zenith environment with one safekeeper. This is equivalent to the built-in
+# defaults that you get with no --config
+[pageserver]
+listen_pg_addr = '127.0.0.1:64000'
+listen_http_addr = '127.0.0.1:9898'
+auth_type = 'Trust'
+
+[[safekeepers]]
+id = 1
+pg_port = 5454
+http_port = 7676
+
+[etcd_broker]
+broker_endpoints = ['http://127.0.0.1:2379']
--- a/control_plane/src/compute.rs
+++ b/control_plane/src/compute.rs
@@ -1,23 +1,26 @@
-use std::fs::{self, OpenOptions};
-use std::io::{Read, Write};
+use std::collections::BTreeMap;
+use std::fs::{self, File};
+use std::io::Write;
 use std::net::SocketAddr;
 use std::net::TcpStream;
 use std::os::unix::fs::PermissionsExt;
-use std::process::Command;
+use std::path::PathBuf;
+use std::process::{Command, Stdio};
+use std::str::FromStr;
 use std::sync::Arc;
 use std::time::Duration;
-use std::{collections::BTreeMap, path::PathBuf};

 use anyhow::{Context, Result};
-use lazy_static::lazy_static;
-use regex::Regex;
-use tar;
-
-use postgres::{Client, NoTls};
+use utils::{
+    connstring::connection_host_port,
+    lsn::Lsn,
+    postgres_backend::AuthType,
+    zid::{ZTenantId, ZTimelineId},
+};

 use crate::local_env::LocalEnv;
-use crate::storage::{PageServerNode, WalProposerNode};
-use pageserver::ZTimelineId;
+use crate::postgresql_conf::PostgresConf;
+use crate::storage::PageServerNode;

 //
 // ComputeControlPlane
@@ -25,27 +28,34 @@ use pageserver::ZTimelineId;
 pub struct ComputeControlPlane {
    base_port: u16,
    pageserver: Arc<PageServerNode>,
-    pub nodes: BTreeMap<String, Arc<PostgresNode>>,
+    pub nodes: BTreeMap<(ZTenantId, String), Arc<PostgresNode>>,
    env: LocalEnv,
 }

 impl ComputeControlPlane {
    // Load current nodes with ports from data directories on disk
+    // Directory structure has the following layout:
+    // pgdatadirs
+    // |- tenants
+    // |  |- <tenant_id>
+    // |  |   |- <node name>
    pub fn load(env: LocalEnv) -> Result<ComputeControlPlane> {
-        // TODO: since pageserver do not have config file yet we believe here that
-        // it is running on default port. Change that when pageserver will have config.
        let pageserver = Arc::new(PageServerNode::from_env(&env));

-        let pgdatadirspath = env.repo_path.join("pgdatadirs");
-        let nodes: Result<BTreeMap<_, _>> = fs::read_dir(&pgdatadirspath)
+        let mut nodes = BTreeMap::default();
+        let pgdatadirspath = &env.pg_data_dirs_path();
+
+        for tenant_dir in fs::read_dir(&pgdatadirspath)
            .with_context(|| format!("failed to list {}", pgdatadirspath.display()))?
-            .into_iter()
-            .map(|f| {
-                PostgresNode::from_dir_entry(f?, &env, &pageserver)
-                    .map(|node| (node.name.clone(), Arc::new(node)))
-            })
-            .collect();
-        let nodes = nodes?;
+        {
+            let tenant_dir = tenant_dir?;
+            for timeline_dir in fs::read_dir(tenant_dir.path())
+                .with_context(|| format!("failed to list {}", tenant_dir.path().display()))?
+            {
+                let node = PostgresNode::from_dir_entry(timeline_dir?, &env, &pageserver)?;
+                nodes.insert((node.tenant_id, node.name.clone()), Arc::new(node));
+            }
+        }

        Ok(ComputeControlPlane {
            base_port: 55431,
@@ -64,80 +74,32 @@ impl ComputeControlPlane {
            .unwrap_or(self.base_port)
    }

-    pub fn local(local_env: &LocalEnv, pageserver: &Arc<PageServerNode>) -> ComputeControlPlane {
-        ComputeControlPlane {
-            base_port: 65431,
-            pageserver: Arc::clone(pageserver),
-            nodes: BTreeMap::new(),
-            env: local_env.clone(),
-        }
-    }
-
-    /// Connect to a page server, get base backup, and untar it to initialize a
-    /// new data directory
-    pub fn new_from_page_server(
+    pub fn new_node(
        &mut self,
-        is_test: bool,
-        timelineid: ZTimelineId,
+        tenant_id: ZTenantId,
+        name: &str,
+        timeline_id: ZTimelineId,
+        lsn: Option<Lsn>,
+        port: Option<u16>,
    ) -> Result<Arc<PostgresNode>> {
-        let node_id = self.nodes.len() as u32 + 1;
-
+        let port = port.unwrap_or_else(|| self.get_port());
        let node = Arc::new(PostgresNode {
-            name: format!("pg{}", node_id),
-            address: SocketAddr::new("127.0.0.1".parse().unwrap(), self.get_port()),
+            name: name.to_owned(),
+            address: SocketAddr::new("127.0.0.1".parse().unwrap(), port),
            env: self.env.clone(),
            pageserver: Arc::clone(&self.pageserver),
-            is_test,
-            timelineid,
+            is_test: false,
+            timeline_id,
+            lsn,
+            tenant_id,
+            uses_wal_proposer: false,
        });

-        node.init_from_page_server()?;
-        self.nodes.insert(node.name.clone(), Arc::clone(&node));
+        node.create_pgdata()?;
+        node.setup_pg_conf(self.env.pageserver.auth_type)?;

-        Ok(node)
-    }
-
-    pub fn new_test_node(&mut self, timelineid: ZTimelineId) -> Arc<PostgresNode> {
-        let node = self.new_from_page_server(true, timelineid);
-        assert!(node.is_ok());
-        let node = node.unwrap();
-
-        // Configure the node to stream WAL directly to the pageserver
-        node.append_conf(
-            "postgresql.conf",
-            format!(
-                "callmemaybe_connstring = '{}'\n", // FIXME escaping
-                node.connstr()
-            )
-            .as_str(),
-        );
-
-        node
-    }
-
-    pub fn new_test_master_node(&mut self, timelineid: ZTimelineId) -> Arc<PostgresNode> {
-        let node = self.new_from_page_server(true, timelineid).unwrap();
-
-        node.append_conf(
-            "postgresql.conf",
-            "synchronous_standby_names = 'safekeeper_proxy'\n",
-        );
-
-        node
-    }
-
-    pub fn new_node(&mut self, timelineid: ZTimelineId) -> Result<Arc<PostgresNode>> {
-        let node = self.new_from_page_server(false, timelineid).unwrap();
-
-        // Configure the node to stream WAL directly to the pageserver
-        node.append_conf(
-            "postgresql.conf",
-            format!(
-                "callmemaybe_connstring = '{}'\n", // FIXME escaping
-                node.connstr()
-            )
-            .as_str(),
-        );
+        self.nodes
+            .insert((tenant_id, node.name.clone()), Arc::clone(&node));

        Ok(node)
    }
@@ -145,13 +107,17 @@ impl ComputeControlPlane {

 ///////////////////////////////////////////////////////////////////////////////

+#[derive(Debug)]
 pub struct PostgresNode {
    pub address: SocketAddr,
    name: String,
    pub env: LocalEnv,
    pageserver: Arc<PageServerNode>,
    is_test: bool,
-    timelineid: ZTimelineId,
+    pub timeline_id: ZTimelineId,
+    pub lsn: Option<Lsn>, // Some(lsn) for a read-only node pinned at that LSN; None for a primary
+    pub tenant_id: ZTenantId,
+    uses_wal_proposer: bool,
 }

 impl PostgresNode {
@@ -167,43 +133,28 @@ impl PostgresNode {
            );
        }

-        lazy_static! {
-            static ref CONF_PORT_RE: Regex = Regex::new(r"(?m)^\s*port\s*=\s*(\d+)\s*$").unwrap();
-        }
-
        // parse data directory name
        let fname = entry.file_name();
        let name = fname.to_str().unwrap().to_string();

-        // find out tcp port in config file
+        // Read config file into memory
        let cfg_path = entry.path().join("postgresql.conf");
-        let config = fs::read_to_string(cfg_path.clone()).with_context(|| {
-            format!(
-                "failed to read config file in {}",
-                cfg_path.to_str().unwrap()
-            )
-        })?;
+        let cfg_path_str = cfg_path.to_string_lossy();
+        let mut conf_file = File::open(&cfg_path)
+            .with_context(|| format!("failed to open config file in {}", cfg_path_str))?;
+        let conf = PostgresConf::read(&mut conf_file)
+            .with_context(|| format!("failed to read config file in {}", cfg_path_str))?;

-        let err_msg = format!(
-            "failed to find port definition in config file {}",
-            cfg_path.to_str().unwrap()
-        );
-        let port: u16 = CONF_PORT_RE
-            .captures(config.as_str())
-            .ok_or(anyhow::Error::msg(err_msg.clone() + " 1"))?
-            .iter()
-            .last()
-            .ok_or(anyhow::Error::msg(err_msg.clone() + " 2"))?
-            .ok_or(anyhow::Error::msg(err_msg.clone() + " 3"))?
-            .as_str()
-            .parse()
-            .with_context(|| err_msg)?;
+        // Read a few options from the config file
+        let context = format!("in config file {}", cfg_path_str);
+        let port: u16 = conf.parse_field("port", &context)?;
+        let timeline_id: ZTimelineId = conf.parse_field("neon.timeline_id", &context)?;
+        let tenant_id: ZTenantId = conf.parse_field("neon.tenant_id", &context)?;
+        let uses_wal_proposer = conf.get("safekeepers").is_some();

-        // FIXME: What timeline is this server on? Would have to parse the postgresql.conf
-        // file for that, too. It's currently not needed for anything, but it would be
-        // nice to list the timeline in "zenith pg list"
-        let timelineid_buf = [0u8; 16];
-        let timelineid = ZTimelineId::from(timelineid_buf);
+        // parse recovery_target_lsn, if any
+        let recovery_target_lsn: Option<Lsn> =
+            conf.parse_field_optional("recovery_target_lsn", &context)?;

        // ok now
        Ok(PostgresNode {
@@ -212,105 +163,228 @@ impl PostgresNode {
            env: env.clone(),
            pageserver: Arc::clone(pageserver),
            is_test: false,
-            timelineid,
+            timeline_id,
+            lsn: recovery_target_lsn,
+            tenant_id,
+            uses_wal_proposer,
        })
    }

-    // Connect to a page server, get base backup, and untar it to initialize a
-    // new data directory
-    pub fn init_from_page_server(&self) -> Result<()> {
-        let pgdata = self.pgdata();
+    fn sync_safekeepers(&self, auth_token: &Option<String>) -> Result<Lsn> {
+        let pg_path = self.env.pg_bin_dir().join("postgres");
+        let mut cmd = Command::new(&pg_path);

+        cmd.arg("--sync-safekeepers")
+            .env_clear()
+            .env("LD_LIBRARY_PATH", self.env.pg_lib_dir().to_str().unwrap())
+            .env("DYLD_LIBRARY_PATH", self.env.pg_lib_dir().to_str().unwrap())
+            .env("PGDATA", self.pgdata().to_str().unwrap())
+            .stdout(Stdio::piped())
+            // Comment this out to avoid capturing stderr (useful if the command hangs)
+            .stderr(Stdio::piped());
+
+        if let Some(token) = auth_token {
+            cmd.env("ZENITH_AUTH_TOKEN", token);
+        }
+
+        let sync_handle = cmd
+            .spawn()
+            .expect("postgres --sync-safekeepers failed to start");
+
+        let sync_output = sync_handle
+            .wait_with_output()
+            .expect("postgres --sync-safekeepers failed");
+        if !sync_output.status.success() {
+            anyhow::bail!(
+                "sync-safekeepers failed: '{}'",
+                String::from_utf8_lossy(&sync_output.stderr)
+            );
+        }
+
+        let lsn = Lsn::from_str(std::str::from_utf8(&sync_output.stdout)?.trim())?;
+        println!("Safekeepers synced on {}", lsn);
+        Ok(lsn)
+    }
+
+    /// Get basebackup from the pageserver as a tar archive and extract it
+    /// to the `self.pgdata()` directory.
+    fn do_basebackup(&self, lsn: Option<Lsn>) -> Result<()> {
        println!(
            "Extracting base backup to create postgres instance: path={} port={}",
-            pgdata.display(),
+            self.pgdata().display(),
            self.address.port()
        );

-        // initialize data directory
-        if self.is_test {
-            fs::remove_dir_all(&pgdata).ok();
-        }
+        let sql = if let Some(lsn) = lsn {
+            format!("basebackup {} {} {}", self.tenant_id, self.timeline_id, lsn)
+        } else {
+            format!("basebackup {} {}", self.tenant_id, self.timeline_id)
+        };

-        let sql = format!("basebackup {}", self.timelineid);
        let mut client = self
            .pageserver
            .page_server_psql_client()
-            .with_context(|| "connecting to page server failed")?;
+            .context("connecting to page server failed")?;

-        fs::create_dir_all(&pgdata)
-            .with_context(|| format!("could not create data directory {}", pgdata.display()))?;
-        fs::set_permissions(pgdata.as_path(), fs::Permissions::from_mode(0o700)).with_context(
-            || {
-                format!(
-                    "could not set permissions in data directory {}",
-                    pgdata.display()
-                )
-            },
-        )?;
-
-        // FIXME: The compute node should be able to stream the WAL it needs from the WAL safekeepers or archive.
-        // But that's not implemented yet. For now, 'pg_wal' is included in the base backup tarball that
-        // we receive from the Page Server, so we don't need to create the empty 'pg_wal' directory here.
-        //fs::create_dir_all(pgdata.join("pg_wal"))?;
-
-        let mut copyreader = client
+        let copyreader = client
            .copy_out(sql.as_str())
-            .with_context(|| "page server 'basebackup' command failed")?;
+            .context("page server 'basebackup' command failed")?;

-        // FIXME: Currently, we slurp the whole tarball into memory, and then extract it,
-        // but we really should do this:
-        //let mut ar = tar::Archive::new(copyreader);
-        let mut buf = vec![];
-        copyreader
-            .read_to_end(&mut buf)
-            .with_context(|| "reading base backup from page server failed")?;
-        let mut ar = tar::Archive::new(buf.as_slice());
-        ar.unpack(&pgdata)
-            .with_context(|| "extracting page backup failed")?;
-
-        // listen for selected port
-        self.append_conf(
-            "postgresql.conf",
-            &format!(
-                "max_wal_senders = 10\n\
-                 max_replication_slots = 10\n\
-                 hot_standby = on\n\
-                 shared_buffers = 1MB\n\
-                 max_connections = 100\n\
-                 wal_level = replica\n\
-                 listen_addresses = '{address}'\n\
-                 port = {port}\n",
-                address = self.address.ip(),
-                port = self.address.port()
-            ),
-        );
-
-        // Never clean up old WAL. TODO: We should use a replication
-        // slot or something proper, to prevent the compute node
-        // from removing WAL that hasn't been streamed to the safekeepr or
-        // page server yet. But this will do for now.
-        self.append_conf("postgresql.conf", &format!("wal_keep_size='10TB'\n"));
-
-        // Connect it to the page server.
-
-        // Configure that node to take pages from pageserver
-        self.append_conf(
-            "postgresql.conf",
-            &format!(
-                "page_server_connstring = 'host={} port={}'\n\
-                      zenith_timeline='{}'\n",
-                self.pageserver.address().ip(),
-                self.pageserver.address().port(),
-                self.timelineid
-            ),
-        );
+        // Read the archive directly from the `CopyOutReader`
+        //
+        // Set `ignore_zeros` so that unpack() reads all the Copy data and
+        // doesn't stop at the end-of-archive marker. Otherwise, if the server
+        // sends an Error after finishing the tarball, we will not notice it.
+        let mut ar = tar::Archive::new(copyreader);
+        ar.set_ignore_zeros(true);
+        ar.unpack(&self.pgdata())
+            .context("extracting base backup failed")?;

        Ok(())
    }

-    fn pgdata(&self) -> PathBuf {
-        self.env.repo_path.join("pgdatadirs").join(&self.name)
+    fn create_pgdata(&self) -> Result<()> {
+        fs::create_dir_all(&self.pgdata()).with_context(|| {
+            format!(
+                "could not create data directory {}",
+                self.pgdata().display()
+            )
+        })?;
+        fs::set_permissions(self.pgdata().as_path(), fs::Permissions::from_mode(0o700))
+            .with_context(|| {
+                format!(
+                    "could not set permissions in data directory {}",
+                    self.pgdata().display()
+                )
+            })
+    }
+
+    // Write postgresql.conf for this node: base settings, the pageserver
+    // connection string, and the safekeeper (or direct-to-pageserver) setup.
+    fn setup_pg_conf(&self, auth_type: AuthType) -> Result<()> {
+        let mut conf = PostgresConf::new();
+        conf.append("max_wal_senders", "10");
+        // wal_log_hints is mandatory when running against pageserver (see gh issue#192)
+        // TODO: is it possible to check wal_log_hints at pageserver side via XLOG_PARAMETER_CHANGE?
+        conf.append("wal_log_hints", "on");
+        conf.append("max_replication_slots", "10");
+        conf.append("hot_standby", "on");
+        conf.append("shared_buffers", "1MB");
+        conf.append("fsync", "off");
+        conf.append("max_connections", "100");
+        conf.append("wal_level", "replica");
+        // wal_sender_timeout is the maximum time to wait for WAL replication.
+        // It also defines how often the walreceiver will send a feedback message to the wal sender.
+        conf.append("wal_sender_timeout", "5s");
+        conf.append("listen_addresses", &self.address.ip().to_string());
+        conf.append("port", &self.address.port().to_string());
+        conf.append("wal_keep_size", "0");
+        // walproposer panics when the basebackup is invalid; restarting is pointless in that case.
+        conf.append("restart_after_crash", "off");
+
+        // Configure the node to fetch pages from pageserver
+        let pageserver_connstr = {
+            let (host, port) = connection_host_port(&self.pageserver.pg_connection_config);
+
+            // Set up authentication
+            //
+            // $ZENITH_AUTH_TOKEN is replaced with the value of the environment
+            // variable during compute pg startup. It is done this way because
+            // otherwise a user would be able to retrieve the value using the SHOW
+            // command or pg_settings.
+            let password = if let AuthType::ZenithJWT = auth_type {
+                "$ZENITH_AUTH_TOKEN"
+            } else {
+                ""
+            };
+            // NOTE: avoid spaces in the connection string; it is less error-prone if we forward it somewhere.
+            // Not all parameters are supported here: compute substitutes $ZENITH_AUTH_TOKEN by parsing
+            // this string and rebuilding it with the token from the environment variable, and for
+            // simplicity the rebuild uses only host, port, user, and password.
+            format!("postgresql://no_user:{}@{}:{}", password, host, port)
+        };
+        conf.append("shared_preload_libraries", "neon");
+        conf.append_line("");
+        conf.append("neon.pageserver_connstring", &pageserver_connstr);
+        conf.append("neon.tenant_id", &self.tenant_id.to_string());
+        conf.append("neon.timeline_id", &self.timeline_id.to_string());
+        if let Some(lsn) = self.lsn {
+            conf.append("recovery_target_lsn", &lsn.to_string());
+        }
+
+        conf.append_line("");
+        // Configure backpressure
+        // - Replication write lag depends on how fast the walreceiver can process incoming WAL.
+        //   This lag determines latency of get_page_at_lsn. Speed of applying WAL is about 10MB/sec,
+        //   so to avoid expiration of 1 minute timeout, this lag should not be larger than 600MB.
+        //   Actually latency should be much smaller (better if < 1sec). But we assume that recently
+        //   updated pages are not requested from pageserver.
+        // - Replication flush lag depends on speed of persisting data by checkpointer (creation of
+        //   delta/image layers) and advancing disk_consistent_lsn. Safekeepers are able to
+        //   remove/archive WAL only beyond disk_consistent_lsn. Too large a lag can cause long
+        //   recovery time (in case of pageserver crash) and disk space overflow at safekeepers.
+        // - Replication apply lag depends on speed of uploading changes to S3 by uploader thread.
+        //   To be able to restore database in case of pageserver node crash, safekeeper should not
+        //   remove WAL beyond this point. Too large a lag can cause space exhaustion in safekeepers
+        //   (if they are not able to upload WAL to S3).
+        conf.append("max_replication_write_lag", "500MB");
+        conf.append("max_replication_flush_lag", "10GB");
+
+        if !self.env.safekeepers.is_empty() {
+            // Configure the node to connect to the safekeepers
+            conf.append("synchronous_standby_names", "walproposer");
+
+            let safekeepers = self
+                .env
+                .safekeepers
+                .iter()
+                .map(|sk| format!("localhost:{}", sk.pg_port))
+                .collect::<Vec<String>>()
+                .join(",");
+            conf.append("safekeepers", &safekeepers);
+        } else {
+            // We only use setup without safekeepers for tests,
+            // and don't care about data durability on pageserver,
+            // so set more relaxed synchronous_commit.
+            conf.append("synchronous_commit", "remote_write");
+
+            // Configure the node to stream WAL directly to the pageserver
+            // This isn't really a supported configuration, but can be useful for
+            // testing.
+            conf.append("synchronous_standby_names", "pageserver");
+        }
+
+        let mut file = File::create(self.pgdata().join("postgresql.conf"))?;
+        file.write_all(conf.to_string().as_bytes())?;
+
+        Ok(())
+    }
+
+    fn load_basebackup(&self, auth_token: &Option<String>) -> Result<()> {
+        let backup_lsn = if let Some(lsn) = self.lsn {
+            Some(lsn)
+        } else if self.uses_wal_proposer {
+            // LSN 0 means this is a bootstrap and we only need to download the
+            // latest data from the pageserver. That is a bit clumsy, but the whole bootstrap
+            // procedure is evolving quickly right now, so let's revisit it
+            // once things are more stable (TODO).
+            let lsn = self.sync_safekeepers(auth_token)?;
+            if lsn == Lsn(0) {
+                None
+            } else {
+                Some(lsn)
+            }
+        } else {
+            None
+        };
+
+        self.do_basebackup(backup_lsn)?;
+
+        Ok(())
+    }
+
+    pub fn pgdata(&self) -> PathBuf {
+        self.env.pg_data_dir(&self.tenant_id, &self.name)
    }

    pub fn status(&self) -> &str {
@@ -326,60 +400,106 @@ impl PostgresNode {
        }
    }

-    pub fn append_conf(&self, config: &str, opts: &str) {
-        OpenOptions::new()
-            .append(true)
-            .open(self.pgdata().join(config).to_str().unwrap())
-            .unwrap()
-            .write_all(opts.as_bytes())
-            .unwrap();
-    }
-
-    fn pg_ctl(&self, args: &[&str]) -> Result<()> {
+    fn pg_ctl(&self, args: &[&str], auth_token: &Option<String>) -> Result<()> {
        let pg_ctl_path = self.env.pg_bin_dir().join("pg_ctl");
+        let mut cmd = Command::new(pg_ctl_path);
+        cmd.args(
+            [
+                &[
+                    "-D",
+                    self.pgdata().to_str().unwrap(),
+                    "-l",
+                    self.pgdata().join("pg.log").to_str().unwrap(),
+                    "-w", // wait until pg_ctl has actually done what was asked
+                ],
+                args,
+            ]
+            .concat(),
+        )
+        .env_clear()
+        .env("LD_LIBRARY_PATH", self.env.pg_lib_dir().to_str().unwrap())
+        .env("DYLD_LIBRARY_PATH", self.env.pg_lib_dir().to_str().unwrap());
+        if let Some(token) = auth_token {
+            cmd.env("ZENITH_AUTH_TOKEN", token);
+        }

-        let pg_ctl = Command::new(pg_ctl_path)
-            .args(
-                [
-                    &[
-                        "-D",
-                        self.pgdata().to_str().unwrap(),
-                        "-l",
-                        self.pgdata().join("log").to_str().unwrap(),
-                    ],
-                    args,
-                ]
-                .concat(),
-            )
-            .env_clear()
-            .env("LD_LIBRARY_PATH", self.env.pg_lib_dir().to_str().unwrap())
-            .status()
-            .with_context(|| "pg_ctl failed")?;
-        if !pg_ctl.success() {
-            anyhow::bail!("pg_ctl failed");
+        let pg_ctl = cmd.output().context("pg_ctl failed")?;
+        if !pg_ctl.status.success() {
+            anyhow::bail!(
+                "pg_ctl failed, exit code: {}, stdout: {}, stderr: {}",
+                pg_ctl.status,
+                String::from_utf8_lossy(&pg_ctl.stdout),
+                String::from_utf8_lossy(&pg_ctl.stderr),
+            );
        }
        Ok(())
    }

-    pub fn start(&self) -> Result<()> {
+    pub fn start(&self, auth_token: &Option<String>) -> Result<()> {
+        // Bail if the node is already running.
+        if self.status() == "running" {
+            anyhow::bail!("The node is already running");
+        }
+
+        // 1. We always start the compute node from scratch, so if the old
+        // directory exists, preserve 'postgresql.conf' and drop the directory
+        let postgresql_conf_path = self.pgdata().join("postgresql.conf");
+        let postgresql_conf = fs::read(&postgresql_conf_path).with_context(|| {
+            format!(
+                "failed to read config file in {}",
+                postgresql_conf_path.to_str().unwrap()
+            )
+        })?;
+        fs::remove_dir_all(&self.pgdata())?;
+        self.create_pgdata()?;
+
+        // 2. Bring back config files
+        fs::write(&postgresql_conf_path, postgresql_conf)?;
+
+        // 3. Load basebackup
+        self.load_basebackup(auth_token)?;
+
+        if self.lsn.is_some() {
+            File::create(self.pgdata().join("standby.signal"))?;
+        }
+
+        // 4. Finally start the compute node postgres
        println!("Starting postgres node at '{}'", self.connstr());
-        self.pg_ctl(&["start"])
+        self.pg_ctl(&["start"], auth_token)
    }

-    pub fn restart(&self) -> Result<()> {
-        self.pg_ctl(&["restart"])
+    pub fn restart(&self, auth_token: &Option<String>) -> Result<()> {
+        self.pg_ctl(&["restart"], auth_token)
    }

-    pub fn stop(&self) -> Result<()> {
-        self.pg_ctl(&["-m", "immediate", "stop"])
+    pub fn stop(&self, destroy: bool) -> Result<()> {
+        // If we are going to destroy the data directory,
+        // use immediate shutdown mode; otherwise,
+        // shut down gracefully to leave the data directory sane.
+        //
+        // The compute node always starts from scratch, so stop
+        // without destroy is only used for testing and debugging.
+        //
+        if destroy {
+            self.pg_ctl(&["-m", "immediate", "stop"], &None)?;
+            println!(
+                "Destroying postgres data directory '{}'",
+                self.pgdata().to_str().unwrap()
+            );
+            fs::remove_dir_all(&self.pgdata())?;
+        } else {
+            self.pg_ctl(&["stop"], &None)?;
+        }
+        Ok(())
    }

    pub fn connstr(&self) -> String {
        format!(
-            "host={} port={} user={}",
+            "host={} port={} user={} dbname={}",
            self.address.ip(),
            self.address.port(),
-            self.whoami()
+            "cloud_admin",
+            "postgres"
        )
    }

@@ -389,62 +509,10 @@ impl PostgresNode {
            .output()
            .expect("failed to execute whoami");

-        if !output.status.success() {
-            panic!("whoami failed");
-        }
+        assert!(output.status.success(), "whoami failed");

        String::from_utf8(output.stdout).unwrap().trim().to_string()
    }
-
-    pub fn safe_psql(&self, db: &str, sql: &str) -> Vec<tokio_postgres::Row> {
-        let connstring = format!(
-            "host={} port={} dbname={} user={}",
-            self.address.ip(),
-            self.address.port(),
-            db,
-            self.whoami()
-        );
-        let mut client = Client::connect(connstring.as_str(), NoTls).unwrap();
-
-        println!("Running {}", sql);
-        client.query(sql, &[]).unwrap()
-    }
-
-    pub fn open_psql(&self, db: &str) -> Client {
-        let connstring = format!(
-            "host={} port={} dbname={} user={}",
-            self.address.ip(),
-            self.address.port(),
-            db,
-            self.whoami()
-        );
-        Client::connect(connstring.as_str(), NoTls).unwrap()
-    }
-
-    pub fn start_proxy(&self, wal_acceptors: &str) -> WalProposerNode {
-        let proxy_path = self.env.pg_bin_dir().join("safekeeper_proxy");
-        match Command::new(proxy_path.as_path())
-            .args(&["--ztimelineid", &self.timelineid.to_string()])
-            .args(&["-s", wal_acceptors])
-            .args(&["-h", &self.address.ip().to_string()])
-            .args(&["-p", &self.address.port().to_string()])
-            .arg("-v")
-            .stderr(
-                OpenOptions::new()
-                    .create(true)
-                    .append(true)
-                    .open(self.pgdata().join("safekeeper_proxy.log"))
-                    .unwrap(),
-            )
-            .spawn()
-        {
-            Ok(child) => WalProposerNode { pid: child.id() },
-            Err(e) => panic!("Failed to launch {:?}: {}", proxy_path, e),
-        }
-    }
-
-    // TODO
-    pub fn pg_bench() {}
 }

 impl Drop for PostgresNode {
@@ -453,7 +521,7 @@ impl Drop for PostgresNode {
    // and checking it here. But let just clean datadirs on start.
    fn drop(&mut self) {
        if self.is_test {
-            let _ = self.stop();
+            let _ = self.stop(true);
        }
    }
 }
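
The LSN selection in `load_basebackup` above can be read as a small pure function. The sketch below is only an illustration under stated assumptions: `Lsn` here is a hypothetical stand-in for the real type, `choose_backup_lsn` is not a name from this change, and the `sync_safekeepers` call is modelled as a closure so the logic can be exercised without a running cluster.

```rust
/// Hypothetical stand-in for the `Lsn` newtype used by the compute node code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Lsn(pub u64);

/// Decide which LSN the basebackup should be requested at.
///
/// * A pinned LSN on the node always wins.
/// * Otherwise, when a WAL proposer is in use, ask the safekeepers;
///   an answer of LSN 0 is treated as "bootstrap, take the latest data".
/// * Otherwise, take the latest data.
pub fn choose_backup_lsn(
    pinned: Option<Lsn>,
    uses_wal_proposer: bool,
    sync_safekeepers: impl FnOnce() -> Lsn,
) -> Option<Lsn> {
    match pinned {
        Some(lsn) => Some(lsn),
        None if uses_wal_proposer => {
            let lsn = sync_safekeepers();
            if lsn == Lsn(0) {
                None
            } else {
                Some(lsn)
            }
        }
        None => None,
    }
}
```

For example, `choose_backup_lsn(None, true, || Lsn(0))` yields `None`, which corresponds to the bootstrap case described in the comment inside `load_basebackup`.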
--- a/control_plane/src/etcd.rs
+++ b/control_plane/src/etcd.rs
@@ -0,0 +1,97 @@
+use std::{
+    fs,
+    path::PathBuf,
+    process::{Command, Stdio},
+};
+
+use anyhow::Context;
+use nix::{
+    sys::signal::{kill, Signal},
+    unistd::Pid,
+};
+
+use crate::{local_env, read_pidfile};
+
+pub fn start_etcd_process(env: &local_env::LocalEnv) -> anyhow::Result<()> {
+    let etcd_broker = &env.etcd_broker;
+    println!(
+        "Starting etcd broker using {}",
+        etcd_broker.etcd_binary_path.display()
+    );
+
+    let etcd_data_dir = env.base_data_dir.join("etcd");
+    fs::create_dir_all(&etcd_data_dir).with_context(|| {
+        format!(
+            "Failed to create etcd data dir: {}",
+            etcd_data_dir.display()
+        )
+    })?;
+
+    let etcd_stdout_file =
+        fs::File::create(etcd_data_dir.join("etcd.stdout.log")).with_context(|| {
+            format!(
+                "Failed to create etcd stdout file in directory {}",
+                etcd_data_dir.display()
+            )
+        })?;
+    let etcd_stderr_file =
+        fs::File::create(etcd_data_dir.join("etcd.stderr.log")).with_context(|| {
+            format!(
+                "Failed to create etcd stderr file in directory {}",
+                etcd_data_dir.display()
+            )
+        })?;
+    let client_urls = etcd_broker.comma_separated_endpoints();
+
+    let etcd_process = Command::new(&etcd_broker.etcd_binary_path)
+        .args(&[
+            format!("--data-dir={}", etcd_data_dir.display()),
+            format!("--listen-client-urls={client_urls}"),
+            format!("--advertise-client-urls={client_urls}"),
+            // Set --quota-backend-bytes to keep the etcd virtual memory
+            // size smaller. Our test etcd clusters are very small.
+            // See https://github.com/etcd-io/etcd/issues/7910
+            "--quota-backend-bytes=100000000".to_string(),
+        ])
+        .stdout(Stdio::from(etcd_stdout_file))
+        .stderr(Stdio::from(etcd_stderr_file))
+        .spawn()
+        .context("Failed to spawn etcd subprocess")?;
+    let pid = etcd_process.id();
+
+    let etcd_pid_file_path = etcd_pid_file_path(env);
+    fs::write(&etcd_pid_file_path, pid.to_string()).with_context(|| {
+        format!(
+            "Failed to create etcd pid file at {}",
+            etcd_pid_file_path.display()
+        )
+    })?;
+
+    Ok(())
+}
+
+pub fn stop_etcd_process(env: &local_env::LocalEnv) -> anyhow::Result<()> {
+    let etcd_path = &env.etcd_broker.etcd_binary_path;
+    println!("Stopping etcd broker at {}", etcd_path.display());
+
+    let etcd_pid_file_path = etcd_pid_file_path(env);
+    let pid = Pid::from_raw(read_pidfile(&etcd_pid_file_path).with_context(|| {
+        format!(
+            "Failed to read etcd pid file at {}",
+            etcd_pid_file_path.display()
+        )
+    })?);
+
+    kill(pid, Signal::SIGTERM).with_context(|| {
+        format!(
+            "Failed to stop etcd with pid {pid} at {}",
+            etcd_pid_file_path.display()
+        )
+    })?;
+
+    Ok(())
+}
+
+fn etcd_pid_file_path(env: &local_env::LocalEnv) -> PathBuf {
+    env.base_data_dir.join("etcd.pid")
+}
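
The way `start_etcd_process` assembles its command line can be sketched as two pure helpers. This is an illustration, not code from this change: the helper names are hypothetical, endpoints are plain string slices rather than `Url` values, and `strip_suffix` is used as one possible way to drop a single trailing '/' before joining. The flag names themselves are the ones passed in the function above.

```rust
use std::path::Path;

/// Join endpoints with commas, dropping one trailing '/' from each,
/// since the etcd command line does not want the trailing path separator.
pub fn comma_separated_endpoints(endpoints: &[&str]) -> String {
    endpoints
        .iter()
        .map(|&url| url.strip_suffix('/').unwrap_or(url))
        .collect::<Vec<_>>()
        .join(",")
}

/// Build the argument list for the etcd binary.
pub fn etcd_args(data_dir: &Path, client_urls: &str) -> Vec<String> {
    vec![
        format!("--data-dir={}", data_dir.display()),
        format!("--listen-client-urls={client_urls}"),
        format!("--advertise-client-urls={client_urls}"),
        // Small backend quota, as in the test setup above.
        "--quota-backend-bytes=100000000".to_string(),
    ]
}
```

Keeping the argument construction separate from `Command::spawn` makes it possible to check the resulting flags in a unit test without starting etcd.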
--- a/control_plane/src/lib.rs
+++ b/control_plane/src/lib.rs
@@ -1,12 +1,64 @@
 //
 // Local control plane.
 //
-// Can start, cofigure and stop postgres instances running as a local processes.
+// Can start, configure and stop postgres instances running as local processes.
 //
 // Intended to be used in integration tests and in CLI tools for
 // local installations.
 //
+use anyhow::{anyhow, bail, Context, Result};
+use std::fs;
+use std::path::Path;
+use std::process::Command;

 pub mod compute;
+pub mod etcd;
 pub mod local_env;
+pub mod postgresql_conf;
+pub mod safekeeper;
 pub mod storage;
+
+/// Read a PID file
+///
+/// We expect a file that contains a single integer.
+/// We return an i32 for compatibility with libc and nix.
+pub fn read_pidfile(pidfile: &Path) -> Result<i32> {
+    let pid_str = fs::read_to_string(pidfile)
+        .with_context(|| format!("failed to read pidfile {:?}", pidfile))?;
+    let pid: i32 = pid_str
+        .parse()
+        .map_err(|_| anyhow!("failed to parse pidfile {:?}", pidfile))?;
+    if pid < 1 {
+        bail!("pidfile {:?} contained bad value '{}'", pidfile, pid);
+    }
+    Ok(pid)
+}
+
+fn fill_rust_env_vars(cmd: &mut Command) -> &mut Command {
+    let cmd = cmd.env_clear().env("RUST_BACKTRACE", "1");
+
+    let var = "LLVM_PROFILE_FILE";
+    if let Some(val) = std::env::var_os(var) {
+        cmd.env(var, val);
+    }
+
+    const RUST_LOG_KEY: &str = "RUST_LOG";
+    if let Ok(rust_log_value) = std::env::var(RUST_LOG_KEY) {
+        cmd.env(RUST_LOG_KEY, rust_log_value)
+    } else {
+        cmd
+    }
+}
+
+fn fill_aws_secrets_vars(mut cmd: &mut Command) -> &mut Command {
+    for env_key in [
+        "AWS_ACCESS_KEY_ID",
+        "AWS_SECRET_ACCESS_KEY",
+        "AWS_SESSION_TOKEN",
+    ] {
+        if let Ok(value) = std::env::var(env_key) {
+            cmd = cmd.env(env_key, value);
+        }
+    }
+    cmd
+}
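
One detail of `read_pidfile` above is worth illustrating: Rust's integer parsing rejects surrounding whitespace, so a pidfile that ends in a newline would fail to parse as written. The etcd writer in this change stores the PID without a newline, so the two agree. The sketch below is a hypothetical, whitespace-tolerant variant (the name `parse_pid` and the `String` error type are assumptions made to keep the example free of dependencies); it is not the behavior of the function in this diff.

```rust
/// Parse the contents of a pidfile, tolerating surrounding whitespace.
/// Returns an error for non-numeric content and for values below 1.
pub fn parse_pid(contents: &str) -> Result<i32, String> {
    let trimmed = contents.trim();
    let pid: i32 = trimmed
        .parse()
        .map_err(|e| format!("failed to parse pid {trimmed:?}: {e}"))?;
    if pid < 1 {
        return Err(format!("pidfile contained bad value '{pid}'"));
    }
    Ok(pid)
}
```

With this variant, both `"1234"` and `"1234\n"` are accepted, while `"0"`, `"-5"` and non-numeric input are rejected.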
--- a/control_plane/src/local_env.rs
+++ b/control_plane/src/local_env.rs
@@ -1,389 +1,502 @@
-//
-// This module is responsible for locating and loading paths in a local setup.
-//
-// Now it also provides init method which acts like a stub for proper installation
-// script which will use local paths.
-//
-use anyhow::Context;
-use bytes::Bytes;
-use rand::Rng;
+//! This module is responsible for locating and loading paths in a local setup.
+//!
+//! It also provides an init method that acts as a stub for a proper
+//! installation script, which will use local paths.
+
+use anyhow::{bail, ensure, Context};
+use reqwest::Url;
+use serde::{Deserialize, Serialize};
+use serde_with::{serde_as, DisplayFromStr};
+use std::collections::HashMap;
 use std::env;
 use std::fs;
 use std::path::{Path, PathBuf};
 use std::process::{Command, Stdio};
+use utils::{
+    auth::{encode_from_key_file, Claims, Scope},
+    postgres_backend::AuthType,
+    zid::{NodeId, ZTenantId, ZTenantTimelineId, ZTimelineId},
+};

-use anyhow::Result;
-use serde_derive::{Deserialize, Serialize};
-
-use pageserver::ZTimelineId;
-use walkeeper::xlog_utils;
+use crate::safekeeper::SafekeeperNode;

 //
-// This data structure represents deserialized zenith config, which should be
-// located in ~/.zenith
+// This data structure represents the neon_local CLI config
 //
-// TODO: should we also support ZENITH_CONF env var?
+// It is deserialized from the .neon/config file, or the config file passed
+// to 'neon_local init --config=<path>' option. See control_plane/simple.conf for
+// an example.
 //
-#[derive(Serialize, Deserialize, Clone)]
+#[serde_as]
+#[derive(Serialize, Deserialize, PartialEq, Eq, Clone, Debug)]
 pub struct LocalEnv {
-    // Path to the Repository. Here page server and compute nodes will create and store their data.
-    pub repo_path: PathBuf,
-
-    // System identifier, from the PostgreSQL control file
-    pub systemid: u64,
+    // Base directory for all the nodes (the pageserver, safekeepers and
+    // compute nodes).
+    //
+    // This is not stored in the config file. Rather, this is the path where the
+    // config file itself is. It is read from the NEON_REPO_DIR env variable or
+    // '.neon' if not given.
+    #[serde(skip)]
+    pub base_data_dir: PathBuf,

    // Path to postgres distribution. It's expected that "bin", "include",
    // "lib", "share" from postgres distribution are there. If at some point
    // in time we will be able to run against vanilla postgres we may split that
    // to four separate paths and match OS-specific installation layout.
+    #[serde(default)]
    pub pg_distrib_dir: PathBuf,

    // Path to pageserver binary.
+    #[serde(default)]
    pub zenith_distrib_dir: PathBuf,
+
+    // Default tenant ID to use with the 'zenith' command line utility, when
+    // --tenantid is not explicitly specified.
+    #[serde(default)]
+    #[serde_as(as = "Option<DisplayFromStr>")]
+    pub default_tenant_id: Option<ZTenantId>,
+
+    // used to issue tokens during e.g. pg start
+    #[serde(default)]
+    pub private_key_path: PathBuf,
+
+    pub etcd_broker: EtcdBroker,
+
+    pub pageserver: PageServerConf,
+
+    #[serde(default)]
+    pub safekeepers: Vec<SafekeeperConf>,
+
+    /// Keep human-readable aliases in memory (and persist them to config), to hide ZId hex strings from the user.
+    #[serde(default)]
+    // A `HashMap<String, HashMap<ZTenantId, ZTimelineId>>` would be more appropriate here,
+    // but deserialization into a generic toml object as `toml::Value::try_from` fails with an error.
+    // https://toml.io/en/v1.0.0 does not contain a concept of "a table inside another table".
+    #[serde_as(as = "HashMap<_, Vec<(DisplayFromStr, DisplayFromStr)>>")]
+    branch_name_mappings: HashMap<String, Vec<(ZTenantId, ZTimelineId)>>,
+}
+
+/// Etcd broker config for cluster internal communication.
+#[serde_as]
+#[derive(Serialize, Deserialize, PartialEq, Eq, Clone, Debug)]
+pub struct EtcdBroker {
+    /// A prefix to add to any key when pushing/polling etcd from a node.
+    #[serde(default)]
+    pub broker_etcd_prefix: Option<String>,
+
+    /// Broker (etcd) endpoints for storage nodes coordination, e.g. 'http://127.0.0.1:2379'.
+    #[serde(default)]
+    #[serde_as(as = "Vec<DisplayFromStr>")]
+    pub broker_endpoints: Vec<Url>,
+
+    /// Etcd binary path to use.
+    #[serde(default)]
+    pub etcd_binary_path: PathBuf,
+}
+
+impl EtcdBroker {
+    pub fn locate_etcd() -> anyhow::Result<PathBuf> {
+        let which_output = Command::new("which")
+            .arg("etcd")
+            .output()
+            .context("Failed to run 'which etcd' command")?;
+        let stdout = String::from_utf8_lossy(&which_output.stdout);
+        ensure!(
+            which_output.status.success(),
+            "'which etcd' invocation failed. Status: {}, stdout: {stdout}, stderr: {}",
+            which_output.status,
+            String::from_utf8_lossy(&which_output.stderr)
+        );
+
+        let etcd_path = PathBuf::from(stdout.trim());
+        ensure!(
+            etcd_path.is_file(),
+            "'which etcd' invocation was successful, but the path it returned is not a file or does not exist: {}",
+            etcd_path.display()
+        );
+
+        Ok(etcd_path)
+    }
+
+    pub fn comma_separated_endpoints(&self) -> String {
+        self.broker_endpoints
+            .iter()
+            .map(|url| {
+                // URL by default adds a '/' path at the end, which is not what etcd CLI wants.
+                let url_string = url.as_str();
+                if url_string.ends_with('/') {
+                    &url_string[0..url_string.len() - 1]
+                } else {
+                    url_string
+                }
+            })
+            .fold(String::new(), |mut comma_separated_urls, url| {
+                if !comma_separated_urls.is_empty() {
+                    comma_separated_urls.push(',');
+                }
+                comma_separated_urls.push_str(url);
+                comma_separated_urls
+            })
+    }
+}
+
+#[derive(Serialize, Deserialize, PartialEq, Eq, Clone, Debug)]
+#[serde(default)]
+pub struct PageServerConf {
+    // node id
+    pub id: NodeId,
+    // Pageserver connection settings
+    pub listen_pg_addr: String,
+    pub listen_http_addr: String,
+
+    // used to determine which auth type is used
+    pub auth_type: AuthType,
+
+    // jwt auth token used for communication with pageserver
+    pub auth_token: String,
+}
+
+impl Default for PageServerConf {
+    fn default() -> Self {
+        Self {
+            id: NodeId(0),
+            listen_pg_addr: String::new(),
+            listen_http_addr: String::new(),
+            auth_type: AuthType::Trust,
+            auth_token: String::new(),
+        }
+    }
+}
+
+#[derive(Serialize, Deserialize, PartialEq, Eq, Clone, Debug)]
+#[serde(default)]
+pub struct SafekeeperConf {
+    pub id: NodeId,
+    pub pg_port: u16,
+    pub http_port: u16,
+    pub sync: bool,
+    pub remote_storage: Option<String>,
+    pub backup_threads: Option<u32>,
+    pub auth_enabled: bool,
+}
+
+impl Default for SafekeeperConf {
+    fn default() -> Self {
+        Self {
+            id: NodeId(0),
+            pg_port: 0,
+            http_port: 0,
+            sync: true,
+            remote_storage: None,
+            backup_threads: None,
+            auth_enabled: false,
+        }
+    }
 }

 impl LocalEnv {
-    // postgres installation
+    // postgres installation paths
    pub fn pg_bin_dir(&self) -> PathBuf {
        self.pg_distrib_dir.join("bin")
    }
    pub fn pg_lib_dir(&self) -> PathBuf {
        self.pg_distrib_dir.join("lib")
    }
-}

-fn zenith_repo_dir() -> PathBuf {
-    // Find repository path
-    match std::env::var_os("ZENITH_REPO_DIR") {
-        Some(val) => PathBuf::from(val.to_str().unwrap()),
-        None => ".zenith".into(),
-    }
-}
-
-//
-// Initialize a new Zenith repository
-//
-pub fn init() -> Result<()> {
-    // check if config already exists
-    let repo_path = zenith_repo_dir();
-    if repo_path.exists() {
-        anyhow::bail!(
-            "{} already exists. Perhaps already initialized?",
-            repo_path.to_str().unwrap()
-        );
+    pub fn pageserver_bin(&self) -> anyhow::Result<PathBuf> {
+        Ok(self.zenith_distrib_dir.join("pageserver"))
    }

-    // Now we can run init only from crate directory, so check that current dir is our crate.
-    // Use 'pageserver/Cargo.toml' existence as evidendce.
-    let cargo_path = env::current_dir()?;
-    if !cargo_path.join("pageserver/Cargo.toml").exists() {
-        anyhow::bail!(
-            "Current dirrectory does not look like a zenith repo. \
-            Please, run 'init' from zenith repo root."
-        );
+    pub fn safekeeper_bin(&self) -> anyhow::Result<PathBuf> {
+        Ok(self.zenith_distrib_dir.join("safekeeper"))
    }

-    // ok, now check that expected binaries are present
-
-    // check postgres
-    let pg_distrib_dir = cargo_path.join("tmp_install");
-    let pg_path = pg_distrib_dir.join("bin/postgres");
-    if !pg_path.exists() {
-        anyhow::bail!(
-            "Can't find postres binary at {}. \
-                       Perhaps './pgbuild.sh' is needed to build it first.",
-            pg_path.to_str().unwrap()
-        );
+    pub fn pg_data_dirs_path(&self) -> PathBuf {
+        self.base_data_dir.join("pgdatadirs").join("tenants")
    }

-    // check pageserver
-    let zenith_distrib_dir = cargo_path.join("target/debug/");
-    let pageserver_path = zenith_distrib_dir.join("pageserver");
-    if !pageserver_path.exists() {
-        anyhow::bail!(
-            "Can't find pageserver binary at {}. Please build it.",
-            pageserver_path.to_str().unwrap()
-        );
+    pub fn pg_data_dir(&self, tenantid: &ZTenantId, branch_name: &str) -> PathBuf {
+        self.pg_data_dirs_path()
+            .join(tenantid.to_string())
+            .join(branch_name)
    }

-    // ok, we are good to go
-    let mut conf = LocalEnv {
-        repo_path: repo_path.clone(),
-        pg_distrib_dir,
-        zenith_distrib_dir,
-        systemid: 0,
-    };
-    init_repo(&mut conf)?;
-
-    Ok(())
-}
-
-pub fn init_repo(local_env: &mut LocalEnv) -> Result<()> {
-    let repopath = &local_env.repo_path;
-    fs::create_dir(&repopath)
-        .with_context(|| format!("could not create directory {}", repopath.display()))?;
-    fs::create_dir(repopath.join("pgdatadirs"))?;
-    fs::create_dir(repopath.join("timelines"))?;
-    fs::create_dir(repopath.join("refs"))?;
-    fs::create_dir(repopath.join("refs").join("branches"))?;
-    fs::create_dir(repopath.join("refs").join("tags"))?;
-    println!("created directory structure in {}", repopath.display());
-
-    // Create initial timeline
-    let tli = create_timeline(&local_env, None)?;
-    let timelinedir = repopath.join("timelines").join(tli.to_string());
-    println!("created initial timeline {}", timelinedir.display());
-
-    // Run initdb
-    //
-    // FIXME: we create it temporarily in "tmp" directory, and move it into
-    // the repository. Use "tempdir()" or something? Or just create it directly
-    // in the repo?
-    let initdb_path = local_env.pg_bin_dir().join("initdb");
-    let _initdb = Command::new(initdb_path)
-        .args(&["-D", "tmp"])
-        .arg("--no-instructions")
-        .env_clear()
-        .env("LD_LIBRARY_PATH", local_env.pg_lib_dir().to_str().unwrap())
-        .stdout(Stdio::null())
-        .status()
-        .with_context(|| "failed to execute initdb")?;
-    println!("initdb succeeded");
-
-    // Read control file to extract the LSN and system id
-    let controlfile =
-        postgres_ffi::decode_pg_control(Bytes::from(fs::read("tmp/global/pg_control")?))?;
-    let systemid = controlfile.system_identifier;
-    let lsn = controlfile.checkPoint;
-    let lsnstr = format!("{:016X}", lsn);
-
-    // Move the initial WAL file
-    fs::rename(
-        "tmp/pg_wal/000000010000000000000001",
-        timelinedir
-            .join("wal")
-            .join("000000010000000000000001.partial"),
-    )?;
-    println!("moved initial WAL file");
-
-    // Remove pg_wal
-    fs::remove_dir_all("tmp/pg_wal")?;
-    println!("removed tmp/pg_wal");
-
-    force_crash_recovery(&PathBuf::from("tmp"))?;
-    println!("updated pg_control");
-
-    let target = timelinedir.join("snapshots").join(&lsnstr);
-    fs::rename("tmp", &target)?;
-    println!("moved 'tmp' to {}", target.display());
-
-    // Create 'main' branch to refer to the initial timeline
-    let data = tli.to_string();
-    fs::write(repopath.join("refs").join("branches").join("main"), data)?;
-    println!("created main branch");
-
-    // Also update the system id in the LocalEnv
-    local_env.systemid = systemid;
-
-    // write config
-    let toml = toml::to_string(&local_env)?;
-    fs::write(repopath.join("config"), toml)?;
-
-    println!(
-        "new zenith repository was created in {}",
-        repopath.display()
-    );
-
-    Ok(())
-}
-
-// If control file says the cluster was shut down cleanly, modify it, to mark
-// it as crashed. That forces crash recovery when you start the cluster.
-//
-// FIXME:
-// We currently do this to the initial snapshot in "zenith init". It would
-// be more natural to do this when the snapshot is restored instead, but we
-// currently don't have any code to create new snapshots, so it doesn't matter
-// Or better yet, use a less hacky way of putting the cluster into recovery.
-// Perhaps create a backup label file in the data directory when it's restored.
-fn force_crash_recovery(datadir: &Path) -> Result<()> {
-    // Read in the control file
-    let controlfilepath = datadir.to_path_buf().join("global").join("pg_control");
-    let mut controlfile =
-        postgres_ffi::decode_pg_control(Bytes::from(fs::read(controlfilepath.as_path())?))?;
-
-    controlfile.state = postgres_ffi::DBState_DB_IN_PRODUCTION;
-
-    fs::write(
-        controlfilepath.as_path(),
-        postgres_ffi::encode_pg_control(controlfile),
-    )?;
-
-    Ok(())
-}
-
-// check that config file is present
-pub fn load_config(repopath: &Path) -> Result<LocalEnv> {
-    if !repopath.exists() {
-        anyhow::bail!(
-            "Zenith config is not found in {}. You need to run 'zenith init' first",
-            repopath.to_str().unwrap()
-        );
+    // TODO: move pageserver files into ./pageserver
+    pub fn pageserver_data_dir(&self) -> PathBuf {
+        self.base_data_dir.clone()
    }

-    // load and parse file
-    let config = fs::read_to_string(repopath.join("config"))?;
-    toml::from_str(config.as_str()).map_err(|e| e.into())
-}
-
-// local env for tests
-pub fn test_env(testname: &str) -> LocalEnv {
-    fs::create_dir_all("../tmp_check").expect("could not create directory ../tmp_check");
-
-    let repo_path = Path::new(env!("CARGO_MANIFEST_DIR"))
-        .join("../tmp_check/")
-        .join(testname);
-
-    // Remove remnants of old test repo
-    let _ = fs::remove_dir_all(&repo_path);
-
-    let mut local_env = LocalEnv {
-        repo_path,
-        pg_distrib_dir: Path::new(env!("CARGO_MANIFEST_DIR")).join("../tmp_install"),
-        zenith_distrib_dir: cargo_bin_dir(),
-        systemid: 0,
-    };
-    init_repo(&mut local_env).expect("could not initialize zenith repository");
-    return local_env;
-}
-
-// Find the directory where the binaries were put (i.e. target/debug/)
-pub fn cargo_bin_dir() -> PathBuf {
-    let mut pathbuf = std::env::current_exe().unwrap();
-
-    pathbuf.pop();
-    if pathbuf.ends_with("deps") {
-        pathbuf.pop();
+    pub fn safekeeper_data_dir(&self, data_dir_name: &str) -> PathBuf {
+        self.base_data_dir.join("safekeepers").join(data_dir_name)
    }

-    return pathbuf;
-}
+    pub fn register_branch_mapping(
+        &mut self,
+        branch_name: String,
+        tenant_id: ZTenantId,
+        timeline_id: ZTimelineId,
+    ) -> anyhow::Result<()> {
+        let existing_values = self
+            .branch_name_mappings
+            .entry(branch_name.clone())
+            .or_default();

-#[derive(Debug, Clone, Copy)]
-pub struct PointInTime {
-    pub timelineid: ZTimelineId,
-    pub lsn: u64,
-}
+        let existing_ids = existing_values
+            .iter()
+            .find(|(existing_tenant_id, _)| existing_tenant_id == &tenant_id);

-fn create_timeline(local_env: &LocalEnv, ancestor: Option<PointInTime>) -> Result<ZTimelineId> {
-    let repopath = &local_env.repo_path;
-
-    // Create initial timeline
-    let mut tli_buf = [0u8; 16];
-    rand::thread_rng().fill(&mut tli_buf);
-    let timelineid = ZTimelineId::from(tli_buf);
-
-    let timelinedir = repopath.join("timelines").join(timelineid.to_string());
-
-    fs::create_dir(&timelinedir)?;
-    fs::create_dir(&timelinedir.join("snapshots"))?;
-    fs::create_dir(&timelinedir.join("wal"))?;
-
-    if let Some(ancestor) = ancestor {
-        let data = format!(
-            "{}@{:X}/{:X}",
-            ancestor.timelineid,
-            ancestor.lsn >> 32,
-            ancestor.lsn & 0xffffffff
-        );
-        fs::write(timelinedir.join("ancestor"), data)?;
-    }
-
-    Ok(timelineid)
-}
-
-// Parse an LSN in the format used in filenames
-//
-// For example: 00000000015D3DD8
-//
-fn parse_lsn(s: &str) -> std::result::Result<u64, std::num::ParseIntError> {
-    u64::from_str_radix(s, 16)
-}
-
-// Create a new branch in the repository (for the "zenith branch" subcommand)
-pub fn create_branch(
-    local_env: &LocalEnv,
-    branchname: &str,
-    startpoint: PointInTime,
-) -> Result<()> {
-    let repopath = &local_env.repo_path;
-
-    // create a new timeline for it
-    let newtli = create_timeline(local_env, Some(startpoint))?;
-    let newtimelinedir = repopath.join("timelines").join(newtli.to_string());
-
-    let data = newtli.to_string();
-    fs::write(
-        repopath.join("refs").join("branches").join(branchname),
-        data,
-    )?;
-
-    // Copy the latest snapshot (TODO: before the startpoint) and all WAL
-    // TODO: be smarter and avoid the copying...
-    let (_maxsnapshot, oldsnapshotdir) = find_latest_snapshot(local_env, startpoint.timelineid)?;
-    let copy_opts = fs_extra::dir::CopyOptions::new();
-    fs_extra::dir::copy(oldsnapshotdir, newtimelinedir.join("snapshots"), &copy_opts)?;
-
-    let oldtimelinedir = repopath
-        .join("timelines")
-        .join(startpoint.timelineid.to_string());
-    let mut copy_opts = fs_extra::dir::CopyOptions::new();
-    copy_opts.content_only = true;
-    fs_extra::dir::copy(
-        oldtimelinedir.join("wal"),
-        newtimelinedir.join("wal"),
-        &copy_opts,
-    )?;
-
-    Ok(())
-}
-
-// Find the end of valid WAL in a wal directory
-pub fn find_end_of_wal(local_env: &LocalEnv, timeline: ZTimelineId) -> Result<u64> {
-    let repopath = &local_env.repo_path;
-    let waldir = repopath
-        .join("timelines")
-        .join(timeline.to_string())
-        .join("wal");
-
-    let (lsn, _tli) = xlog_utils::find_end_of_wal(&waldir, 16 * 1024 * 1024, true);
-
-    return Ok(lsn);
-}
-
-// Find the latest snapshot for a timeline
-fn find_latest_snapshot(local_env: &LocalEnv, timeline: ZTimelineId) -> Result<(u64, PathBuf)> {
-    let repopath = &local_env.repo_path;
-
-    let snapshotsdir = repopath
-        .join("timelines")
-        .join(timeline.to_string())
-        .join("snapshots");
-    let paths = fs::read_dir(&snapshotsdir)?;
-    let mut maxsnapshot: u64 = 0;
-    let mut snapshotdir: Option<PathBuf> = None;
-    for path in paths {
-        let path = path?;
-        let filename = path.file_name().to_str().unwrap().to_owned();
-        if let Ok(lsn) = parse_lsn(&filename) {
-            maxsnapshot = std::cmp::max(lsn, maxsnapshot);
-            snapshotdir = Some(path.path());
+        if let Some((_, old_timeline_id)) = existing_ids {
+            if old_timeline_id == &timeline_id {
+                Ok(())
+            } else {
+                bail!("branch '{branch_name}' is already mapped to timeline {old_timeline_id}, cannot map to another timeline {timeline_id}");
+            }
+        } else {
+            existing_values.push((tenant_id, timeline_id));
+            Ok(())
        }
    }
-    if maxsnapshot == 0 {
-        // TODO: check ancestor timeline
-        anyhow::bail!("no snapshot found in {}", snapshotsdir.display());
+
+    pub fn get_branch_timeline_id(
+        &self,
+        branch_name: &str,
+        tenant_id: ZTenantId,
+    ) -> Option<ZTimelineId> {
+        self.branch_name_mappings
+            .get(branch_name)?
+            .iter()
+            .find(|(mapped_tenant_id, _)| mapped_tenant_id == &tenant_id)
+            .map(|&(_, timeline_id)| timeline_id)
+            .map(ZTimelineId::from)
    }

-    Ok((maxsnapshot, snapshotdir.unwrap()))
+    pub fn timeline_name_mappings(&self) -> HashMap<ZTenantTimelineId, String> {
+        self.branch_name_mappings
+            .iter()
+            .flat_map(|(name, tenant_timelines)| {
+                tenant_timelines.iter().map(|&(tenant_id, timeline_id)| {
+                    (ZTenantTimelineId::new(tenant_id, timeline_id), name.clone())
+                })
+            })
+            .collect()
+    }
+
+    /// Create a LocalEnv from a config file.
+    ///
+    /// Unlike 'load_config', this function fills in any defaults that are missing
+    /// from the config file.
+    pub fn parse_config(toml: &str) -> anyhow::Result<Self> {
+        let mut env: LocalEnv = toml::from_str(toml)?;
+
+        // Find postgres binaries.
+        // Follow POSTGRES_DISTRIB_DIR if set, otherwise look in "tmp_install".
+        if env.pg_distrib_dir == Path::new("") {
+            if let Some(postgres_bin) = env::var_os("POSTGRES_DISTRIB_DIR") {
+                env.pg_distrib_dir = postgres_bin.into();
+            } else {
+                let cwd = env::current_dir()?;
+                env.pg_distrib_dir = cwd.join("tmp_install")
+            }
+        }
+
+        // Find zenith binaries.
+        if env.zenith_distrib_dir == Path::new("") {
+            env.zenith_distrib_dir = env::current_exe()?.parent().unwrap().to_owned();
+        }
+
+        // If no initial tenant ID was given, generate it.
+        if env.default_tenant_id.is_none() {
+            env.default_tenant_id = Some(ZTenantId::generate());
+        }
+
+        env.base_data_dir = base_path();
+
+        Ok(env)
+    }
+
+    /// Locate and load config
+    pub fn load_config() -> anyhow::Result<Self> {
+        let repopath = base_path();
+
+        if !repopath.exists() {
+            bail!(
+                "Zenith config is not found in {}. You need to run 'neon_local init' first",
+                repopath.to_str().unwrap()
+            );
+        }
+
+        // TODO: check that it looks like a zenith repository
+
+        // load and parse file
+        let config = fs::read_to_string(repopath.join("config"))?;
+        let mut env: LocalEnv = toml::from_str(config.as_str())?;
+
+        env.base_data_dir = repopath;
+
+        Ok(env)
+    }
+
+    pub fn persist_config(&self, base_path: &Path) -> anyhow::Result<()> {
+        // Currently, the user first passes a config file with 'neon_local init --config=<path>'
+        // We read that in, in `create_config`, and fill any missing defaults. Then it's saved
+        // to .neon/config. TODO: We lose any formatting and comments along the way, which is
+        // a bit sad.
+        let mut conf_content = r#"# This file describes a local deployment of the page server
+# and safekeeper node. It is read by the 'neon_local' command-line
+# utility.
+"#
+        .to_string();
+
+        // Convert the LocalEnv to a toml file.
+        //
+        // This could be as simple as this:
+        //
+        // conf_content += &toml::to_string_pretty(env)?;
+        //
+        // But it results in a "values must be emitted before tables". I'm not sure
+        // why, AFAICS the table, i.e. 'safekeepers: Vec<SafekeeperConf>' is last.
+        // Maybe Rust reorders the fields to avoid padding or something?
+        // In any case, converting to toml::Value first, and serializing that, works.
+        // See https://github.com/alexcrichton/toml-rs/issues/142
+        conf_content += &toml::to_string_pretty(&toml::Value::try_from(self)?)?;
+
+        let target_config_path = base_path.join("config");
+        fs::write(&target_config_path, conf_content).with_context(|| {
+            format!(
+                "Failed to write config file into path '{}'",
+                target_config_path.display()
+            )
+        })
+    }
+
+    // This function is used only for testing purposes in the CLI, e.g. to generate tokens during init.
+    pub fn generate_auth_token(&self, claims: &Claims) -> anyhow::Result<String> {
+        let private_key_path = if self.private_key_path.is_absolute() {
+            self.private_key_path.to_path_buf()
+        } else {
+            self.base_data_dir.join(&self.private_key_path)
+        };
+
+        let key_data = fs::read(private_key_path)?;
+        encode_from_key_file(claims, &key_data)
+    }
+
+    //
+    // Initialize a new Neon repository
+    //
+    pub fn init(&mut self) -> anyhow::Result<()> {
+        // check if config already exists
+        let base_path = &self.base_data_dir;
+        ensure!(
+            base_path != Path::new(""),
+            "repository base path is missing"
+        );
+
+        ensure!(
+            !base_path.exists(),
+            "directory '{}' already exists. Perhaps already initialized?",
+            base_path.display()
+        );
+        if !self.pg_distrib_dir.join("bin/postgres").exists() {
+            bail!(
+                "Can't find postgres binary at {}",
+                self.pg_distrib_dir.display()
+            );
+        }
+        for binary in ["pageserver", "safekeeper"] {
+            if !self.zenith_distrib_dir.join(binary).exists() {
+                bail!(
+                    "Can't find binary '{binary}' in zenith distrib dir '{}'",
+                    self.zenith_distrib_dir.display()
+                );
+            }
+        }
+
+        fs::create_dir(&base_path)?;
+
+        // generate keys for jwt
+        // openssl genrsa -out private_key.pem 2048
+        let private_key_path;
+        if self.private_key_path == PathBuf::new() {
+            private_key_path = base_path.join("auth_private_key.pem");
+            let keygen_output = Command::new("openssl")
+                .arg("genrsa")
+                .args(&["-out", private_key_path.to_str().unwrap()])
+                .arg("2048")
+                .stdout(Stdio::null())
+                .output()
+                .context("failed to generate auth private key")?;
+            if !keygen_output.status.success() {
+                bail!(
+                    "openssl failed: '{}'",
+                    String::from_utf8_lossy(&keygen_output.stderr)
+                );
+            }
+            self.private_key_path = PathBuf::from("auth_private_key.pem");
+
+            let public_key_path = base_path.join("auth_public_key.pem");
+            // openssl rsa -in private_key.pem -pubout -outform PEM -out public_key.pem
+            let keygen_output = Command::new("openssl")
+                .arg("rsa")
+                .args(&["-in", private_key_path.to_str().unwrap()])
+                .arg("-pubout")
+                .args(&["-outform", "PEM"])
+                .args(&["-out", public_key_path.to_str().unwrap()])
+                .stdout(Stdio::null())
+                .output()
+                .context("failed to extract auth public key")?;
+            if !keygen_output.status.success() {
+                bail!(
+                    "openssl failed: '{}'",
+                    String::from_utf8_lossy(&keygen_output.stderr)
+                );
+            }
+        }
+
+        self.pageserver.auth_token =
+            self.generate_auth_token(&Claims::new(None, Scope::PageServerApi))?;
+
+        fs::create_dir_all(self.pg_data_dirs_path())?;
+
+        for safekeeper in &self.safekeepers {
+            fs::create_dir_all(SafekeeperNode::datadir_path_by_id(self, safekeeper.id))?;
+        }
+
+        self.persist_config(base_path)
+    }
+}
+
+fn base_path() -> PathBuf {
+    match std::env::var_os("NEON_REPO_DIR") {
+        Some(val) => PathBuf::from(val),
+        None => PathBuf::from(".neon"),
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn simple_conf_parsing() {
+        let simple_conf_toml = include_str!("../simple.conf");
+        let simple_conf_parse_result = LocalEnv::parse_config(simple_conf_toml);
+        assert!(
+            simple_conf_parse_result.is_ok(),
+            "failed to parse simple config {simple_conf_toml}, reason: {simple_conf_parse_result:?}"
+        );
+
+        let string_to_replace = "broker_endpoints = ['http://127.0.0.1:2379']";
+        let spoiled_url_str = "broker_endpoints = ['!@$XOXO%^&']";
+        let spoiled_url_toml = simple_conf_toml.replace(string_to_replace, spoiled_url_str);
+        assert!(
+            spoiled_url_toml.contains(spoiled_url_str),
+            "Failed to replace string {string_to_replace} in the toml file {simple_conf_toml}"
+        );
+        let spoiled_url_parse_result = LocalEnv::parse_config(&spoiled_url_toml);
+        assert!(
+            spoiled_url_parse_result.is_err(),
+            "expected toml with invalid Url {spoiled_url_toml} to fail the parsing, but got {spoiled_url_parse_result:?}"
+        );
+    }
 }
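Review note on the branch-name mapping in local_env.rs: the registration path and `get_branch_timeline_id` both treat the mapping as "branch name -> list of (tenant, timeline) pairs", with at most one timeline per tenant under a given name. A minimal, std-only sketch of that invariant follows. The `BranchMappings` type, its method names, and the `u64` ID aliases are hypothetical stand-ins for illustration, not the types used in this patch.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: the real ZTenantId / ZTimelineId are not plain integers.
type TenantId = u64;
type TimelineId = u64;

#[derive(Default)]
struct BranchMappings {
    // branch name -> every (tenant, timeline) pair registered under that name
    by_name: HashMap<String, Vec<(TenantId, TimelineId)>>,
}

impl BranchMappings {
    /// Register a mapping. Re-registering the same pair is a no-op; mapping the
    /// same branch and tenant to a different timeline is rejected.
    fn register(
        &mut self,
        branch: &str,
        tenant: TenantId,
        timeline: TimelineId,
    ) -> Result<(), String> {
        let entries = self.by_name.entry(branch.to_string()).or_default();
        // Copy the timeline ID out so no borrow of `entries` outlives the lookup.
        let existing = entries
            .iter()
            .find(|(t, _)| *t == tenant)
            .map(|&(_, tl)| tl);
        match existing {
            Some(old) if old == timeline => Ok(()),
            Some(old) => Err(format!(
                "branch '{branch}' is already mapped to timeline {old}, cannot map to {timeline}"
            )),
            None => {
                entries.push((tenant, timeline));
                Ok(())
            }
        }
    }

    /// Look up the timeline registered for this branch name and tenant, if any.
    fn timeline_for(&self, branch: &str, tenant: TenantId) -> Option<TimelineId> {
        self.by_name
            .get(branch)?
            .iter()
            .find(|(t, _)| *t == tenant)
            .map(|&(_, tl)| tl)
    }
}
```

Two tenants can share a branch name such as "main" without conflict, because the uniqueness check is scoped to the (branch, tenant) pair rather than to the branch name alone.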
--- a/control_plane/src/postgresql_conf.rs
+++ b/control_plane/src/postgresql_conf.rs
@@ -0,0 +1,226 @@
+///
+/// Module for parsing postgresql.conf file.
+///
+/// NOTE: This doesn't implement the full, correct postgresql.conf syntax. Just
+/// enough to extract a few settings we need in Zenith, assuming you don't do
+/// funny stuff like include-directives or funny escaping.
+use anyhow::{bail, Context, Result};
+use once_cell::sync::Lazy;
+use regex::Regex;
+use std::collections::HashMap;
+use std::fmt;
+use std::io::BufRead;
+use std::str::FromStr;
+
+/// In-memory representation of a postgresql.conf file
+#[derive(Default)]
+pub struct PostgresConf {
+    lines: Vec<String>,
+    hash: HashMap<String, String>,
+}
+
+static CONF_LINE_RE: Lazy<Regex> = Lazy::new(|| Regex::new(r"^((?:\w|\.)+)\s*=\s*(\S+)$").unwrap());
+
+impl PostgresConf {
+    pub fn new() -> PostgresConf {
+        PostgresConf::default()
+    }
+
+    /// Read file into memory
+    pub fn read(read: impl std::io::Read) -> Result<PostgresConf> {
+        let mut result = Self::new();
+
+        for line in std::io::BufReader::new(read).lines() {
+            let line = line?;
+
+            // Store each line in a vector, in original format
+            result.lines.push(line.clone());
+
+            // Also parse each line and insert key=value lines into a hash map.
+            //
+            // FIXME: This doesn't match exactly the flex/bison grammar in PostgreSQL.
+            // But it's close enough for our usage.
+            let line = line.trim();
+            if line.starts_with('#') {
+                // comment, ignore
+                continue;
+            } else if let Some(caps) = CONF_LINE_RE.captures(line) {
+                let name = caps.get(1).unwrap().as_str();
+                let raw_val = caps.get(2).unwrap().as_str();
+
+                if let Ok(val) = deescape_str(raw_val) {
+                    // Note: if there's already an entry in the hash map for
+                    // this key, this will replace it. That's the behavior
+                    // we want; when PostgreSQL reads the file, each line
+                    // overrides any previous value for the same setting.
+                    result.hash.insert(name.to_string(), val.to_string());
+                }
+            }
+        }
+        Ok(result)
+    }
+
+    /// Return the current value of 'option'
+    pub fn get(&self, option: &str) -> Option<&str> {
+        self.hash.get(option).map(|x| x.as_ref())
+    }
+
+    /// Return the current value of a field, parsed to the right datatype.
+    ///
+    /// This calls the FromStr::parse() function on the value of the field. If
+    /// the field does not exist, or parsing fails, returns an error.
+    ///
+    pub fn parse_field<T>(&self, field_name: &str, context: &str) -> Result<T>
+    where
+        T: FromStr,
+        <T as FromStr>::Err: std::error::Error + Send + Sync + 'static,
+    {
+        self.get(field_name)
+            .with_context(|| format!("could not find '{}' option {}", field_name, context))?
+            .parse::<T>()
+            .with_context(|| format!("could not parse '{}' option {}", field_name, context))
+    }
+
+    pub fn parse_field_optional<T>(&self, field_name: &str, context: &str) -> Result<Option<T>>
+    where
+        T: FromStr,
+        <T as FromStr>::Err: std::error::Error + Send + Sync + 'static,
+    {
+        if let Some(val) = self.get(field_name) {
+            let result = val
+                .parse::<T>()
+                .with_context(|| format!("could not parse '{}' option {}", field_name, context))?;
+
+            Ok(Some(result))
+        } else {
+            Ok(None)
+        }
+    }
+
+    ///
+    /// Note: if you call this multiple times for the same option, the config
+    /// file will contain a line for each call. It would be nice to have a function
+    /// to change an existing line, but that's a TODO.
+    ///
+    pub fn append(&mut self, option: &str, value: &str) {
+        self.lines
+            .push(format!("{}={}\n", option, escape_str(value)));
+        self.hash.insert(option.to_string(), value.to_string());
+    }
+
+    /// Append an arbitrary non-setting line to the config file
+    pub fn append_line(&mut self, line: &str) {
+        self.lines.push(line.to_string());
+    }
+}
+
+impl fmt::Display for PostgresConf {
+    /// Return the whole configuration file as a string
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        for line in self.lines.iter() {
+            f.write_str(line)?;
+        }
+        Ok(())
+    }
+}
+
+/// Escape a value for putting in postgresql.conf.
+fn escape_str(s: &str) -> String {
+    // If the string doesn't contain anything that needs quoting or escaping, return it
+    // as it is.
+    //
+    // The first part of the regex, before the '|', matches the INTEGER rule in the
+    // PostgreSQL flex grammar (guc-file.l). It matches plain integers like "123" and
+    // "-123", and also accepts units like "10MB". The second part of the regex matches
+    // the UNQUOTED_STRING rule, and accepts strings that contain a single word, beginning
+    // with a letter. That covers words like "off" or "posix". Everything else is quoted.
+    //
+    // This regex is a bit more conservative than the rules in guc-file.l, so we quote some
+    // strings that PostgreSQL would accept without quoting, but that's OK.
+
+    static UNQUOTED_RE: Lazy<Regex> =
+        Lazy::new(|| Regex::new(r"(^[-+]?[0-9]+[a-zA-Z]*$)|(^[a-zA-Z][a-zA-Z0-9]*$)").unwrap());
+
+    if UNQUOTED_RE.is_match(s) {
+        s.to_string()
+    } else {
+        // Otherwise escape and quote it
+        let s = s
+            .replace('\\', "\\\\")
+            .replace('\n', "\\n")
+            .replace('\'', "''");
+
+        "\'".to_owned() + &s + "\'"
+    }
+}
+
+/// De-escape a possibly-quoted value.
+///
+/// See `DeescapeQuotedString` function in PostgreSQL sources for how PostgreSQL
+/// does this.
+fn deescape_str(s: &str) -> Result<String> {
+    // If the string has a quote at the beginning and end, strip them out.
+    if s.len() >= 2 && s.starts_with('\'') && s.ends_with('\'') {
+        let mut result = String::new();
+
+        let mut iter = s[1..(s.len() - 1)].chars().peekable();
+        while let Some(c) = iter.next() {
+            let newc = if c == '\\' {
+                match iter.next() {
+                    Some('b') => '\x08',
+                    Some('f') => '\x0c',
+                    Some('n') => '\n',
+                    Some('r') => '\r',
+                    Some('t') => '\t',
+                    Some('0'..='7') => {
+                        // TODO
+                        bail!("octal escapes not supported");
+                    }
+                    Some(n) => n,
+                    None => break,
+                }
+            } else if c == '\'' && iter.peek() == Some(&'\'') {
+                // doubled quote becomes just one quote
+                iter.next().unwrap()
+            } else {
+                c
+            };
+
+            result.push(newc);
+        }
+        Ok(result)
+    } else {
+        Ok(s.to_string())
+    }
+}
+
+#[test]
+fn test_postgresql_conf_escapes() -> Result<()> {
+    assert_eq!(escape_str("foo bar"), "'foo bar'");
+    // these don't need to be quoted
+    assert_eq!(escape_str("foo"), "foo");
+    assert_eq!(escape_str("123"), "123");
+    assert_eq!(escape_str("+123"), "+123");
+    assert_eq!(escape_str("-10"), "-10");
+    assert_eq!(escape_str("1foo"), "1foo");
+    assert_eq!(escape_str("foo1"), "foo1");
+    assert_eq!(escape_str("10MB"), "10MB");
+    assert_eq!(escape_str("-10kB"), "-10kB");
+
+    // these need quoting and/or escaping
+    assert_eq!(escape_str("foo bar"), "'foo bar'");
+    assert_eq!(escape_str("fo'o"), "'fo''o'");
+    assert_eq!(escape_str("fo\no"), "'fo\\no'");
+    assert_eq!(escape_str("fo\\o"), "'fo\\\\o'");
+    assert_eq!(escape_str("10 cats"), "'10 cats'");
+
+    // Test de-escaping
+    assert_eq!(deescape_str(&escape_str("foo"))?, "foo");
+    assert_eq!(deescape_str(&escape_str("fo'o\nba\\r"))?, "fo'o\nba\\r");
+    assert_eq!(deescape_str("'\\b\\f\\n\\r\\t'")?, "\x08\x0c\n\r\t");
+
+    // octal-escapes are currently not supported
+    assert!(deescape_str("'foo\\7\\07\\007'").is_err());
+
+    Ok(())
+}
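Review note on `escape_str` in postgresql_conf.rs: the quoting decision rests on a single regex with two alternatives (signed integer with optional unit suffix, or a single alphanumeric word starting with a letter). The sketch below restates the same rule with std-only character checks, which may help when reading the regex. The names `is_integer_with_unit`, `is_word` and `escape_value` are hypothetical and do not appear in the patch; the sketch assumes ASCII-only classification, as the regex's explicit character ranges imply.

```rust
/// First regex alternative: optional sign, at least one digit, then an
/// optional alphabetic unit suffix, e.g. "123", "-10", "10MB".
fn is_integer_with_unit(s: &str) -> bool {
    let unsigned = s
        .strip_prefix(|c: char| c == '-' || c == '+')
        .unwrap_or(s);
    // Leading ASCII digits are one byte each, so the count is a valid byte index.
    let digits = unsigned.bytes().take_while(|b| b.is_ascii_digit()).count();
    digits > 0 && unsigned[digits..].chars().all(|c| c.is_ascii_alphabetic())
}

/// Second regex alternative: a single word that begins with a letter,
/// e.g. "off", "posix", "foo1".
fn is_word(s: &str) -> bool {
    let mut chars = s.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic())
        && chars.all(|c| c.is_ascii_alphanumeric())
}

/// Return the value unchanged when it is safe unquoted; otherwise escape
/// backslashes and newlines, double any single quotes, and wrap in quotes.
fn escape_value(s: &str) -> String {
    if is_integer_with_unit(s) || is_word(s) {
        return s.to_string();
    }
    let escaped = s
        .replace('\\', "\\\\")
        .replace('\n', "\\n")
        .replace('\'', "''");
    format!("'{}'", escaped)
}
```

Backslashes are escaped first on purpose: doing it after the newline replacement would double the backslash that the newline escape itself introduces.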
--- a/control_plane/src/safekeeper.rs
+++ b/control_plane/src/safekeeper.rs
@@ -0,0 +1,304 @@
+use std::io::Write;
+use std::path::PathBuf;
+use std::process::Command;
+use std::sync::Arc;
+use std::time::Duration;
+use std::{io, result, thread};
+
+use anyhow::bail;
+use nix::errno::Errno;
+use nix::sys::signal::{kill, Signal};
+use nix::unistd::Pid;
+use postgres::Config;
+use reqwest::blocking::{Client, RequestBuilder, Response};
+use reqwest::{IntoUrl, Method};
+use safekeeper::http::models::TimelineCreateRequest;
+use thiserror::Error;
+use utils::{
+    connstring::connection_address,
+    http::error::HttpErrorBody,
+    zid::{NodeId, ZTenantId, ZTimelineId},
+};
+
+use crate::local_env::{LocalEnv, SafekeeperConf};
+use crate::storage::PageServerNode;
+use crate::{fill_aws_secrets_vars, fill_rust_env_vars, read_pidfile};
+
+#[derive(Error, Debug)]
+pub enum SafekeeperHttpError {
+    #[error("Reqwest error: {0}")]
+    Transport(#[from] reqwest::Error),
+
+    #[error("Error: {0}")]
+    Response(String),
+}
+
+type Result<T> = result::Result<T, SafekeeperHttpError>;
+
+pub trait ResponseErrorMessageExt: Sized {
+    fn error_from_body(self) -> Result<Self>;
+}
+
+impl ResponseErrorMessageExt for Response {
+    fn error_from_body(self) -> Result<Self> {
+        let status = self.status();
+        if !(status.is_client_error() || status.is_server_error()) {
+            return Ok(self);
+        }
+
+        // reqwest does not export its error construction utility functions, so let's craft the message ourselves
+        let url = self.url().to_owned();
+        Err(SafekeeperHttpError::Response(
+            match self.json::<HttpErrorBody>() {
+                Ok(err_body) => format!("Error: {}", err_body.msg),
+                Err(_) => format!("Http error ({}) at {}.", status.as_u16(), url),
+            },
+        ))
+    }
+}
+
+//
+// Control routines for safekeeper.
+//
+// Used in CLI and tests.
+//
+#[derive(Debug)]
+pub struct SafekeeperNode {
+    pub id: NodeId,
+
+    pub conf: SafekeeperConf,
+
+    pub pg_connection_config: Config,
+    pub env: LocalEnv,
+    pub http_client: Client,
+    pub http_base_url: String,
+
+    pub pageserver: Arc<PageServerNode>,
+}
+
+impl SafekeeperNode {
+    pub fn from_env(env: &LocalEnv, conf: &SafekeeperConf) -> SafekeeperNode {
+        let pageserver = Arc::new(PageServerNode::from_env(env));
+
+        SafekeeperNode {
+            id: conf.id,
+            conf: conf.clone(),
+            pg_connection_config: Self::safekeeper_connection_config(conf.pg_port),
+            env: env.clone(),
+            http_client: Client::new(),
+            http_base_url: format!("http://127.0.0.1:{}/v1", conf.http_port),
+            pageserver,
+        }
+    }
+
+    /// Construct libpq connection string for connecting to this safekeeper.
+    fn safekeeper_connection_config(port: u16) -> Config {
+        // TODO safekeeper authentication not implemented yet
+        format!("postgresql://no_user@127.0.0.1:{}/no_db", port)
+            .parse()
+            .unwrap()
+    }
+
+    pub fn datadir_path_by_id(env: &LocalEnv, sk_id: NodeId) -> PathBuf {
+        env.safekeeper_data_dir(format!("sk{}", sk_id).as_ref())
+    }
+
+    pub fn datadir_path(&self) -> PathBuf {
+        SafekeeperNode::datadir_path_by_id(&self.env, self.id)
+    }
+
+    pub fn pid_file(&self) -> PathBuf {
+        self.datadir_path().join("safekeeper.pid")
+    }
+
+    pub fn start(&self) -> anyhow::Result<()> {
+        print!(
+            "Starting safekeeper at '{}' in '{}'",
+            connection_address(&self.pg_connection_config),
+            self.datadir_path().display()
+        );
+        io::stdout().flush().unwrap();
+
+        let listen_pg = format!("127.0.0.1:{}", self.conf.pg_port);
+        let listen_http = format!("127.0.0.1:{}", self.conf.http_port);
+
+        let mut cmd = Command::new(self.env.safekeeper_bin()?);
+        fill_rust_env_vars(
+            cmd.args(&["-D", self.datadir_path().to_str().unwrap()])
+                .args(&["--id", self.id.to_string().as_ref()])
+                .args(&["--listen-pg", &listen_pg])
+                .args(&["--listen-http", &listen_http])
+                .args(&["--recall", "1 second"])
+                .arg("--daemonize"),
+        );
+        if !self.conf.sync {
+            cmd.arg("--no-sync");
+        }
+
+        let comma_separated_endpoints = self.env.etcd_broker.comma_separated_endpoints();
+        if !comma_separated_endpoints.is_empty() {
+            cmd.args(&["--broker-endpoints", &comma_separated_endpoints]);
+        }
+        if let Some(prefix) = self.env.etcd_broker.broker_etcd_prefix.as_deref() {
+            cmd.args(&["--broker-etcd-prefix", prefix]);
+        }
+        if let Some(threads) = self.conf.backup_threads {
+            cmd.args(&["--backup-threads", threads.to_string().as_ref()]);
+        }
+        if let Some(ref remote_storage) = self.conf.remote_storage {
+            cmd.args(&["--remote-storage", remote_storage]);
+        }
+        if self.conf.auth_enabled {
+            cmd.arg("--auth-validation-public-key-path");
+            // A PathBuf is better passed as is, not via `String`.
+            cmd.arg(self.env.base_data_dir.join("auth_public_key.pem"));
+        }
+
+        fill_aws_secrets_vars(&mut cmd);
+
+        if !cmd.status()?.success() {
+            bail!(
+                "Safekeeper failed to start. See '{}' for details.",
+                self.datadir_path().join("safekeeper.log").display()
+            );
+        }
+
+        // It takes a while for the safekeeper to start up. Wait until it is
+        // open for business.
+        const RETRIES: i8 = 15;
+        for retries in 1..RETRIES {
+            match self.check_status() {
+                Ok(_) => {
+                    println!("\nSafekeeper started");
+                    return Ok(());
+                }
+                Err(err) => {
+                    match err {
+                        SafekeeperHttpError::Transport(err) => {
+                            if err.is_connect() && retries < 5 {
+                                print!(".");
+                                io::stdout().flush().unwrap();
+                            } else {
+                                if retries == 5 {
+                                    println!() // put a line break after dots for second message
+                                }
+                                println!(
+                                    "Safekeeper not responding yet, err {} retrying ({})...",
+                                    err, retries
+                                );
+                            }
+                        }
+                        SafekeeperHttpError::Response(msg) => {
+                            bail!("safekeeper failed to start: {} ", msg)
+                        }
+                    }
+                    thread::sleep(Duration::from_secs(1));
+                }
+            }
+        }
+        bail!("safekeeper failed to start in {} seconds", RETRIES);
+    }
+
+    ///
+    /// Stop the server.
+    ///
+    /// If 'immediate' is true, we use SIGQUIT, killing the process immediately.
+    /// Otherwise we use SIGTERM, triggering a clean shutdown
+    ///
+    /// If the server is not running, returns success
+    ///
+    pub fn stop(&self, immediate: bool) -> anyhow::Result<()> {
+        let pid_file = self.pid_file();
+        if !pid_file.exists() {
+            println!("Safekeeper {} is already stopped", self.id);
+            return Ok(());
+        }
+        let pid = read_pidfile(&pid_file)?;
+        let pid = Pid::from_raw(pid);
+
+        let sig = if immediate {
+            print!("Stopping safekeeper {} immediately..", self.id);
+            Signal::SIGQUIT
+        } else {
+            print!("Stopping safekeeper {} gracefully..", self.id);
+            Signal::SIGTERM
+        };
+        io::stdout().flush().unwrap();
+        match kill(pid, sig) {
+            Ok(_) => (),
+            Err(Errno::ESRCH) => {
+                println!(
+                    "Safekeeper with pid {} does not exist, but a PID file was found",
+                    pid
+                );
+                return Ok(());
+            }
+            Err(err) => bail!(
+                "Failed to send signal to safekeeper with pid {}: {}",
+                pid,
+                err.desc()
+            ),
+        }
+
+        // Wait until process is gone
+        for i in 0..600 {
+            let signal = None; // Send no signal, just get the error code
+            match kill(pid, signal) {
+                Ok(_) => (), // Process exists, keep waiting
+                Err(Errno::ESRCH) => {
+                    // Process not found, we're done
+                    println!("done!");
+                    return Ok(());
+                }
+                Err(err) => bail!(
+                    "Failed to send signal to safekeeper with pid {}: {}",
+                    pid,
+                    err.desc()
+                ),
+            };
+
+            if i % 10 == 0 {
+                print!(".");
+                io::stdout().flush().unwrap();
+            }
+            thread::sleep(Duration::from_millis(100));
+        }
+
+        bail!("Failed to stop safekeeper with pid {}", pid);
+    }
+
+    fn http_request<U: IntoUrl>(&self, method: Method, url: U) -> RequestBuilder {
+        // TODO: authentication
+        //if self.env.auth_type == AuthType::ZenithJWT {
+        //    builder = builder.bearer_auth(&self.env.safekeeper_auth_token)
+        //}
+        self.http_client.request(method, url)
+    }
+
+    pub fn check_status(&self) -> Result<()> {
+        self.http_request(Method::GET, format!("{}/{}", self.http_base_url, "status"))
+            .send()?
+            .error_from_body()?;
+        Ok(())
+    }
+
+    pub fn timeline_create(
+        &self,
+        tenant_id: ZTenantId,
+        timeline_id: ZTimelineId,
+        peer_ids: Vec<NodeId>,
+    ) -> Result<()> {
+        Ok(self
+            .http_request(
+                Method::POST,
+                format!("{}/tenant/{}/timeline", self.http_base_url, tenant_id),
+            )
+            .json(&TimelineCreateRequest {
+                timeline_id,
+                peer_ids,
+            })
+            .send()?
+            .error_from_body()?
+            .json()?)
+    }
+}
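Review note on `SafekeeperNode::start` in safekeeper.rs: the startup wait is a bounded poll, where a status probe is retried with a fixed delay until it succeeds or the attempts run out. A generic, std-only sketch of that pattern is below. The helper name `wait_until_ready` and its signature are hypothetical, and the sketch is simpler than the patch: it retries on every error, whereas the code above gives up immediately on a `Response` error and only retries `Transport` errors.

```rust
use std::thread;
use std::time::Duration;

/// Call `probe` up to `max_attempts` times, sleeping `delay` between failed
/// attempts. Returns the 1-based attempt number that succeeded, or the last
/// error seen. Panics if `max_attempts` is zero.
fn wait_until_ready<E>(
    max_attempts: u32,
    delay: Duration,
    mut probe: impl FnMut() -> Result<(), E>,
) -> Result<u32, E> {
    let mut last_err = None;
    for attempt in 1..=max_attempts {
        match probe() {
            Ok(()) => return Ok(attempt),
            Err(e) => {
                last_err = Some(e);
                // No point sleeping after the final attempt.
                if attempt < max_attempts {
                    thread::sleep(delay);
                }
            }
        }
    }
    Err(last_err.expect("max_attempts must be at least 1"))
}
```

Using an inclusive range (`1..=max_attempts`) makes the number of probes match the configured limit exactly, which keeps any "failed to start in N seconds" style message consistent with what was actually tried.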
--- a/control_plane/src/storage.rs
+++ b/control_plane/src/storage.rs
@@ -1,135 +1,70 @@
-use anyhow::Result;
-use std::fs;
-use std::io;
-use std::net::SocketAddr;
-use std::net::TcpStream;
+use std::collections::HashMap;
+use std::fs::File;
+use std::io::{BufReader, Write};
+use std::num::NonZeroU64;
 use std::path::{Path, PathBuf};
 use std::process::Command;
-use std::str::FromStr;
-use std::sync::atomic::{AtomicBool, Ordering};
-use std::sync::Arc;
-use std::thread;
 use std::time::Duration;
+use std::{io, result, thread};

-use postgres::{Client, NoTls};
+use anyhow::{bail, Context};
+use nix::errno::Errno;
+use nix::sys::signal::{kill, Signal};
+use nix::unistd::Pid;
+use pageserver::http::models::{
+    TenantConfigRequest, TenantCreateRequest, TenantInfo, TimelineCreateRequest, TimelineInfo,
+};
+use postgres::{Config, NoTls};
+use reqwest::blocking::{Client, RequestBuilder, Response};
+use reqwest::{IntoUrl, Method};
+use thiserror::Error;
+use utils::{
+    connstring::connection_address,
+    http::error::HttpErrorBody,
+    lsn::Lsn,
+    postgres_backend::AuthType,
+    zid::{ZTenantId, ZTimelineId},
+};

-use crate::compute::PostgresNode;
 use crate::local_env::LocalEnv;
-use pageserver::ZTimelineId;
+use crate::{fill_aws_secrets_vars, fill_rust_env_vars, read_pidfile};

-//
-// Collection of several example deployments useful for tests.
-//
-// I'm intendedly modelling storage and compute control planes as a separate entities
-// as it is closer to the actual setup.
-//
-pub struct TestStorageControlPlane {
-    pub wal_acceptors: Vec<WalAcceptorNode>,
-    pub pageserver: Arc<PageServerNode>,
-    pub test_done: AtomicBool,
-    pub repopath: PathBuf,
+#[derive(Error, Debug)]
+pub enum PageserverHttpError {
+    #[error("Reqwest error: {0}")]
+    Transport(#[from] reqwest::Error),
+
+    #[error("Error: {0}")]
+    Response(String),
 }

-impl TestStorageControlPlane {
-    // Peek into the repository, to grab the timeline ID of given branch
-    pub fn get_branch_timeline(&self, branchname: &str) -> ZTimelineId {
-        let branchpath = self.repopath.join("refs/branches/".to_owned() + branchname);
-
-        ZTimelineId::from_str(&(fs::read_to_string(&branchpath).unwrap())).unwrap()
-    }
-
-    // postgres <-> page_server
-    //
-    // Initialize a new repository and configure a page server to run in it
-    //
-    pub fn one_page_server(local_env: &LocalEnv) -> TestStorageControlPlane {
-        let repopath = local_env.repo_path.clone();
-
-        let pserver = Arc::new(PageServerNode {
-            env: local_env.clone(),
-            kill_on_exit: true,
-            listen_address: None,
-        });
-        pserver.start().unwrap();
-
-        TestStorageControlPlane {
-            wal_acceptors: Vec::new(),
-            pageserver: pserver,
-            test_done: AtomicBool::new(false),
-            repopath: repopath,
-        }
-    }
-
-    pub fn one_page_server_no_start(local_env: &LocalEnv) -> TestStorageControlPlane {
-        let repopath = local_env.repo_path.clone();
-
-        let pserver = Arc::new(PageServerNode {
-            env: local_env.clone(),
-            kill_on_exit: true,
-            listen_address: None,
-        });
-
-        TestStorageControlPlane {
-            wal_acceptors: Vec::new(),
-            pageserver: pserver,
-            test_done: AtomicBool::new(false),
-            repopath: repopath,
-        }
-    }
-
-    // postgres <-> {wal_acceptor1, wal_acceptor2, ...}
-    pub fn fault_tolerant(local_env: &LocalEnv, redundancy: usize) -> TestStorageControlPlane {
-        let repopath = local_env.repo_path.clone();
-
-        let mut cplane = TestStorageControlPlane {
-            wal_acceptors: Vec::new(),
-            pageserver: Arc::new(PageServerNode {
-                env: local_env.clone(),
-                kill_on_exit: true,
-                listen_address: None,
-            }),
-            test_done: AtomicBool::new(false),
-            repopath: repopath,
-        };
-        cplane.pageserver.start().unwrap();
-
-        const WAL_ACCEPTOR_PORT: usize = 54321;
-
-        for i in 0..redundancy {
-            let wal_acceptor = WalAcceptorNode {
-                listen: format!("127.0.0.1:{}", WAL_ACCEPTOR_PORT + i)
-                    .parse()
-                    .unwrap(),
-                data_dir: local_env.repo_path.join(format!("wal_acceptor_{}", i)),
-                env: local_env.clone(),
-            };
-            wal_acceptor.init();
-            wal_acceptor.start();
-            cplane.wal_acceptors.push(wal_acceptor);
-        }
-        cplane
-    }
-
-    pub fn stop(&self) {
-        self.test_done.store(true, Ordering::Relaxed);
-    }
-
-    pub fn get_wal_acceptor_conn_info(&self) -> String {
-        self.wal_acceptors
-            .iter()
-            .map(|wa| wa.listen.to_string())
-            .collect::<Vec<String>>()
-            .join(",")
-    }
-
-    pub fn is_running(&self) -> bool {
-        self.test_done.load(Ordering::Relaxed)
+impl From<anyhow::Error> for PageserverHttpError {
+    fn from(e: anyhow::Error) -> Self {
+        Self::Response(e.to_string())
    }
 }

-impl Drop for TestStorageControlPlane {
-    fn drop(&mut self) {
-        self.stop();
+type Result<T> = result::Result<T, PageserverHttpError>;
+
+pub trait ResponseErrorMessageExt: Sized {
+    fn error_from_body(self) -> Result<Self>;
+}
+
+impl ResponseErrorMessageExt for Response {
+    fn error_from_body(self) -> Result<Self> {
+        let status = self.status();
+        if !(status.is_client_error() || status.is_server_error()) {
+            return Ok(self);
+        }
+
+        // reqwest does not export its error construction utility functions, so let's craft the message ourselves
+        let url = self.url().to_owned();
+        Err(PageserverHttpError::Response(
+            match self.json::<HttpErrorBody>() {
+                Ok(err_body) => format!("Error: {}", err_body.msg),
+                Err(_) => format!("Http error ({}) at {}.", status.as_u16(), url),
+            },
+        ))
    }
 }

@@ -138,276 +73,493 @@ impl Drop for TestStorageControlPlane {
 //
 // Used in CLI and tests.
 //
+#[derive(Debug)]
 pub struct PageServerNode {
-    kill_on_exit: bool,
-    listen_address: Option<SocketAddr>,
+    pub pg_connection_config: Config,
    pub env: LocalEnv,
+    pub http_client: Client,
+    pub http_base_url: String,
 }

 impl PageServerNode {
    pub fn from_env(env: &LocalEnv) -> PageServerNode {
-        PageServerNode {
-            kill_on_exit: false,
-            listen_address: None, // default
+        let password = if env.pageserver.auth_type == AuthType::ZenithJWT {
+            &env.pageserver.auth_token
+        } else {
+            ""
+        };
+
+        Self {
+            pg_connection_config: Self::pageserver_connection_config(
+                password,
+                &env.pageserver.listen_pg_addr,
+            ),
            env: env.clone(),
+            http_client: Client::new(),
+            http_base_url: format!("http://{}/v1", env.pageserver.listen_http_addr),
        }
    }

-    pub fn address(&self) -> SocketAddr {
-        match self.listen_address {
-            Some(addr) => addr,
-            None => "127.0.0.1:64000".parse().unwrap(),
+    /// Construct libpq connection string for connecting to the pageserver.
+    fn pageserver_connection_config(password: &str, listen_addr: &str) -> Config {
+        format!("postgresql://no_user:{password}@{listen_addr}/no_db")
+            .parse()
+            .unwrap()
+    }
+
+    pub fn initialize(
+        &self,
+        create_tenant: Option<ZTenantId>,
+        initial_timeline_id: Option<ZTimelineId>,
+        config_overrides: &[&str],
+    ) -> anyhow::Result<ZTimelineId> {
+        let id = format!("id={}", self.env.pageserver.id);
+        // FIXME: the paths should be shell-escaped to handle paths with spaces, quotes, etc.
+        let pg_distrib_dir_param =
+            format!("pg_distrib_dir='{}'", self.env.pg_distrib_dir.display());
+        let authg_type_param = format!("auth_type='{}'", self.env.pageserver.auth_type);
+        let listen_http_addr_param = format!(
+            "listen_http_addr='{}'",
+            self.env.pageserver.listen_http_addr
+        );
+        let listen_pg_addr_param =
+            format!("listen_pg_addr='{}'", self.env.pageserver.listen_pg_addr);
+        let broker_endpoints_param = format!(
+            "broker_endpoints=[{}]",
+            self.env
+                .etcd_broker
+                .broker_endpoints
+                .iter()
+                .map(|url| format!("'{url}'"))
+                .collect::<Vec<_>>()
+                .join(",")
+        );
+        let broker_etcd_prefix_param = self
+            .env
+            .etcd_broker
+            .broker_etcd_prefix
+            .as_ref()
+            .map(|prefix| format!("broker_etcd_prefix='{prefix}'"));
+
+        let mut init_config_overrides = config_overrides.to_vec();
+        init_config_overrides.push(&id);
+        init_config_overrides.push(&pg_distrib_dir_param);
+        init_config_overrides.push(&authg_type_param);
+        init_config_overrides.push(&listen_http_addr_param);
+        init_config_overrides.push(&listen_pg_addr_param);
+        init_config_overrides.push(&broker_endpoints_param);
+
+        if let Some(broker_etcd_prefix_param) = broker_etcd_prefix_param.as_deref() {
+            init_config_overrides.push(broker_etcd_prefix_param);
        }
+
+        if self.env.pageserver.auth_type != AuthType::Trust {
+            init_config_overrides.push("auth_validation_public_key_path='auth_public_key.pem'");
+        }
+
+        self.start_node(&init_config_overrides, &self.env.base_data_dir, true)?;
+        let init_result = self
+            .try_init_timeline(create_tenant, initial_timeline_id)
+            .context("Failed to create initial tenant and timeline for pageserver");
+        match &init_result {
+            Ok(initial_timeline_id) => {
+                println!("Successfully initialized timeline {initial_timeline_id}")
+            }
+            Err(e) => eprintln!("{e:#}"),
+        }
+        self.stop(false)?;
+        init_result
+    }
+
+    fn try_init_timeline(
+        &self,
+        new_tenant_id: Option<ZTenantId>,
+        new_timeline_id: Option<ZTimelineId>,
+    ) -> anyhow::Result<ZTimelineId> {
+        let initial_tenant_id = self.tenant_create(new_tenant_id, HashMap::new())?;
+        let initial_timeline_info =
+            self.timeline_create(initial_tenant_id, new_timeline_id, None, None)?;
+        Ok(initial_timeline_info.timeline_id)
    }

    pub fn repo_path(&self) -> PathBuf {
-        self.env.repo_path.clone()
+        self.env.pageserver_data_dir()
    }

    pub fn pid_file(&self) -> PathBuf {
-        self.env.repo_path.join("pageserver.pid")
+        self.repo_path().join("pageserver.pid")
    }

-    pub fn start(&self) -> Result<()> {
+    pub fn start(&self, config_overrides: &[&str]) -> anyhow::Result<()> {
+        self.start_node(config_overrides, &self.repo_path(), false)
+    }
+
+    fn start_node(
+        &self,
+        config_overrides: &[&str],
+        datadir: &Path,
+        update_config: bool,
+    ) -> anyhow::Result<()> {
        println!(
-            "Starting pageserver at '{}' in {}",
-            self.address(),
-            self.repo_path().display()
+            "Starting pageserver at '{}' in '{}'",
+            connection_address(&self.pg_connection_config),
+            datadir.display()
        );
+        io::stdout().flush()?;

-        let mut cmd = Command::new(self.env.zenith_distrib_dir.join("pageserver"));
-        cmd.args(&["-l", self.address().to_string().as_str()])
-            .arg("-d")
-            .env_clear()
-            .env("RUST_BACKTRACE", "1")
-            .env("ZENITH_REPO_DIR", self.repo_path())
-            .env("PATH", self.env.pg_bin_dir().to_str().unwrap()) // needs postres-wal-redo binary
-            .env("LD_LIBRARY_PATH", self.env.pg_lib_dir().to_str().unwrap());
+        let mut args = vec![
+            "-D",
+            datadir.to_str().with_context(|| {
+                format!(
+                    "Datadir path '{}' cannot be represented as a unicode string",
+                    datadir.display()
+                )
+            })?,
+        ];

-        if !cmd.status()?.success() {
-            anyhow::bail!(
-                "Pageserver failed to start. See '{}' for details.",
-                self.repo_path().join("pageserver.log").display()
+        if update_config {
+            args.push("--update-config");
+        }
+
+        for config_override in config_overrides {
+            args.extend(["-c", config_override]);
+        }
+
+        let mut cmd = Command::new(self.env.pageserver_bin()?);
+        let mut filled_cmd = fill_rust_env_vars(cmd.args(&args).arg("--daemonize"));
+        filled_cmd = fill_aws_secrets_vars(filled_cmd);
+
+        if !filled_cmd.status()?.success() {
+            bail!(
+                "Pageserver failed to start. See console output and '{}' for details.",
+                datadir.join("pageserver.log").display()
            );
        }

        // It takes a while for the page server to start up. Wait until it is
        // open for business.
-        for retries in 1..15 {
-            let client = self.page_server_psql_client();
-            if client.is_ok() {
-                break;
-            } else {
-                println!("page server not responding yet, retrying ({})...", retries);
-                thread::sleep(Duration::from_secs(1));
+        const RETRIES: i8 = 15;
+        for retries in 1..=RETRIES {
+            match self.check_status() {
+                Ok(()) => {
+                    println!("\nPageserver started");
+                    return Ok(());
+                }
+                Err(err) => {
+                    match err {
+                        PageserverHttpError::Transport(err) => {
+                            if err.is_connect() && retries < 5 {
+                                print!(".");
+                                io::stdout().flush().unwrap();
+                            } else {
+                                if retries == 5 {
+                                    println!() // put a line break after dots for second message
+                                }
+                                println!("Pageserver not responding yet, err {err} retrying ({retries})...");
+                            }
+                        }
+                        PageserverHttpError::Response(msg) => {
+                            bail!("pageserver failed to start: {msg}")
+                        }
+                    }
+                    thread::sleep(Duration::from_secs(1));
+                }
            }
        }
-        Ok(())
+        bail!("pageserver failed to start in {RETRIES} seconds");
    }

-    pub fn stop(&self) -> Result<()> {
-        let pidfile = self.pid_file();
-        let pid = read_pidfile(&pidfile)?;
-
-        let status = Command::new("kill")
-            .arg(&pid)
-            .env_clear()
-            .status()
-            .expect("failed to execute kill");
-
-        if !status.success() {
-            anyhow::bail!("Failed to kill pageserver with pid {}", pid);
-        }
-
-        // await for pageserver stop
-        for _ in 0..5 {
-            let stream = TcpStream::connect(self.address());
-            if let Err(_e) = stream {
-                return Ok(());
-            }
-            println!("Stopping pageserver on {}", self.address());
-            thread::sleep(Duration::from_secs(1));
-        }
-
-        // ok, we failed to stop pageserver, let's panic
-        if !status.success() {
-            anyhow::bail!("Failed to stop pageserver with pid {}", pid);
-        } else {
+    ///
+    /// Stop the server.
+    ///
+    /// If 'immediate' is true, we use SIGQUIT, killing the process immediately.
+    /// Otherwise we use SIGTERM, triggering a clean shutdown.
+    ///
+    /// If the server is not running, returns success.
+    ///
+    pub fn stop(&self, immediate: bool) -> anyhow::Result<()> {
+        let pid_file = self.pid_file();
+        if !pid_file.exists() {
+            println!("Pageserver is already stopped");
            return Ok(());
        }
+        let pid = Pid::from_raw(read_pidfile(&pid_file)?);
+
+        let sig = if immediate {
+            print!("Stopping pageserver immediately..");
+            Signal::SIGQUIT
+        } else {
+            print!("Stopping pageserver gracefully..");
+            Signal::SIGTERM
+        };
+        io::stdout().flush().unwrap();
+        match kill(pid, sig) {
+            Ok(_) => (),
+            Err(Errno::ESRCH) => {
+                println!("Pageserver with pid {pid} does not exist, but a PID file was found");
+                return Ok(());
+            }
+            Err(err) => bail!(
+                "Failed to send signal to pageserver with pid {pid}: {}",
+                err.desc()
+            ),
+        }
+
+        // Wait until process is gone
+        for i in 0..600 {
+            let signal = None; // Send no signal, just get the error code
+            match kill(pid, signal) {
+                Ok(_) => (), // Process exists, keep waiting
+                Err(Errno::ESRCH) => {
+                    // Process not found, we're done
+                    println!("done!");
+                    return Ok(());
+                }
+                Err(err) => bail!(
+                    "Failed to send signal to pageserver with pid {}: {}",
+                    pid,
+                    err.desc()
+                ),
+            };
+
+            if i % 10 == 0 {
+                print!(".");
+                io::stdout().flush().unwrap();
+            }
+            thread::sleep(Duration::from_millis(100));
+        }
+
+        bail!("Failed to stop pageserver with pid {pid}");
    }

    pub fn page_server_psql(&self, sql: &str) -> Vec<postgres::SimpleQueryMessage> {
-        let connstring = format!(
-            "host={} port={} dbname={} user={}",
-            self.address().ip(),
-            self.address().port(),
-            "no_db",
-            "no_user",
-        );
-        let mut client = Client::connect(connstring.as_str(), NoTls).unwrap();
+        let mut client = self.pg_connection_config.connect(NoTls).unwrap();

-        println!("Pageserver query: '{}'", sql);
+        println!("Pageserver query: '{sql}'");
        client.simple_query(sql).unwrap()
    }

-    pub fn page_server_psql_client(
+    pub fn page_server_psql_client(&self) -> result::Result<postgres::Client, postgres::Error> {
+        self.pg_connection_config.connect(NoTls)
+    }
+
+    fn http_request<U: IntoUrl>(&self, method: Method, url: U) -> RequestBuilder {
+        let mut builder = self.http_client.request(method, url);
+        if self.env.pageserver.auth_type == AuthType::ZenithJWT {
+            builder = builder.bearer_auth(&self.env.pageserver.auth_token)
+        }
+        builder
+    }
+
+    pub fn check_status(&self) -> Result<()> {
+        self.http_request(Method::GET, format!("{}/status", self.http_base_url))
+            .send()?
+            .error_from_body()?;
+        Ok(())
+    }
+
+    pub fn tenant_list(&self) -> Result<Vec<TenantInfo>> {
+        Ok(self
+            .http_request(Method::GET, format!("{}/tenant", self.http_base_url))
+            .send()?
+            .error_from_body()?
+            .json()?)
+    }
+
+    pub fn tenant_create(
        &self,
-    ) -> std::result::Result<postgres::Client, postgres::Error> {
-        let connstring = format!(
-            "host={} port={} dbname={} user={}",
-            self.address().ip(),
-            self.address().port(),
-            "no_db",
-            "no_user",
-        );
-        Client::connect(connstring.as_str(), NoTls)
+        new_tenant_id: Option<ZTenantId>,
+        settings: HashMap<&str, &str>,
+    ) -> anyhow::Result<ZTenantId> {
+        self.http_request(Method::POST, format!("{}/tenant", self.http_base_url))
+            .json(&TenantCreateRequest {
+                new_tenant_id,
+                checkpoint_distance: settings
+                    .get("checkpoint_distance")
+                    .map(|x| x.parse::<u64>())
+                    .transpose()?,
+                checkpoint_timeout: settings.get("checkpoint_timeout").map(|x| x.to_string()),
+                compaction_target_size: settings
+                    .get("compaction_target_size")
+                    .map(|x| x.parse::<u64>())
+                    .transpose()?,
+                compaction_period: settings.get("compaction_period").map(|x| x.to_string()),
+                compaction_threshold: settings
+                    .get("compaction_threshold")
+                    .map(|x| x.parse::<usize>())
+                    .transpose()?,
+                gc_horizon: settings
+                    .get("gc_horizon")
+                    .map(|x| x.parse::<u64>())
+                    .transpose()?,
+                gc_period: settings.get("gc_period").map(|x| x.to_string()),
+                image_creation_threshold: settings
+                    .get("image_creation_threshold")
+                    .map(|x| x.parse::<usize>())
+                    .transpose()?,
+                pitr_interval: settings.get("pitr_interval").map(|x| x.to_string()),
+                walreceiver_connect_timeout: settings
+                    .get("walreceiver_connect_timeout")
+                    .map(|x| x.to_string()),
+                lagging_wal_timeout: settings.get("lagging_wal_timeout").map(|x| x.to_string()),
+                max_lsn_wal_lag: settings
+                    .get("max_lsn_wal_lag")
+                    .map(|x| x.parse::<NonZeroU64>())
+                    .transpose()
+                    .context("Failed to parse 'max_lsn_wal_lag' as non zero integer")?,
+            })
+            .send()?
+            .error_from_body()?
+            .json::<Option<String>>()
+            .with_context(|| {
+                format!("Failed to parse tenant creation response for tenant id: {new_tenant_id:?}")
+            })?
+            .context("No tenant id was found in the tenant creation response")
+            .and_then(|tenant_id_string| {
+                tenant_id_string.parse().with_context(|| {
+                    format!("Failed to parse response string as tenant id: '{tenant_id_string}'")
+                })
+            })
    }
-}

-impl Drop for PageServerNode {
-    fn drop(&mut self) {
-        if self.kill_on_exit {
-            let _ = self.stop();
+    pub fn tenant_config(&self, tenant_id: ZTenantId, settings: HashMap<&str, &str>) -> Result<()> {
+        self.http_request(Method::PUT, format!("{}/tenant/config", self.http_base_url))
+            .json(&TenantConfigRequest {
+                tenant_id,
+                checkpoint_distance: settings
+                    .get("checkpoint_distance")
+                    .map(|x| x.parse::<u64>())
+                    .transpose()
+                    .context("Failed to parse 'checkpoint_distance' as an integer")?,
+                checkpoint_timeout: settings.get("checkpoint_timeout").map(|x| x.to_string()),
+                compaction_target_size: settings
+                    .get("compaction_target_size")
+                    .map(|x| x.parse::<u64>())
+                    .transpose()
+                    .context("Failed to parse 'compaction_target_size' as an integer")?,
+                compaction_period: settings.get("compaction_period").map(|x| x.to_string()),
+                compaction_threshold: settings
+                    .get("compaction_threshold")
+                    .map(|x| x.parse::<usize>())
+                    .transpose()
+                    .context("Failed to parse 'compaction_threshold' as an integer")?,
+                gc_horizon: settings
+                    .get("gc_horizon")
+                    .map(|x| x.parse::<u64>())
+                    .transpose()
+                    .context("Failed to parse 'gc_horizon' as an integer")?,
+                gc_period: settings.get("gc_period").map(|x| x.to_string()),
+                image_creation_threshold: settings
+                    .get("image_creation_threshold")
+                    .map(|x| x.parse::<usize>())
+                    .transpose()
+                    .context("Failed to parse 'image_creation_threshold' as an integer")?,
+                pitr_interval: settings.get("pitr_interval").map(|x| x.to_string()),
+                walreceiver_connect_timeout: settings
+                    .get("walreceiver_connect_timeout")
+                    .map(|x| x.to_string()),
+                lagging_wal_timeout: settings.get("lagging_wal_timeout").map(|x| x.to_string()),
+                max_lsn_wal_lag: settings
+                    .get("max_lsn_wal_lag")
+                    .map(|x| x.parse::<NonZeroU64>())
+                    .transpose()
+                    .context("Failed to parse 'max_lsn_wal_lag' as non zero integer")?,
+            })
+            .send()?
+            .error_from_body()?;
+
+        Ok(())
+    }
+
+    pub fn timeline_list(&self, tenant_id: &ZTenantId) -> anyhow::Result<Vec<TimelineInfo>> {
+        let timeline_infos: Vec<TimelineInfo> = self
+            .http_request(
+                Method::GET,
+                format!("{}/tenant/{}/timeline", self.http_base_url, tenant_id),
+            )
+            .send()?
+            .error_from_body()?
+            .json()?;
+
+        Ok(timeline_infos)
+    }
+
+    pub fn timeline_create(
+        &self,
+        tenant_id: ZTenantId,
+        new_timeline_id: Option<ZTimelineId>,
+        ancestor_start_lsn: Option<Lsn>,
+        ancestor_timeline_id: Option<ZTimelineId>,
+    ) -> anyhow::Result<TimelineInfo> {
+        self.http_request(
+            Method::POST,
+            format!("{}/tenant/{}/timeline", self.http_base_url, tenant_id),
+        )
+        .json(&TimelineCreateRequest {
+            new_timeline_id,
+            ancestor_start_lsn,
+            ancestor_timeline_id,
+        })
+        .send()?
+        .error_from_body()?
+        .json::<Option<TimelineInfo>>()
+        .with_context(|| {
+            format!("Failed to parse timeline creation response for tenant id: {tenant_id}")
+        })?
+        .with_context(|| {
+            format!(
+                "No timeline id was found in the timeline creation response for tenant {tenant_id}"
+            )
+        })
+    }
+
+    /// Import a basebackup prepared using either:
+    /// a) `pg_basebackup -F tar`, or
+    /// b) The `fullbackup` pageserver endpoint
+    ///
+    /// # Arguments
+    /// * `tenant_id` - tenant to import into. Created if it does not exist
+    /// * `timeline_id` - id to assign to imported timeline
+    /// * `base` - (start lsn of basebackup, path to `base.tar` file)
+    /// * `pg_wal` - if there's any wal to import: (end lsn, path to `pg_wal.tar`)
+    pub fn timeline_import(
+        &self,
+        tenant_id: ZTenantId,
+        timeline_id: ZTimelineId,
+        base: (Lsn, PathBuf),
+        pg_wal: Option<(Lsn, PathBuf)>,
+    ) -> anyhow::Result<()> {
+        let mut client = self.pg_connection_config.connect(NoTls)?;
+
+        // Init base reader
+        let (start_lsn, base_tarfile_path) = base;
+        let base_tarfile = File::open(base_tarfile_path)?;
+        let mut base_reader = BufReader::new(base_tarfile);
+
+        // Init wal reader if necessary
+        let (end_lsn, wal_reader) = if let Some((end_lsn, wal_tarfile_path)) = pg_wal {
+            let wal_tarfile = File::open(wal_tarfile_path)?;
+            let wal_reader = BufReader::new(wal_tarfile);
+            (end_lsn, Some(wal_reader))
+        } else {
+            (start_lsn, None)
+        };
+
+        // Import base
+        let import_cmd =
+            format!("import basebackup {tenant_id} {timeline_id} {start_lsn} {end_lsn}");
+        let mut writer = client.copy_in(&import_cmd)?;
+        io::copy(&mut base_reader, &mut writer)?;
+        writer.finish()?;
+
+        // Import wal if necessary
+        if let Some(mut wal_reader) = wal_reader {
+            let import_cmd = format!("import wal {tenant_id} {timeline_id} {start_lsn} {end_lsn}");
+            let mut writer = client.copy_in(&import_cmd)?;
+            io::copy(&mut wal_reader, &mut writer)?;
+            writer.finish()?;
        }
-    }
-}
-
-//
-// Control routines for WalAcceptor.
-//
-// Now used only in test setups.
-//
-pub struct WalAcceptorNode {
-    listen: SocketAddr,
-    data_dir: PathBuf,
-    env: LocalEnv,
-}
-
-impl WalAcceptorNode {
-    pub fn init(&self) {
-        if self.data_dir.exists() {
-            fs::remove_dir_all(self.data_dir.clone()).unwrap();
-        }
-        fs::create_dir_all(self.data_dir.clone()).unwrap();
-    }
-
-    pub fn start(&self) {
-        println!(
-            "Starting wal_acceptor in {} listening '{}'",
-            self.data_dir.to_str().unwrap(),
-            self.listen
-        );
-
-        let status = Command::new(self.env.zenith_distrib_dir.join("wal_acceptor"))
-            .args(&["-D", self.data_dir.to_str().unwrap()])
-            .args(&["-l", self.listen.to_string().as_str()])
-            .args(&["--systemid", &self.env.systemid.to_string()])
-            // Tell page server it can receive WAL from this WAL safekeeper
-            // FIXME: If there are multiple safekeepers, they will all inform
-            // the page server. Only the last "notification" will stay in effect.
-            // So it's pretty random which safekeeper the page server will connect to
-            .args(&["--pageserver", "127.0.0.1:64000"])
-            .arg("-d")
-            .arg("-n")
-            .status()
-            .expect("failed to start wal_acceptor");
-
-        if !status.success() {
-            panic!("wal_acceptor start failed");
-        }
-    }
-
-    pub fn stop(&self) -> std::result::Result<(), io::Error> {
-        println!("Stopping wal acceptor on {}", self.listen);
-        let pidfile = self.data_dir.join("wal_acceptor.pid");
-        let pid = read_pidfile(&pidfile)?;
-        // Ignores any failures when running this command
-        let _status = Command::new("kill")
-            .arg(pid)
-            .env_clear()
-            .status()
-            .expect("failed to execute kill");

        Ok(())
    }
 }
-
-impl Drop for WalAcceptorNode {
-    fn drop(&mut self) {
-        self.stop().unwrap();
-    }
-}
-
-///////////////////////////////////////////////////////////////////////////////
-
-pub struct WalProposerNode {
-    pub pid: u32,
-}
-
-impl WalProposerNode {
-    pub fn stop(&self) {
-        let status = Command::new("kill")
-            .arg(self.pid.to_string())
-            .env_clear()
-            .status()
-            .expect("failed to execute kill");
-
-        if !status.success() {
-            panic!("kill start failed");
-        }
-    }
-}
-
-impl Drop for WalProposerNode {
-    fn drop(&mut self) {
-        self.stop();
-    }
-}
-
-///////////////////////////////////////////////////////////////////////////////
-
-pub fn regress_check(pg: &PostgresNode) {
-    pg.safe_psql("postgres", "CREATE DATABASE regression");
-
-    let regress_run_path = Path::new(env!("CARGO_MANIFEST_DIR")).join("tmp_check/regress");
-    fs::create_dir_all(regress_run_path.clone()).unwrap();
-    std::env::set_current_dir(regress_run_path).unwrap();
-
-    let regress_build_path =
-        Path::new(env!("CARGO_MANIFEST_DIR")).join("../tmp_install/build/src/test/regress");
-    let regress_src_path =
-        Path::new(env!("CARGO_MANIFEST_DIR")).join("../vendor/postgres/src/test/regress");
-
-    let _regress_check = Command::new(regress_build_path.join("pg_regress"))
-        .args(&[
-            "--bindir=''",
-            "--use-existing",
-            format!("--bindir={}", pg.env.pg_bin_dir().to_str().unwrap()).as_str(),
-            format!("--dlpath={}", regress_build_path.to_str().unwrap()).as_str(),
-            format!(
-                "--schedule={}",
-                regress_src_path.join("parallel_schedule").to_str().unwrap()
-            )
-            .as_str(),
-            format!("--inputdir={}", regress_src_path.to_str().unwrap()).as_str(),
-        ])
-        .env_clear()
-        .env("LD_LIBRARY_PATH", pg.env.pg_lib_dir().to_str().unwrap())
-        .env("PGHOST", pg.address.ip().to_string())
-        .env("PGPORT", pg.address.port().to_string())
-        .env("PGUSER", pg.whoami())
-        .status()
-        .expect("pg_regress failed");
-}
-
-/// Read a PID file
-///
-/// This should contain an unsigned integer, but we return it as a String
-/// because our callers only want to pass it back into a subcommand.
-fn read_pidfile(pidfile: &Path) -> std::result::Result<String, io::Error> {
-    fs::read_to_string(pidfile).map_err(|err| {
-        eprintln!("failed to read pidfile {:?}: {:?}", pidfile, err);
-        err
-    })
-}
--- a/docs/.gitignore
+++ b/docs/.gitignore
@@ -0,0 +1 @@
+book
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -0,0 +1,82 @@
+# Summary
+
+[Introduction]()
+- [Separation of Compute and Storage](./separation-compute-storage.md)
+
+# Architecture
+
+- [Compute]()
+  - [WAL proposer]()
+  - [WAL Backpressure]()
+  - [Postgres changes](./core_changes.md)
+
+- [Pageserver](./pageserver.md)
+    - [Services](./pageserver-services.md)
+    - [Thread management](./pageserver-thread-mgmt.md)
+    - [WAL Redo](./pageserver-walredo.md)
+    - [Page cache](./pageserver-pagecache.md)
+    - [Storage](./pageserver-storage.md)
+        - [Datadir mapping]()
+        - [Layer files]()
+        - [Branching]()
+        - [Garbage collection]()
+    - [Cloud Storage]()
+    - [Processing a GetPage request](./pageserver-processing-getpage.md)
+    - [Processing WAL](./pageserver-processing-wal.md)
+    - [Management API]()
+    - [Tenant Rebalancing]()
+
+- [WAL Service](walservice.md)
+  - [Consensus protocol](safekeeper-protocol.md)
+  - [Management API]()
+  - [Rebalancing]()
+
+- [Control Plane]()
+
+- [Proxy]()
+
+- [Source view](./sourcetree.md)
+  - [docker.md](./docker.md) — Docker images and building pipeline.
+  - [Error handling and logging]()
+  - [Testing]()
+    - [Unit testing]()
+    - [Integration testing]()
+    - [Benchmarks]()
+
+
+- [Glossary](./glossary.md)
+
+# Uncategorized
+
+- [authentication.md](./authentication.md)
+- [multitenancy.md](./multitenancy.md) — how multitenancy is organized in the pageserver and Zenith CLI.
+- [settings.md](./settings.md)
+#FIXME: move these under sourcetree.md
+#- [postgres_ffi/README.md](/libs/postgres_ffi/README.md)
+#- [test_runner/README.md](/test_runner/README.md)
+
+
+# RFCs
+
+- [RFCs](./rfcs/README.md)
+
+- [002-storage](rfcs/002-storage.md)
+- [003-laptop-cli](rfcs/003-laptop-cli.md)
+- [004-durability](rfcs/004-durability.md)
+- [005-zenith_local](rfcs/005-zenith_local.md)
+- [006-laptop-cli-v2-CLI](rfcs/006-laptop-cli-v2-CLI.md)
+- [006-laptop-cli-v2-repository-structure](rfcs/006-laptop-cli-v2-repository-structure.md)
+- [007-serverless-on-laptop](rfcs/007-serverless-on-laptop.md)
+- [008-push-pull](rfcs/008-push-pull.md)
+- [009-snapshot-first-storage-cli](rfcs/009-snapshot-first-storage-cli.md)
+- [009-snapshot-first-storage](rfcs/009-snapshot-first-storage.md)
+- [009-snapshot-first-storage-pitr](rfcs/009-snapshot-first-storage-pitr.md)
+- [010-storage_details](rfcs/010-storage_details.md)
+- [011-retention-policy](rfcs/011-retention-policy.md)
+- [012-background-tasks](rfcs/012-background-tasks.md)
+- [013-term-history](rfcs/013-term-history.md)
+- [014-safekeepers-gossip](rfcs/014-safekeepers-gossip.md)
+- [014-storage-lsm](rfcs/014-storage-lsm.md)
+- [015-storage-messaging](rfcs/015-storage-messaging.md)
+- [016-connection-routing](rfcs/016-connection-routing.md)
+- [cluster-size-limits](rfcs/cluster-size-limits.md)
--- a/docs/authentication.md
+++ b/docs/authentication.md
@@ -0,0 +1,30 @@
+## Authentication
+
+### Overview
+
+Authentication currently relies on JWTs for communication between compute and pageserver and between the CLI and pageserver. Each JWT is signed with an RSA key. The CLI generates a key pair during the call to `zenith init`, using the following openssl commands:
+
+```bash
+openssl genrsa -out private_key.pem 2048
+openssl rsa -in private_key.pem -pubout -outform PEM -out public_key.pem
+```
+
+The CLI also generates a signed token and saves it in the config for later access to the pageserver. Authentication is currently optional. The pageserver has two config variables, `auth_validation_public_key_path` and `auth_type`; when `auth_type` is present and set to `ZenithJWT`, the pageserver requires authentication for connections. The JWT itself is passed in the password field of the connection string. One caveat for psql: it silently truncates passwords to 100 characters, so to pass a JWT correctly via psql you have to either use the PGPASSWORD environment variable or store the password in the psql config file.
+
+Currently there is no authentication between compute and safekeepers, because this communication layer is under heavy refactoring. Support for authentication will be added there once the refactoring is done. For now, the safekeeper supports a "hardcoded" token passed via an environment variable, so that it can use the callmemaybe command in the pageserver.
+
+Compute uses a token passed via an environment variable to communicate with the pageserver and, in the future, with the safekeeper too.
+
+JWT authentication currently supports two scopes: `tenant` and `pageserverapi`. The `tenant` scope is intended for tenant-related API calls, e.g. create_branch. A compute node launched for a particular tenant also uses this scope. The `pageserverapi` scope is intended to be used by the console to manage the pageserver. For now there is only one management operation: create tenant.
+
+Examples of token generation in Python:
+
+```python
+# generate pageserverapi token
+management_token = jwt.encode({"scope": "pageserverapi"}, auth_keys.priv, algorithm="RS256")
+
+# generate tenant token
+tenant_token = jwt.encode({"scope": "tenant", "tenant_id": ps.initial_tenant}, auth_keys.priv, algorithm="RS256")
+```
+
+Utility functions for working with JWTs in Rust are located in `libs/utils/src/auth.rs`.
--- a/docs/book.toml
+++ b/docs/book.toml
@@ -0,0 +1,5 @@
+[book]
+language = "en"
+multilingual = false
+src = "."
+title = "Neon architecture"
--- a/docs/core_changes.md
+++ b/docs/core_changes.md
@@ -0,0 +1,519 @@
+# Postgres core changes
+
+This lists all the changes that have been made to the PostgreSQL
+source tree, as a somewhat logical set of patches. The long-term goal
+is to eliminate all these changes, by submitting patches to upstream
+and refactoring code into extensions, so that you can run unmodified
+PostgreSQL against Neon storage.
+
+In Neon, we run PostgreSQL in the compute nodes, but we also run a special WAL redo process in the
+page server. We currently use the same binary for both, with the `--wal-redo` runtime flag to launch it in
+the WAL redo mode. Some PostgreSQL changes are needed in the compute node, while others are just for
+the WAL redo process.
+
+In addition to core PostgreSQL changes, there is a Neon extension in contrib/neon, to hook into the
+smgr interface. Once all the core changes have been submitted to upstream or eliminated some other
+way, the extension could live outside the postgres repository and build against vanilla PostgreSQL.
+
+Below is a list of all the PostgreSQL source code changes, categorized into changes needed for
+compute, and changes needed for the WAL redo process:
+
+# Changes for Compute node
+
+## Add t_cid to heap WAL records
+
+```
+ src/backend/access/heap/heapam.c                            |   26 +-
+ src/include/access/heapam_xlog.h                            |    6 +-
+```
+
+We have added a new t_cid field to heap WAL records. This changes the WAL record format, making Neon WAL format incompatible with vanilla PostgreSQL!
+
+### Problem we're trying to solve
+
+The problem is that the XLOG_HEAP_INSERT record does not include the command id of the inserted row. The same is true for deletions and updates. So in the primary, a row is inserted with the current xmin + cmin, but in the replica, the cmin is always set to 1. That works in PostgreSQL, because the command id is only relevant to the inserting transaction itself; after commit/abort, no one cares about it anymore. But with Neon, we rely on WAL replay to reconstruct the page, even while the original transaction is still running.
+
+### How to get rid of the patch
+
+Bite the bullet and submit the patch to PostgreSQL, to add the t_cid to the WAL records. It makes the WAL records larger, which could make this unpopular in the PostgreSQL community. However, it might simplify some logical decoding code; Andres Freund briefly mentioned in the PGCon 2022 discussion of Heikki's Neon presentation that logical decoding currently needs to jump through some hoops to reconstruct the same information.
+
+
+### Alternatives
+Perhaps we could write an extra WAL record with the t_cid information, when a page is evicted that contains rows that were touched by a transaction that's still running. However, that seems very complicated.
+
+## ginfast.c
+
+```
+diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
+index e0d9940946..2d964c02e9 100644
+--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
+@@ -285,6 +285,17 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
+                memset(&sublist, 0, sizeof(GinMetaPageData));
+                makeSublist(index, collector->tuples, collector->ntuples, &sublist);
+ 
++               if (metadata->head != InvalidBlockNumber)
++               {
++                       /*
++                        * ZENITH: Get buffer before XLogBeginInsert() to avoid recursive call
++                        * of XLogBeginInsert(). Reading a new buffer might evict a dirty page from
++                        * the buffer cache, and if that page happens to be an FSM or VM page, zenith_write()
++                        * will try to WAL-log an image of the page.
++                        */
++                       buffer = ReadBuffer(index, metadata->tail);
++               }
++
+                if (needWal)
+                        XLogBeginInsert();
+ 
+@@ -316,7 +327,6 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
+                        data.prevTail = metadata->tail;
+                        data.newRightlink = sublist.head;
+ 
+-                       buffer = ReadBuffer(index, metadata->tail);
+                        LockBuffer(buffer, GIN_EXCLUSIVE);
+                        page = BufferGetPage(buffer);
+```
+
+The problem is explained in the comment added in the diff above.
+
+### How to get rid of the patch
+
+Can we stop WAL-logging FSM or VM pages? Or delay the WAL-logging until we're out of the critical
+section?
+
+Maybe some bigger rewrite of FSM and VM would help to avoid WAL-logging FSM and VM page images?
+
+
+## Mark index builds that use buffer manager without logging explicitly
+
+```
+ src/backend/access/gin/gininsert.c                          |    7 +
+ src/backend/access/gist/gistbuild.c                         |   15 +-
+ src/backend/access/spgist/spginsert.c                       |    8 +-
+
+also some changes in src/backend/storage/smgr/smgr.c
+```
+
+When a GIN index is built, for example, it is built by inserting the entries into the index more or
+less normally, but without WAL-logging anything. After the index has been built, we iterate through
+all pages and write them to the WAL. That doesn't work for Neon, because if a page is not WAL-logged
+and is evicted from the buffer cache, it is lost. We have a check to catch that in the Neon
+extension. To fix that, we've added a few functions to track explicitly when we're performing such
+an operation: `smgr_start_unlogged_build`, `smgr_finish_unlogged_build_phase_1` and
+`smgr_end_unlogged_build`.
+
+
+### How to get rid of the patch
+
+I think it would make sense to be more explicit about that in PostgreSQL too. So extract these
+changes to a patch and post to pgsql-hackers.
+
+
+## Track last-written page LSN
+
+```
+ src/backend/commands/dbcommands.c                           |   17 +-
+
+Also one call to SetLastWrittenPageLSN() in spginsert.c, maybe elsewhere too
+```
+
+Whenever a page is evicted from the buffer cache, we remember its LSN, so that we can use the same
+LSN in the GetPage@LSN request when reading the page back from the page server. The value is
+conservative: it would be correct to always use the last-inserted LSN, but it would be slow because
+then the page server would need to wait for the recent WAL to be streamed and processed, before
+responding to any GetPage@LSN request.
+
+The last-written page LSN is mostly tracked in the smgrwrite() function, without core code changes,
+but there are a few exceptions where we've had to add explicit calls to the Neon-specific
+SetLastWrittenPageLSN() function.
+
+There's an open PR to track the LSN in a more fine-grained fashion:
+https://github.com/neondatabase/postgres/pull/177
+
+PostgreSQL v15 introduces a new method to do CREATE DATABASE that WAL-logs the database instead of
+relying on copying files and a checkpoint. With that method, we probably won't need any special handling.
+The old method is still available, though.
+
+### How to get rid of the patch
+
+Wait until v15?
+
+
+## Cache relation sizes
+
+The Neon extension contains a little cache for smgrnblocks() and smgrexists() calls, to avoid going
+to the page server every time. It might be useful to cache those in PostgreSQL, maybe in the
+relcache? (I think we do cache nblocks in relcache already, check why that's not good enough for
+Neon)
+
+
+## Misc change in vacuumlazy.c
+
+```
+index 8aab6e324e..c684c4fbee 100644
+--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
+@@ -1487,7 +1487,10 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
+                else if (all_visible_according_to_vm && !PageIsAllVisible(page)
+                                 && VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer))
+                {
+-                       elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
++                       /* ZENITH-XXX: all visible hint is not wal-logged
++                        * FIXME: Replay visibilitymap changes in pageserver
++                        */
++                       elog(DEBUG1, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+                                 vacrel->relname, blkno);
+                        visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+                                                                VISIBILITYMAP_VALID_BITS);
+```
+
+
+Is this still needed? If that WARNING happens, it looks like potential corruption that we should
+fix!
+
+
+## Use buffer manager when extending VM or FSM
+
+```
+ src/backend/storage/freespace/freespace.c                   |   14 +-
+ src/backend/access/heap/visibilitymap.c                     |   15 +-
+
+diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
+index e198df65d8..addfe93eac 100644
+--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
+@@ -652,10 +652,19 @@ vm_extend(Relation rel, BlockNumber vm_nblocks)
+        /* Now extend the file */
+        while (vm_nblocks_now < vm_nblocks)
+        {
+-               PageSetChecksumInplace((Page) pg.data, vm_nblocks_now);
++               /*
++                * ZENITH: Initialize VM pages through buffer cache to prevent loading
++                * them from pageserver.
++                */
++               Buffer  buffer = ReadBufferExtended(rel, VISIBILITYMAP_FORKNUM, P_NEW,
++                                                                                       RBM_ZERO_AND_LOCK, NULL);
++               Page    page = BufferGetPage(buffer);
++
++               PageInit((Page) page, BLCKSZ, 0);
++               PageSetChecksumInplace(page, vm_nblocks_now);
++               MarkBufferDirty(buffer);
++               UnlockReleaseBuffer(buffer);
+ 
+-               smgrextend(rel->rd_smgr, VISIBILITYMAP_FORKNUM, vm_nblocks_now,
+-                                  pg.data, false);
+                vm_nblocks_now++;
+        }
+```
+
+### Problem we're trying to solve
+
+???
+
+### How to get rid of the patch
+
+Maybe this would be a reasonable change in PostgreSQL too?
+
+
+## Allow startup without reading checkpoint record
+
+In Neon, the compute node is stateless. So when we launch a compute node, we need to provide
+some dummy PG_DATADIR. Relation pages can be requested on demand from the page server, but Postgres
+still needs some non-relational data: control and configuration files, SLRUs, and so on. This is
+currently implemented using a basebackup (not to be confused with pg_basebackup), a tarball created
+by the pageserver that includes the config/control files, SLRUs and required directories.
+
+As pageserver does not have the original WAL segments, the basebackup tarball includes an empty WAL
+segment to bootstrap the WAL writing, but it doesn't contain the checkpoint record.  There are some
+changes in xlog.c, to allow starting the compute node without reading the last checkpoint record
+from WAL.
+
+This includes code to read the `zenith.signal` file, which tells the startup code the LSN to start
+at. When the `zenith.signal` file is present, the startup uses that LSN instead of the last
+checkpoint's LSN. The system is known to be consistent at that LSN, without any WAL redo.
+
+
+### How to get rid of the patch
+
+???
+
+
+### Alternatives
+
+Include a fake checkpoint record in the tarball. Creating fake WAL is a bit risky, though; I'm
+afraid it might accidentally get streamed to the safekeepers and overwrite or corrupt the real WAL.
+
+## Disable sequence caching
+
+```
+diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
+index 0415df9ccb..9f9db3c8bc 100644
+--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
+@@ -53,7 +53,9 @@
+  * so we pre-log a few fetches in advance. In the event of
+  * crash we can lose (skip over) as many values as we pre-logged.
+  */
+-#define SEQ_LOG_VALS   32
++/* Zenith XXX: to ensure sequence order of sequence in Zenith we need to WAL log each sequence update. */
++/* #define SEQ_LOG_VALS        32 */
++#define SEQ_LOG_VALS   0
+```
+
+For performance reasons, Postgres doesn't want to WAL-log each fetch of a value from a sequence, so
+it pre-logs a few fetches in advance. In the event of a crash we can lose (skip over) as many values
+as we pre-logged. But with Neon, because the page holding the sequence value can be evicted from the
+buffer cache, we can get a gap in sequence values even without a crash.
+
+### How to get rid of the patch
+
+Maybe we can just remove it, and accept the gaps. Or add some special handling for sequence
+relations in the Neon extension, to WAL log the sequence page when it's about to be evicted. It
+would be weird if the sequence moved backwards though, think of PITR.
+
+Or add a GUC to PostgreSQL for the number of values to pre-log, and force it to 1 in Neon.
+
+
+## Walproposer
+
+```
+ src/Makefile                                                |    1 +
+ src/backend/replication/libpqwalproposer/Makefile           |   37 +
+ src/backend/replication/libpqwalproposer/libpqwalproposer.c |  416 ++++++++++++
+ src/backend/postmaster/bgworker.c                           |    4 +
+ src/backend/postmaster/postmaster.c                         |    6 +
+ src/backend/replication/Makefile                            |    4 +-
+ src/backend/replication/walproposer.c                       | 2350 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ src/backend/replication/walproposer_utils.c                 |  402 +++++++++++
+ src/backend/replication/walreceiver.c                       |    7 +
+ src/backend/replication/walsender.c                         |  320 ++++++---
+ src/backend/storage/ipc/ipci.c                              |    6 +
+ src/include/replication/walproposer.h                       |  565 ++++++++++++++++
+```
+
+The WAL proposer communicates with the safekeepers and ensures WAL durability through quorum writes. It is
+currently implemented as a patch to the standard WAL sender.
+
+### How to get rid of the patch
+
+Refactor into an extension. Submit hooks or APIs into upstream if necessary.
+
+@MMeent did some work on this already: https://github.com/neondatabase/postgres/pull/96
+
+## Ignore unexpected data beyond EOF in bufmgr.c
+
+```
+@@ -922,11 +928,14 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
+                 */
+                bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
+                if (!PageIsNew((Page) bufBlock))
+-                       ereport(ERROR,
++               {
++                        // XXX-ZENITH
++                        MemSet((char *) bufBlock, 0, BLCKSZ);
++                        ereport(DEBUG1,
+                                        (errmsg("unexpected data beyond EOF in block %u of relation %s",
+                                                        blockNum, relpath(smgr->smgr_rnode, forkNum)),
+                                         errhint("This has been seen to occur with buggy kernels; consider updating your system.")));
+-
++               }
+                /*
+                 * We *must* do smgrextend before succeeding, else the page will not
+                 * be reserved by the kernel, and the next P_NEW call will decide to
+```
+
+PostgreSQL is a bit sloppy with extending relations. Usually, the relation is extended with zeros
+first, then the page is filled, and finally the new page is WAL-logged. But if multiple backends extend
+a relation at the same time, the pages can be WAL-logged in a different order.
+
+I'm not sure what scenario exactly required this change in Neon, though.
+
+### How to get rid of the patch
+
+Submit patches to pgsql-hackers, to tighten up the WAL-logging around relation extension. It's a bit
+confusing even in PostgreSQL. Maybe WAL log the intention to extend first, then extend the relation,
+and finally WAL-log that the extension succeeded.
+
+## Make smgr interface available to extensions
+
+```
+ src/backend/storage/smgr/smgr.c                             |  203 +++---
+ src/include/storage/smgr.h                                  |   72 +-
+```
+
+### How to get rid of the patch
+
+Submit to upstream. This could be useful for the Disk Encryption patches too, or for compression.
+
+
+## Added relpersistence argument to smgropen()
+
+```
+ src/backend/access/heap/heapam_handler.c                    |    2 +-
+ src/backend/catalog/storage.c                               |   10 +-
+ src/backend/commands/tablecmds.c                            |    2 +-
+ src/backend/storage/smgr/md.c                               |    4 +-
+ src/include/utils/rel.h                                     |    3 +-
+```
+
+Neon needs to treat unlogged relations differently from others, so the smgrread(), smgrwrite() etc.
+implementations need to know the 'relpersistence' of the relation. To get that information where
+it's needed, we added the 'relpersistence' argument to smgropen().
+
+### How to get rid of the patch
+
+Maybe 'relpersistence' would be useful in PostgreSQL for debugging purposes? Or simply for the
+benefit of extensions like Neon. Should consider this in the patch to make smgr API usable to
+extensions.
+
+### Alternatives
+
+Currently in Neon, unlogged tables live on local disk in the compute node, and are wiped away on
+compute node restart. One alternative would be to instead WAL-log even unlogged tables, essentially
+ignoring the UNLOGGED option. Or prohibit UNLOGGED tables completely. But would we still need the
+relpersistence argument to handle index builds? See item on "Mark index builds that use buffer
+manager without logging explicitly".
+
+## Use smgr and dbsize_hook for size calculations
+
+```
+ src/backend/utils/adt/dbsize.c                              |   61 +-
+```
+
+In PostgreSQL, the rel and db-size functions scan the data directory directly. That won't work in Neon.
+
+### How to get rid of the patch
+
+Send patch to PostgreSQL, to use smgr API functions for relation size calculation instead. Maybe as
+part of the general smgr API patch.
+
+
+
+# WAL redo process changes
+
+Pageserver delegates complex WAL decoding duties to Postgres, which means that the latter might fall
+victim to carefully designed malicious WAL records and start doing harmful things to the system.  To
+prevent this, the redo functions are executed in a separate process that is sandboxed with Linux
+Secure Computing mode (see seccomp(2) man page).
+
+As an alternative to having a separate WAL redo process, we could rewrite all redo handlers in Rust.
+That is not practical, however: it would take a lot of effort to rewrite them and to ensure that
+the rewrite is correct, and once that's done, it would be a lot of ongoing maintenance effort to
+keep the rewritten code in sync over time, across new PostgreSQL versions. That's why we want to
+leverage PostgreSQL code.
+
+Another alternative would be to harden all the PostgreSQL WAL redo functions so that it would be
+safe to call them directly from Rust code, without needing the security sandbox. That's not feasible
+for similar reasons as rewriting them in Rust.
+
+
+## Don't replay changes in XLogReadBufferForRedo that are not for the target page we're replaying
+
+```
+ src/backend/access/gin/ginxlog.c                            |   19 +-
+
+Also some changes in xlog.c and xlogutils.c
+
+Example:
+
+@@ -415,21 +416,27 @@ ginRedoSplit(XLogReaderState *record)
+        if (!isLeaf)
+                ginRedoClearIncompleteSplit(record, 3);
+ 
+-       if (XLogReadBufferForRedo(record, 0, &lbuffer) != BLK_RESTORED)
++       action = XLogReadBufferForRedo(record, 0, &lbuffer);
++       if (action != BLK_RESTORED && action != BLK_DONE)
+                elog(ERROR, "GIN split record did not contain a full-page image of left page");
+```
+
+### Problem we're trying to solve
+
+In PostgreSQL, if a WAL redo function calls XLogReadBufferForRedo() for a page that has a full-page
+image, it always succeeds. However, the Neon WAL redo process is only concerned with replaying changes
+to a single page, so replaying any changes for other pages is a waste of cycles. We have modified
+XLogReadBufferForRedo() to return BLK_DONE for all other pages, to avoid the overhead. That is
+unexpected by code like the above.
+
+### How to get rid of the patch
+
+Submit the changes to upstream, hope the community accepts them. There's no harm to PostgreSQL from
+these changes, although it doesn't have any benefit either.
+
+To make these changes useful to upstream PostgreSQL, we could implement a feature to look ahead in the
+WAL, and detect truncated relations. Even in PostgreSQL, it is a waste of cycles to replay changes
+to pages that are later truncated away, so we could have XLogReadBufferForRedo() return BLK_DONE or
+BLK_NOTFOUND for pages that are known to be truncated away later in the WAL stream.
+
+### Alternatives
+
+Maybe we could revert this optimization, and restore pages other than the target page too.
+
+## Add predefined_sysidentifier flag to initdb
+
+```
+ src/backend/bootstrap/bootstrap.c                           |   13 +-
+ src/bin/initdb/initdb.c                                     |    4 +
+
+And some changes in xlog.c
+```
+
+This is used to help with restoring a database when you have all the WAL, all the way back to
+initdb, but no backup. You can reconstruct the missing backup by running initdb again, with the same
+sysidentifier.
+
+
+### How to get rid of the patch
+
+Ignore it. This is only needed for disaster recovery, so once we've eliminated all other Postgres
+patches, we can just keep it around as a patch or as a separate branch in a repo.
+
+
+# Not currently committed but proposed
+
+## Disable ring-buffer buffer manager strategies
+
+### Why?
+
+Postgres uses ring buffers to keep bulk operations (copy, seqscan, vacuum, ...) from flushing the cache.
+As a result, pages may be evicted even if there is free space in the buffer cache.
+In vanilla Postgres the negative effect is partly compensated for by the file system cache, but in Neon
+the cost of requesting a page from the page server is much higher.
+
+### Alternatives?
+
+Instead of just prohibiting ring buffers, we may try to implement a more flexible eviction policy,
+for example copying an evicted page from the ring buffer to some other buffer if there is free space
+in the buffer cache.
+
+## Disable marking page as dirty when hint bits are set
+
+### Why?
+
+Postgres has to modify a page twice: the first time when some tuple is updated, and the second time when
+hint bits are set. WAL-logging hint bit updates requires an FPI, which significantly increases the size of the WAL.
+
+### Alternatives?
+
+Add special WAL record for setting page hints.
+
+## Prefetching
+
+### Why?
+
+Because pages in Neon are loaded on demand, we need some mechanism for bulk loading to reduce
+page request round-trip overhead, both to shorten node startup time
+and to speed up some large queries.
+
+Currently Postgres supports prefetching only for bitmap scans.
+In Neon we should also use prefetch for sequential and index scans, because the OS is not doing it for us.
+For a sequential scan we could prefetch some number of following pages. For an index scan we could prefetch the pages
+of the heap relation addressed by TIDs.
+
+## Prewarming
+
+### Why?
+
+Short downtime (or, in other words, fast compute node restart time) is one of the key features of Zenith.
+But the overhead of request-response round-trips for loading pages on demand can make the warm-up of a started node quite slow.
+We can capture the state of the compute node buffer cache and send a bulk request for these pages at startup.
--- a/docs/docker.md
+++ b/docs/docker.md
@@ -0,0 +1,20 @@
+# Docker images of Neon
+
+## Images
+
+Currently we build two main images:
+
+- [neondatabase/neon](https://hub.docker.com/repository/docker/zenithdb/zenith) — image with pre-built `pageserver`, `safekeeper` and `proxy` binaries and all the required runtime dependencies. Built from [/Dockerfile](/Dockerfile).
+- [neondatabase/compute-node](https://hub.docker.com/repository/docker/zenithdb/compute-node) — compute node image with pre-built Postgres binaries from [neondatabase/postgres](https://github.com/neondatabase/postgres).
+
+And an additional intermediate image:
+
+- [neondatabase/compute-tools](https://hub.docker.com/repository/docker/neondatabase/compute-tools) — compute node configuration management tools.
+
+## Building pipeline
+
+We build all images after a successful `release` tests run and push them automatically to Docker Hub with two parallel CI jobs:
+
+1. `neondatabase/compute-tools` and `neondatabase/compute-node`
+
+2. `neondatabase/neon`
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -0,0 +1,252 @@
+# Glossary
+
+### Authentication
+
+### Backpressure
+
+Backpressure is used to limit the lag between pageserver and compute node or WAL service.
+
+If the compute node or WAL service runs far ahead of the page server,
+the time to serve page requests increases. This may lead to timeout errors.
+
+To tune backpressure limits, use the `max_replication_write_lag`, `max_replication_flush_lag` and `max_replication_apply_lag` settings.
+When the lag between the current LSN (`pg_current_wal_flush_lsn()` at the compute node) and the minimal write/flush/apply position of the replica exceeds the limit,
+backends performing writes are blocked until the replica has caught up.
+### Base image (page image)
+
+### Basebackup
+
+A tarball with files needed to bootstrap a compute node[] and a corresponding command to create it.
+NOTE: It has nothing to do with PostgreSQL pg_basebackup.
+
+### Branch
+
+A branch can be created at a certain LSN using the `neon_local timeline branch` command.
+Each branch lives in a corresponding timeline[] and has an ancestor[].
+
+
+### Checkpoint (PostgreSQL)
+
+NOTE: This is an overloaded term.
+
+A checkpoint record in the WAL marks a point in the WAL sequence at which it is guaranteed that all data files have been updated with all information from shared memory modified before that checkpoint.
+
+### Checkpoint (Layered repository)
+
+NOTE: This is an overloaded term.
+
+Whenever enough WAL has been accumulated in memory, the page server []
+writes out the changes from the in-memory layer into a new delta layer file. This process
+is called "checkpointing".
+
+Configuration parameter `checkpoint_distance` defines the distance
+from current LSN to perform checkpoint of in-memory layers.
+Default is `DEFAULT_CHECKPOINT_DISTANCE`.
+
+### Compaction
+
+A background operation on layer files. Compaction takes a number of L0
+layer files, each of which covers the whole key space and a range of
+LSN, and reshuffles the data in them into L1 files so that each file
+covers the whole LSN range, but only part of the key space.
+
+Compaction should also opportunistically leave out obsolete page versions
+from the L1 files, and materialize other page versions for faster
+access. That hasn't been implemented as of this writing, though.
+
+
+### Compute node
+
+Stateless Postgres node that stores data in pageserver.
+
+### Garbage collection
+
+The process of removing old on-disk layers that are not needed by any timeline anymore.
+
+### Fork
+
+Each of the separate segmented file sets in which a relation is stored. The main fork is where the actual data resides. There also exist two secondary forks for metadata: the free space map and the visibility map.
+
+### Layer
+
+A layer contains data needed to reconstruct any page versions within the
+layer's Segment and range of LSNs.
+
+There are two kinds of layers, in-memory and on-disk layers. In-memory
+layers are used to ingest incoming WAL, and provide fast access
+to the recent page versions. On-disk layers are stored as files on disk, and
+are immutable. See [pageserver-storage.md](./pageserver-storage.md) for more.
+
+### Layer file (on-disk layer)
+
+Layered repository on-disk format is based on immutable files.  The
+files are called "layer files". There are two kinds of layer files:
+image files and delta files. An image file contains a "snapshot" of a
+range of keys at a particular LSN, and a delta file contains WAL
+records applicable to a range of keys, in a range of LSNs.
+
+### Layer map
+
+The layer map tracks what layers exist in a timeline.
+
+### Layered repository
+
+Neon repository implementation that keeps data in layers.
+### LSN
+
+The Log Sequence Number (LSN) is a unique identifier of the WAL record[] in the WAL log.
+The insert position is a byte offset into the logs, increasing monotonically with each new record.
+Internally, an LSN is a 64-bit integer, representing a byte position in the write-ahead log stream.
+It is printed as two hexadecimal numbers of up to 8 digits each, separated by a slash.
+See also the [PostgreSQL documentation on the pg_lsn type](https://www.postgresql.org/docs/devel/datatype-pg-lsn.html).
+Values can be compared to calculate the volume of WAL data that separates them, so they are used to measure the progress of replication and recovery.
+
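The textual form and the subtraction described above can be sketched in a few lines of Rust. This is only an illustration: the `Lsn` type and the `wal_bytes_between` helper are hypothetical names chosen for this example, not the types used in the codebase.

```rust
use std::fmt;
use std::str::FromStr;

/// An LSN: a byte position in the WAL stream, stored as a 64-bit integer.
/// (Hypothetical type for illustration.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Lsn(pub u64);

impl fmt::Display for Lsn {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // High and low 32 bits, each printed as hex without zero padding.
        write!(f, "{:X}/{:X}", self.0 >> 32, self.0 & 0xFFFF_FFFF)
    }
}

impl FromStr for Lsn {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Split "hi/lo" at the slash and parse each half as a 32-bit hex number.
        let (hi, lo) = s
            .split_once('/')
            .ok_or_else(|| format!("missing '/' in LSN: {s}"))?;
        let hi = u32::from_str_radix(hi, 16).map_err(|e| e.to_string())?;
        let lo = u32::from_str_radix(lo, 16).map_err(|e| e.to_string())?;
        Ok(Lsn((u64::from(hi) << 32) | u64::from(lo)))
    }
}

/// Number of WAL bytes between two LSNs (zero if `to` is not ahead of `from`).
pub fn wal_bytes_between(from: Lsn, to: Lsn) -> u64 {
    to.0.saturating_sub(from.0)
}
```

Because the value is a plain byte offset, comparing two LSNs is an integer comparison and replication lag is a subtraction.
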
+In Postgres and Neon, LSNs are used to describe certain points in WAL handling.
+
+PostgreSQL LSNs and functions to monitor them:
+* `pg_current_wal_insert_lsn()` - Returns the current write-ahead log insert location.
+* `pg_current_wal_lsn()` - Returns the current write-ahead log write location.
+* `pg_current_wal_flush_lsn()` - Returns the current write-ahead log flush location.
+* `pg_last_wal_receive_lsn()` - Returns the last write-ahead log location that has been received and synced to disk by streaming replication. While streaming replication is in progress this will increase monotonically.
+* `pg_last_wal_replay_lsn()` - Returns the last write-ahead log location that has been replayed during recovery. If recovery is still in progress this will increase monotonically.
+
+Source: [PostgreSQL documentation](https://www.postgresql.org/docs/devel/functions-admin.html).
+
+Neon safekeeper LSNs. See [safekeeper protocol section](safekeeper-protocol.md) for more information.
+* `CommitLSN`: position in WAL confirmed by a quorum of safekeepers.
+* `RestartLSN`: position in WAL confirmed by all safekeepers.
+* `FlushLSN`: part of WAL persisted to the disk by safekeeper.
+* `VCL`: the largest LSN for which we can guarantee availability of all prior records.
+
+Neon pageserver LSNs:
+* `last_record_lsn` - the end of last processed WAL record.
+* `disk_consistent_lsn` - data is known to be fully flushed and fsync'd to local disk on pageserver up to this LSN.
+* `remote_consistent_lsn` - the last LSN that is synced to remote storage and is guaranteed to survive a pageserver crash.
+  TODO: use this name consistently in the remote storage code. Currently `disk_consistent_lsn` is used there, and its meaning depends on the context.
+* `ancestor_lsn` - LSN of the branch point (the LSN at which this branch was created).
+
+TODO: add table that describes mapping between PostgreSQL (compute), safekeeper and pageserver LSNs.
+
+### Page (block)
+
+The basic structure used to store relation data. All pages are of the same size.
+This is the unit of data exchange between compute node and pageserver.
+
+### Pageserver
+
+Neon storage engine: repositories + wal receiver + page service + wal redo.
+
+### Page service
+
+The Page Service listens for GetPage@LSN requests from the Compute Nodes,
+and responds with pages from the repository.
+
+
+### PITR (Point-in-time-recovery)
+
+PostgreSQL's ability to restore up to a specified LSN.
+
+### Primary node
+
+
+### Proxy
+
+Postgres protocol proxy/router.
+This service listens on the psql port, can check authentication via an external service,
+and can create new databases and accounts (through the control plane API in our case).
+
+### Relation
+
+The generic term in PostgreSQL for all objects in a database that have a name and a list of attributes defined in a specific order.
+
+### Replication slot
+
+
+### Replica node
+
+
+### Repository
+
+A repository stores multiple timelines, all forked off from the same initial call to 'initdb',
+and has an associated WAL redo service.
+One repository corresponds to one Tenant.
+
+### Retention policy
+
+How much history do we need to keep around for PITR and read-only nodes?
+
+### Segment
+
+A physical file that stores data for a given relation. File segments are
+limited in size by a compile-time setting (1 gigabyte by default), so if a
+relation exceeds that size, it is split into multiple segments.
+
+### SLRU
+
+SLRUs include pg_clog, pg_multixact/members, and
+pg_multixact/offsets. There are other SLRUs in PostgreSQL, but
+they don't need to be stored permanently (e.g. pg_subtrans),
+or we do not support them in neon yet (pg_commit_ts).
+
+### Tenant (Multitenancy)
+
+A tenant represents a single customer interacting with Neon.
+WAL redo activity, timelines, and layers are managed for each tenant independently.
+One pageserver can serve multiple tenants at once.
+One safekeeper
+
+See `docs/multitenancy.md` for more.
+
+### Timeline
+
+Timeline accepts page changes and serves get_page_at_lsn() and
+get_rel_size() requests. The term "timeline" is used internally
+in the system, but to users they are exposed as "branches", with
+human-friendly names.
+
+NOTE: this has nothing to do with PostgreSQL WAL timelines.
+
+### XLOG
+
+PostgreSQL alias for WAL.
+
+### WAL (Write-ahead log)
+
+The journal that keeps track of the changes in the database cluster as user- and system-invoked operations take place. It comprises many individual WAL records written sequentially to WAL files.
+
+### WAL acceptor, WAL proposer
+
+In the context of the consensus algorithm, the Postgres
+compute node is also known as the WAL proposer, and the safekeeper is also known
+as the acceptor. Those are the standard terms in the Paxos algorithm.
+
+### WAL receiver (WAL decoder)
+
+The WAL receiver connects to the external WAL safekeeping service (or
+directly to the primary) using PostgreSQL physical streaming
+replication, and continuously receives WAL. It decodes the WAL records,
+and stores them to the repository.
+
+We keep one WAL receiver active per timeline.
+
+### WAL record
+
+A low-level description of an individual data change.
+
+### WAL redo
+
+A service that runs PostgreSQL in a special wal_redo mode
+to apply given WAL records over an old page image and return new page image.
+
+### WAL safekeeper
+
+One node that participates in the quorum. All the safekeepers
+together form the WAL service.
+
+### WAL segment (WAL file)
+
+Also known as WAL file or WAL segment file. Each of the sequentially-numbered files that provide storage space for WAL. The files are all of the same predefined size and are written in sequential order, interspersing changes as they occur in multiple simultaneous sessions.
+
+### WAL service
+
+The service as a whole that ensures that WAL is stored durably.
+
+### Web console
+
--- a/docs/multitenancy.md
+++ b/docs/multitenancy.md
@@ -0,0 +1,59 @@
+## Multitenancy
+
+### Overview
+
+Zenith supports multitenancy. One pageserver can serve multiple tenants at once. Tenants can be managed via the zenith CLI. During pageserver setup, a tenant can be created using `zenith init --create-tenant`. Tenants can also be added to the system on the fly, without a pageserver restart, using the CLI command `zenith tenant create`. Tenants use random identifiers, which are represented as 32-character hexadecimal strings; `zenith tenant create` accepts the desired tenant id as an optional argument. Timelines and branches work independently per tenant.
+
+### Tenants in other commands
+
+By default, `zenith init` creates a new tenant on the pageserver. The newly created tenant's id is saved to the CLI config, so other commands use it automatically if no explicit `--tenantid=<tenantid>` argument is provided. The tenantid therefore appears more frequently in the internal pageserver interface, whose commands take a tenantid argument to identify the tenant an operation applies to. The CLI also supports creating new tenants.
+
+Examples for cli:
+
+```sh
+zenith tenant list
+
+zenith tenant create # generates new id
+
+zenith tenant create ee6016ec31116c1b7c33dfdfca38892f
+
+zenith pg create main # default tenant from zenith init
+
+zenith pg create main --tenantid=ee6016ec31116c1b7c33dfdfca38892f
+
+zenith branch --tenantid=ee6016ec31116c1b7c33dfdfca38892f
+```
+
+### Data layout
+
+On the page server, tenants introduce one level of indirection, so the data directory is structured the following way:
+```
+<pageserver working directory>
+├── pageserver.log
+├── pageserver.pid
+├── pageserver.toml
+└── tenants
+   ├── 537cffa58a4fa557e49e19951b5a9d6b
+   ├── de182bc61fb11a5a6b390a8aed3a804a
+   └── ee6016ec31116c1b7c33dfdfca38892f
+```
+WAL redo activity and timelines are managed for each tenant independently.
+
+In a local environment, such as the one used in tests, tenants add a similar level of indirection to the `pgdatadirs` directory. It now contains a `tenants` subdirectory, so the structure looks like this:
+
+```
+pgdatadirs
+└── tenants
+   ├── de182bc61fb11a5a6b390a8aed3a804a
+   │  └── main
+   └── ee6016ec31116c1b7c33dfdfca38892f
+      └── main
+```
+
+### Changes to postgres
+
+The tenant id is passed to postgres via a GUC, the same way as the timeline. The tenant id is added to the commands issued to the pageserver, namely pagestream and callmemaybe. The tenant id also exists in the ServerInfo structure; this is needed to pass the value to the wal receiver so that it can forward it to the pageserver.
+
+### Safety
+
+For now, a particular tenant can only appear on one particular pageserver. The set of safekeepers is also pinned to a particular (tenantid, timeline) pair, so there can only be one writer for a particular (tenantid, timeline).
--- a/docs/pageserver-page-service.md
+++ b/docs/pageserver-page-service.md
@@ -0,0 +1,9 @@
+# Page Service
+
+The Page Service listens for GetPage@LSN requests from the Compute Nodes,
+and responds with pages from the repository. On each GetPage@LSN request,
+it calls into the Repository function
+
+A separate thread is spawned for each incoming connection to the page
+service. The page service uses the libpq protocol to communicate with
+the client. The client is a Compute Postgres instance.
--- a/docs/pageserver-pagecache.md
+++ b/docs/pageserver-pagecache.md
@@ -0,0 +1,8 @@
+# Page cache
+
+TODO:
+
+- shared across tenants
+- store pages from layer files
+- store pages from "in-memory layer"
+- store materialized pages
--- a/docs/pageserver-processing-getpage.md
+++ b/docs/pageserver-processing-getpage.md
@@ -0,0 +1,4 @@
+# Processing a GetPage request
+
+TODO:
+- sequence diagram that shows how a GetPage@LSN request is processed
--- a/docs/pageserver-processing-wal.md
+++ b/docs/pageserver-processing-wal.md
@@ -0,0 +1,5 @@
+# Processing WAL
+
+TODO:
+- diagram that shows how incoming WAL is processed
+- explain durability, what is fsync'd when, disk_consistent_lsn
--- a/docs/pageserver-services.md
+++ b/docs/pageserver-services.md
@@ -0,0 +1,163 @@
+# Services
+
+The Page Server consists of multiple threads that operate on a shared
+repository of page versions:
+```
+                                           | WAL
+                                           V
+                                   +--------------+
+                                   |              |
+                                   | WAL receiver |
+                                   |              |
+                                   +--------------+
+                                                                                 ......
+                  +---------+                              +--------+            .    .
+                  |         |                              |        |            .    .
+ GetPage@LSN      |         |                              | backup |  ------->  . S3 .
+------------->    |  Page   |         repository           |        |            .    .
+                  | Service |                              +--------+            .    .
+   page           |         |                                                    ......
+<-------------    |         |
+                  +---------+     +-----------+     +--------------------+
+                                  | WAL redo  |     | Checkpointing,     |
+                  +----------+    | processes |     | Garbage collection |
+                  |          |    +-----------+     +--------------------+
+                  |   HTTP   |
+                  | mgmt API |
+                  |          |
+                  +----------+
+
+Legend:
+
++--+
+|  |   A thread or multi-threaded service
++--+
+
+--->   Data flow
+<---
+```
+
+## Page Service
+
+The Page Service listens for GetPage@LSN requests from the Compute Nodes,
+and responds with pages from the repository. On each GetPage@LSN request,
+it calls into the Repository function
+
+A separate thread is spawned for each incoming connection to the page
+service. The page service uses the libpq protocol to communicate with
+the client. The client is a Compute Postgres instance.
+
+## WAL Receiver
+
+The WAL receiver connects to the external WAL safekeeping service
+using PostgreSQL physical streaming replication, and continuously
+receives WAL. It decodes the WAL records, and stores them to the
+repository.
+
+
+## Backup service
+
+The backup service is responsible for storing pageserver recovery data externally.
+
+Currently, pageserver stores its files in a filesystem directory it's pointed to.
+That working directory could be rather ephemeral for such cases as "a pageserver pod running in k8s with no persistent volumes attached".
+Therefore, the server interacts with external, more reliable storage to back up and restore its state.
+
+The storage support code is extensible and can support arbitrary storage backends, as long as they implement a certain Rust trait.
+The following implementations are present:
+* local filesystem - to use in tests mainly
+* AWS S3           - to use in production
+
+The backup service is disabled by default and can be enabled to interact with a single remote storage.
+
+CLI examples:
+* Local FS: `${PAGESERVER_BIN} -c "remote_storage={local_path='/some/local/path/'}"`
+* AWS S3  : `env AWS_ACCESS_KEY_ID='SOMEKEYAAAAASADSAH*#' AWS_SECRET_ACCESS_KEY='SOMEsEcReTsd292v' ${PAGESERVER_BIN} -c "remote_storage={bucket_name='some-sample-bucket',bucket_region='eu-north-1', prefix_in_bucket='/test_prefix/'}"`
+
+For Amazon AWS S3, the key id and secret access key can be found in `~/.aws/credentials` if awscli was ever configured to work with the desired bucket, or on the AWS Settings page for the user. Also note that bucket names do not contain any protocol prefix when used on AWS.
+For local S3 installations, refer to their documentation for the name format and credentials.
+
+Similar to other pageserver settings, a toml config file can be used to configure either of the storages as a backup target.
+Required sections are:
+
+```toml
+[remote_storage]
+local_path = '/Users/someonetoignore/Downloads/tmp_dir/'
+```
+
+or
+
+```toml
+[remote_storage]
+bucket_name = 'some-sample-bucket'
+bucket_region = 'eu-north-1'
+prefix_in_bucket = '/test_prefix/'
+```
+
+`AWS_SECRET_ACCESS_KEY` and `AWS_ACCESS_KEY_ID` env variables can be used to specify the S3 credentials if needed.
+
+
+## Repository background tasks
+
+The Repository also has a few different background threads and tokio tasks that perform
+background duties like dumping accumulated WAL data from memory to disk, reorganizing
+files for performance (compaction), and garbage collecting old files.
+
+
+Repository
+----------
+
+The repository stores all the page versions, or WAL records needed to
+reconstruct them. Each tenant has a separate Repository, which is
+stored in the `.neon/tenants/<tenantid>` directory.
+
+Repository is an abstract trait, defined in `repository.rs`. It is
+implemented by the LayeredRepository object in
+`layered_repository.rs`. There is only that one implementation of the
+Repository trait, but it's still a useful abstraction that keeps the
+interface for the low-level storage functionality clean. The layered
+storage format is described in [pageserver-storage.md](./pageserver-storage.md).
+
+Each repository consists of multiple Timelines. A Timeline is the
+workhorse that accepts page changes from the WAL, and serves
+get_page_at_lsn() and get_rel_size() requests. Note: this has nothing
+to do with PostgreSQL WAL timelines. The term "timeline" is mostly
+interchangeable with "branch"; there is a one-to-one mapping from
+branch to timeline. A timeline has a unique ID within the tenant,
+a 16-byte value written as a hex string that never changes, whereas a
+branch is a user-given name for a timeline.
+
+Each repository also has a WAL redo manager associated with it, see
+`walredo.rs`. The WAL redo manager is used to replay PostgreSQL WAL
+records, whenever we need to reconstruct a page version from WAL to
+satisfy a GetPage@LSN request, or to avoid accumulating too much WAL
+for a page. The WAL redo manager uses a Postgres process running in
+special Neon wal-redo mode to do the actual WAL redo, and
+communicates with the process using a pipe.
+
+
+Checkpointing / Garbage Collection
+----------------------------------
+
+Periodically, the checkpointer thread wakes up and performs housekeeping
+duties on the repository. It has two duties:
+
+### Checkpointing
+
+Flush WAL that has accumulated in memory to disk, so that the old WAL
+can be truncated away in the WAL safekeepers, and memory is freed up
+for receiving new WAL. This process is called "checkpointing". It's
+similar to checkpointing in PostgreSQL or other DBMSs, but in the page
+server, checkpointing happens on a per-segment basis.
+
+### Garbage collection
+
+Remove old on-disk layer files that are no longer needed according to the
+PITR retention policy.
+
+
+
+TODO: Sharding
+--------------------
+
+We should be able to run multiple Page Servers that handle sharded data.
--- a/docs/pageserver-storage.md
+++ b/docs/pageserver-storage.md
@@ -0,0 +1,518 @@
+# Pageserver storage
+
+The main responsibility of the Page Server is to process the incoming WAL, and
+reprocess it into a format that allows reasonably quick access to any page
+version. The page server slices the incoming WAL per relation and page, and
+packages the sliced WAL into suitably-sized "layer files". The layer files
+contain all the history of the database, back to some reasonable retention
+period. This system replaces the base backups and the WAL archive used in a
+traditional PostgreSQL installation. The layer files are immutable; they are not
+modified in-place after creation. New layer files are created for new incoming
+WAL, and old layer files are removed when they are no longer needed.
+
+The on-disk format is based on immutable files. The page server receives a
+stream of incoming WAL, parses the WAL records to determine which pages they
+apply to, and accumulates the incoming changes in memory. Whenever enough WAL
+has been accumulated in memory, it is written out to a new immutable file. That
+process accumulates "L0 delta files" on disk. When enough L0 files have been
+accumulated, they are merged and re-partitioned into L1 files, and old files
+that are no longer needed are removed by Garbage Collection (GC).
+
+The incoming WAL contains updates to arbitrary pages in the system. The
+distribution depends on the workload: the updates could be totally random, or
+there could be a long stream of updates to a single relation when data is bulk
+loaded, for example, or something in between.
+
+```
+Cloud Storage                   Page Server                           Safekeeper
+                        L1               L0             Memory            WAL
+
++----+               +----+----+
+|AAAA|               |AAAA|AAAA|      +---+-----+         |
++----+               +----+----+      |   |     |         |AA
+|BBBB|               |BBBB|BBBB|      |BB | AA  |         |BB
++----+----+          +----+----+      |C  | BB  |         |CC
+|CCCC|CCCC|  <----   |CCCC|CCCC| <--- |D  | CC  |  <---   |DDD     <----   ADEBAABED
++----+----+          +----+----+      |   | DDD |         |E
+|DDDD|DDDD|          |DDDD|DDDD|      |E  |     |         |
++----+----+          +----+----+      |   |     |
+|EEEE|               |EEEE|EEEE|      +---+-----+
++----+               +----+----+
+```
+
+In this illustration, WAL is received as a stream from the Safekeeper, from the
+right.  It is immediately captured by the page server and stored quickly in
+memory. The page server memory can be thought of as a quick "reorder buffer",
+used to hold the incoming WAL and reorder it so that we keep the WAL records for
+the same page and relation close to each other.
+
+From the page server memory, whenever enough WAL has been accumulated, it is flushed
+to disk into a new L0 layer file, and the memory is released.
+
+When enough L0 files have been accumulated, they are merged together and sliced
+per key-space, producing a new set of files where each file contains a more
+narrow key range, but larger LSN range.
+
+From the local disk, the layers are further copied to Cloud Storage, for
+long-term archival. After a layer has been copied to Cloud Storage, it can be
+removed from local disk, although we currently keep everything locally for fast
+access. If a layer is needed that isn't found locally, it is fetched from Cloud
+Storage and stored in local disk. L0 and L1 files are both uploaded to Cloud
+Storage.
+
+# Layer map
+
+The LayerMap tracks what layers exist in a timeline.
+
+Currently, the layer map is just a resizeable array (Vec). On a GetPage@LSN or
+other read request, the layer map scans through the array to find the right layer
+that contains the data for the requested page. The read-code in LayeredTimeline
+is aware of the ancestor, and returns data from the ancestor timeline if it's
+not found on the current timeline.
+
+# Different kinds of layers
+
+A layer can be in different states:
+
+- Open: a layer to which new WAL records can be appended.
+- Closed: a layer that is read-only; no new WAL records can be appended to it.
+- Historic: synonym for Closed.
+- InMemory: a layer that needs to be rebuilt from WAL on pageserver start.
+  To avoid OOM errors, InMemory layers can be spilled to disk into an ephemeral file.
+- OnDisk: a layer that is stored on disk. If its end-LSN is older than
+  disk_consistent_lsn, it is known to be fully flushed and fsync'd to local disk.
+- Frozen: an in-memory layer that is Closed.
+
+TODO: Clarify the difference between Closed, Historic and Frozen.
+
+There are two kinds of OnDisk layers:
+- ImageLayer represents a snapshot of all the keys in a particular range, at one
+  particular LSN. Any keys that are not present in the ImageLayer are known not
+  to exist at that LSN.
+- DeltaLayer represents a collection of WAL records or page images in a range of
+  LSNs, for a range of keys.
+
+# Layer life cycle
+
+The LSN range of a layer is defined by start_lsn and end_lsn:
+- start_lsn is inclusive.
+- end_lsn is exclusive.
+
+For an open in-memory layer, the end_lsn is MAX_LSN. For a frozen in-memory
+layer or a delta layer, it is a valid end bound. An image layer represents a
+snapshot at one LSN, so its end_lsn is always the snapshot LSN + 1.
+
+Every layer starts its life as an Open In-Memory layer. When the page server
+receives the first WAL record for a timeline, it creates a new In-Memory layer
+for it, and puts it to the layer map. Later, when the layer becomes full, its
+contents are written to disk, as an on-disk layer.
+
+Flushing a layer is a two-step process: First, the layer is marked as closed, so
+that it no longer accepts new WAL records, and a new in-memory layer is created
+to hold any WAL after that point. After this first step, the layer is in the
+Closed InMemory state. This first step is called "freezing" the layer.
+
+In the second step, a new Delta layer is created, containing all the data from
+the Frozen InMemory layer. When it has been created and flushed to disk, the
+original frozen layer is replaced with the new layer in the layer map, and the
+original frozen layer is dropped, releasing the memory.
+
+# Layer files (On-disk layers)
+
+The files are called "layer files". Each layer file covers a range of keys, and
+a range of LSNs (or a single LSN, in case of image layers). You can think of it
+as a rectangle in the two-dimensional key-LSN space. The layer files for each
+timeline are stored in the timeline's subdirectory under
+`.neon/tenants/<tenantid>/timelines`.
+
+There are two kinds of layer files: image files and delta files. An image file
+contains a snapshot of all keys at a particular LSN, whereas a delta file
+contains modifications to a range of keys - mostly in the form of WAL records -
+in a range of LSNs.
+
+image file:
+
+```
+    000000067F000032BE0000400000000070B6-000000067F000032BE0000400000000080B6__00000000346BC568
+              start key                          end key                           LSN
+```
+
+
+The first parts define the key range that the layer covers. See
+pgdatadir_mapping.rs for how the key space is used. The last part is the LSN.
+
+delta file:
+
+Delta files are named similarly, but they cover a range of LSNs:
+
+```
+    000000067F000032BE0000400000000020B6-000000067F000032BE0000400000000030B6__00000000578C6B29-0000000057A50051
+              start key                          end key                          start LSN         end LSN
+```
+
+A delta file contains all the key-values in the key-range that were updated in
+the LSN range. If a key has not been modified, there is no trace of it in the
+delta layer.
+
+
+A delta layer file can cover a part of the overall key space, as in the previous
+example, or the whole key range like this:
+
+```
+    000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000578C6B29-0000000057A50051
+```
+
+A file that covers the whole key range is called an L0 file (Level 0), while a
+file that covers only part of the key range is called an L1 file. The "level" of
+a file is not explicitly stored anywhere; you can only distinguish them by
+looking at the key range that a file covers. The read-path doesn't need to
+treat L0 and L1 files any differently.
+
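As a rough illustration of the naming scheme above, the following Rust sketch splits a layer file name into its key range and LSN part, and applies the "whole key range means L0" rule. The `LayerName` struct and `parse_layer_name` function are hypothetical names for this example; the actual pageserver code may parse names differently.

```rust
/// Parsed form of a layer file name. All names here are illustrative.
#[derive(Debug, PartialEq, Eq)]
pub struct LayerName {
    pub key_start: String,
    pub key_end: String,
    pub lsn_start: u64,
    /// `None` for an image file, which is a snapshot at a single LSN.
    pub lsn_end: Option<u64>,
}

/// Parse `<start key>-<end key>__<LSN>` or `<start key>-<end key>__<start LSN>-<end LSN>`.
pub fn parse_layer_name(name: &str) -> Option<LayerName> {
    let (keys, lsns) = name.split_once("__")?;
    let (key_start, key_end) = keys.split_once('-')?;
    let (lsn_start, lsn_end) = match lsns.split_once('-') {
        // Two LSNs: a delta file covering a range of LSNs.
        Some((start, end)) => (
            u64::from_str_radix(start, 16).ok()?,
            Some(u64::from_str_radix(end, 16).ok()?),
        ),
        // One LSN: an image file.
        None => (u64::from_str_radix(lsns, 16).ok()?, None),
    };
    Some(LayerName {
        key_start: key_start.to_string(),
        key_end: key_end.to_string(),
        lsn_start,
        lsn_end,
    })
}

impl LayerName {
    pub fn is_image(&self) -> bool {
        self.lsn_end.is_none()
    }

    /// A delta file spanning the whole key space (all zeros to all Fs) is an L0 file.
    pub fn is_l0(&self) -> bool {
        self.lsn_end.is_some()
            && self.key_start.chars().all(|c| c == '0')
            && self.key_end.chars().all(|c| c == 'F')
    }
}
```

The level is derived purely from the key range, which mirrors the point that it is not stored anywhere explicitly.
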
+
+## Notation used in this document
+
+FIXME: This is somewhat obsolete, the layer files cover a key-range rather than
+a particular relation nowadays. However, the description on how you find a page
+version, and how branching and GC works is still valid.
+
+The full path of a delta file looks like this:
+
+```
+    .neon/tenants/941ddc8604413b88b3d208bddf90396c/timelines/4af489b06af8eed9e27a841775616962/rel_1663_13990_2609_0_10_000000000169C348_0000000001702000
+```
+
+For simplicity, the examples below use a simplified notation for the
+paths.  The tenant ID is left out, the timeline ID is replaced with
+the human-readable branch name, and spcnode+dbnode+relnode+forknum+segno
+with a human-readable table name. The LSNs are also shorter. For
+example, a base image file at LSN 100 and a delta file between 100-200
+for 'orders' table on 'main' branch is represented like this:
+
+```
+    main/orders_100
+    main/orders_100_200
+```
+
+
+# Creating layer files
+
+Let's start with a simple example with a system that contains one
+branch called 'main' and two tables, 'orders' and 'customers'. The end
+of WAL is currently at LSN 250. In this starting situation, you would
+have these files on disk:
+
+```
+	main/orders_100
+	main/orders_100_200
+	main/orders_200
+	main/customers_100
+	main/customers_100_200
+	main/customers_200
+```
+
+In addition to those files, the recent changes between LSN 200 and the
+end of WAL at 250 are kept in memory. If the page server crashes, the
+latest records between 200-250 need to be re-read from the WAL.
+
+Whenever enough WAL has been accumulated in memory, the page server
+writes out the changes in memory into new layer files. This process
+is called "checkpointing" (not to be confused with the PostgreSQL
+checkpoints, that's a different thing). The page server only creates
+layer files for relations that have been modified since the last
+checkpoint. For example, if the current end of WAL is at LSN 450, and
+the last checkpoint happened at LSN 400 but there haven't been any
+recent changes to the 'customers' table, you would have these files on
+disk:
+
+	main/orders_100
+	main/orders_100_200
+	main/orders_200
+	main/orders_200_300
+	main/orders_300
+	main/orders_300_400
+	main/orders_400
+	main/customers_100
+	main/customers_100_200
+	main/customers_200
+
+If the customers table is modified later, a new file is created for it
+at the next checkpoint. The new file will cover the "gap" from the
+last layer file, so the LSN ranges are always contiguous:
+
+```
+	main/orders_100
+	main/orders_100_200
+	main/orders_200
+	main/orders_200_300
+	main/orders_300
+	main/orders_300_400
+	main/orders_400
+	main/customers_100
+	main/customers_100_200
+	main/customers_200
+	main/customers_200_500
+	main/customers_500
+```
+
+## Reading page versions
+
+Whenever a GetPage@LSN request comes in from the compute node, the
+page server needs to reconstruct the requested page, as it was at the
+requested LSN. To do that, the page server first checks the recent
+in-memory layer; if the requested page version is found there, it can
+be returned immediately without looking at the files on
+disk. Otherwise the page server needs to locate the layer file that
+contains the requested page version.
+
+For example, if a request comes in for table 'orders' at LSN 250, the
+page server would load the 'main/orders_200_300' file into memory, and
+reconstruct and return the requested page from it, as it was at
+LSN 250. Because the layer file consists of a full image of the
+relation at the start LSN and the WAL, reconstructing the page
+involves replaying any WAL records applicable to the page between LSNs
+200-250, starting from the base image at LSN 200.
+
+# Multiple branches
+
+Imagine that a child branch is created at LSN 250:
+
+```
+            @250
+    ----main--+-------------------------->
+               \
+                +---child-------------->
+```
+
+
+Then, the 'orders' table is updated differently on the 'main' and
+'child' branches. You now have this situation on disk:
+
+```
+    main/orders_100
+    main/orders_100_200
+    main/orders_200
+    main/orders_200_300
+    main/orders_300
+    main/orders_300_400
+    main/orders_400
+    main/customers_100
+    main/customers_100_200
+    main/customers_200
+    child/orders_250_300
+    child/orders_300
+    child/orders_300_400
+    child/orders_400
+```
+
+Because the 'customers' table hasn't been modified on the child
+branch, there is no file for it there. If you request a page for it on
+the 'child' branch, the page server will not find any layer file
+for it in the 'child' directory, so it will recurse to look into the
+parent 'main' branch instead.
+
+From the 'child' branch's point of view, the history for each relation
+is linear, and the request's LSN identifies unambiguously which file
+you need to look at. For example, the history for the 'orders' table
+on the 'main' branch consists of these files:
+
+```
+    main/orders_100
+    main/orders_100_200
+    main/orders_200
+    main/orders_200_300
+    main/orders_300
+    main/orders_300_400
+    main/orders_400
+```
+
+And from the 'child' branch's point of view, it consists of these
+files:
+
+```
+    main/orders_100
+    main/orders_100_200
+    main/orders_200
+    main/orders_200_300
+    child/orders_250_300
+    child/orders_300
+    child/orders_300_400
+    child/orders_400
+```
+
+The branch metadata includes the point where the child branch was
+created, LSN 250. If a page request comes with LSN 275, we read the
+page version from the 'child/orders_250_300' file. We might also
+need to reconstruct the page version as it was at LSN 250, in order
+to replay the WAL up to LSN 275, using 'main/orders_200_300' and
+'main/orders_200'. The page versions between 250-300 in the
+'main/orders_200_300' file are ignored when operating on the child
+branch.
+
+Note: It doesn't make any difference if the child branch is created
+when the end of the main branch was at LSN 250, or later when the tip of
+the main branch had already moved on. The latter case, creating a
+branch at a historic LSN, is how we support PITR in Zenith.
+
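The lookup across branches described in this section can be sketched as follows, using the simplified `branch/table_lsn` notation. This is a toy model under stated assumptions: `Layer`, `Branch` and `find_layer` are hypothetical names, and the sketch only locates the first layer to read from; it does not replay WAL or assemble the full reconstruction chain.

```rust
use std::collections::HashMap;

/// A layer file in the simplified notation: `end == None` is an image at `start`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Layer {
    pub table: &'static str,
    pub start: u64,
    pub end: Option<u64>,
}

pub struct Branch {
    /// Parent branch name and the LSN at which this branch was created.
    pub ancestor: Option<(&'static str, u64)>,
    pub layers: Vec<Layer>,
}

/// Find the branch and layer from which to start reconstructing `table` at `lsn`.
pub fn find_layer<'a>(
    branches: &'a HashMap<&'static str, Branch>,
    branch: &'static str,
    table: &str,
    lsn: u64,
) -> Option<(&'static str, &'a Layer)> {
    let b = branches.get(branch)?;
    // Prefer a delta file whose [start, end) range contains the LSN.
    let delta = b.layers.iter().find(|l| {
        l.table == table && l.start <= lsn && l.end.map_or(false, |end| lsn < end)
    });
    // Otherwise fall back to the newest image at or below the LSN.
    let image = b
        .layers
        .iter()
        .filter(|l| l.table == table && l.end.is_none() && l.start <= lsn)
        .max_by_key(|l| l.start);
    if let Some(layer) = delta.or(image) {
        return Some((branch, layer));
    }
    // Nothing on this branch: recurse into the parent, capped at the branch point,
    // so that later changes on the parent are ignored.
    let (parent, branch_lsn) = b.ancestor?;
    find_layer(branches, parent, table, lsn.min(branch_lsn))
}
```

With the files from the example, a request for 'orders' at LSN 275 on 'child' resolves to `child/orders_250_300`, while a request for 'customers' falls through to `main/customers_200`.
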
+
+# Garbage collection
+
+In this scheme, we keep creating new layer files over time. We also
+need a mechanism to remove old files that are no longer needed,
+because disk space isn't infinite.
+
+What files are still needed? Currently, the page server supports PITR
+and branching from any branch at any LSN that is "recent enough" from
+the tip of the branch.  "Recent enough" is defined as an LSN horizon,
+which by default is 64 MB.  (See DEFAULT_GC_HORIZON). For this
+example, let's assume that the LSN horizon is 150 units.
+
+Let's look at the single branch scenario again. Imagine that the end
+of the branch is LSN 525, so that the GC horizon is currently at
+525 - 150 = 375:
+
+```
+	main/orders_100
+	main/orders_100_200
+	main/orders_200
+	main/orders_200_300
+	main/orders_300
+	main/orders_300_400
+	main/orders_400
+	main/orders_400_500
+	main/orders_500
+	main/customers_100
+	main/customers_100_200
+	main/customers_200
+```
+
+We can remove the following files because their end LSNs are older
+than the GC horizon (375), and there are more recent layer files for
+the table:
+
+```
+	main/orders_100       DELETE
+	main/orders_100_200   DELETE
+	main/orders_200       DELETE
+	main/orders_200_300   DELETE
+	main/orders_300       STILL NEEDED BY orders_300_400
+	main/orders_300_400   KEEP, NEWER THAN GC HORIZON
+	main/orders_400       .. 
+	main/orders_400_500   .. 
+	main/orders_500       .. 
+	main/customers_100      DELETE
+	main/customers_100_200  DELETE
+	main/customers_200      KEEP, NO NEWER VERSION
+```
+
+'main/customers_200' is old enough, but it cannot be
+removed because there is no newer layer file for the table.
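The single-branch decision above can be restated as a small Python sketch. It is not the actual GC code: `parse_layer` and `removable_layers` are hypothetical helpers, and the rule used here (keep the newest image layer at or below the GC cutoff, plus everything that starts at or after it) is an assumption chosen because it reproduces the table above.

```python
from typing import NamedTuple, Optional, Set


class Layer(NamedTuple):
    rel: str
    start_lsn: int
    end_lsn: Optional[int]  # None for an image layer such as "orders_300"


def parse_layer(name: str) -> Layer:
    """Parse 'orders_100' (image) or 'orders_100_200' (delta)."""
    rel, *lsns = name.split("_")
    return Layer(rel, int(lsns[0]), int(lsns[1]) if len(lsns) > 1 else None)


def removable_layers(names, tip_lsn: int, horizon: int) -> Set[str]:
    cutoff = tip_lsn - horizon
    layers = {name: parse_layer(name) for name in names}
    removable = set()
    for rel in {layer.rel for layer in layers.values()}:
        # Newest image at or below the cutoff; it is the base for every
        # page version that can still be requested.
        base = max(
            (layer.start_lsn for layer in layers.values()
             if layer.rel == rel and layer.end_lsn is None
             and layer.start_lsn <= cutoff),
            default=0,
        )
        removable |= {name for name, layer in layers.items()
                      if layer.rel == rel and layer.start_lsn < base}
    return removable


files = [
    "orders_100", "orders_100_200", "orders_200", "orders_200_300",
    "orders_300", "orders_300_400", "orders_400", "orders_400_500",
    "orders_500", "customers_100", "customers_100_200", "customers_200",
]
print(sorted(removable_layers(files, tip_lsn=525, horizon=150)))
```

With the tip at LSN 525 and a horizon of 150, the sketch marks the same six files for deletion as the listing above and keeps 'customers_200', since no newer image exists for that table.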
+
+Things get slightly more complicated with multiple branches. All of
+the above still holds, but in addition to recent files we must also
+retain older snapshot files that are still needed by child branches.
+For example, if a child branch is created at LSN 150, and the 'customers'
+table is updated on the branch, you would have these files:
+
+```
+	main/orders_100        KEEP, NEEDED BY child BRANCH
+	main/orders_100_200    KEEP, NEEDED BY child BRANCH
+	main/orders_200        DELETE
+	main/orders_200_300    DELETE
+	main/orders_300        KEEP, NEWER THAN GC HORIZON
+	main/orders_300_400    KEEP, NEWER THAN GC HORIZON
+	main/orders_400        KEEP, NEWER THAN GC HORIZON
+	main/orders_400_500    KEEP, NEWER THAN GC HORIZON
+	main/orders_500        KEEP, NEWER THAN GC HORIZON
+	main/customers_100       DELETE
+	main/customers_100_200   DELETE
+	main/customers_200       KEEP, NO NEWER VERSION
+	child/customers_150_300  DELETE
+	child/customers_300      KEEP, NO NEWER VERSION
+```
+
+In this situation, 'main/orders_100' and 'main/orders_100_200' cannot
+be removed, even though they are older than the GC horizon, because
+they are still needed by the child branch. 'main/orders_200'
+and 'main/orders_200_300' can still be removed.
+
+If 'orders' is modified later on the 'child' branch, we will create a
+new base image and delta file for it on the child:
+
+```
+	main/orders_100
+	main/orders_100_200
+
+	main/orders_300
+	main/orders_300_400
+	main/orders_400
+	main/orders_400_500
+	main/orders_500
+	main/customers_200
+	child/customers_300
+	child/orders_150_400
+	child/orders_400
+```
+
+After this, the 'main/orders_100' and 'main/orders_100_200' files could
+be removed. They are no longer needed by the child branch, because there
+is a newer layer file there. TODO: This optimization hasn't been
+implemented! The GC algorithm will currently keep the files on the
+'main' branch anyway, for as long as the child branch exists.
+
+TODO:
+Describe GC and checkpoint interval settings.
+
+# TODO: On LSN ranges
+
+In principle, each relation can be checkpointed separately, i.e. the
+LSN ranges of the files don't need to line up. So this would be legal:
+
+```
+	main/orders_100
+	main/orders_100_200
+	main/orders_200
+	main/orders_200_300
+	main/orders_300
+	main/orders_300_400
+	main/orders_400
+	main/customers_150
+	main/customers_150_250
+	main/customers_250
+	main/customers_250_500
+	main/customers_500
+```
+
+However, the code currently always checkpoints all relations together.
+So that situation doesn't arise in practice.
+
+It would also be OK to have overlapping LSN ranges for the same relation:
+
+	main/orders_100
+	main/orders_100_200
+	main/orders_200
+	main/orders_200_300
+	main/orders_300
+	main/orders_250_350
+	main/orders_350
+	main/orders_300_400
+	main/orders_400
+
+The code that reads the layer files should cope with this, but this
+situation doesn't arise either, because the checkpointing code never
+does that.  It could be useful, however, as a transient state when
+garbage collecting around branch points, or explicit recovery
+points. For example, if we start with this:
+
+```
+	main/orders_100
+	main/orders_100_200
+	main/orders_200
+	main/orders_200_300
+	main/orders_300
+```
+
+And there is a branch or explicit recovery point at LSN 150, we could
+replace 'main/orders_100_200' with 'main/orders_150' to keep a
+layer only at that exact point that's still needed, removing the
+other page versions around it. But such compaction has not been
+implemented yet.
--- a/docs/pageserver-tenant-migration.md
+++ b/docs/pageserver-tenant-migration.md
@@ -0,0 +1,22 @@
+## Pageserver tenant migration
+
+### Overview
+
+This feature allows migrating a timeline from one pageserver to another by using the remote storage capability.
+
+### Migration process
+
+The pageserver implements two new HTTP handlers: timeline attach and timeline detach.
+Timeline migration is performed as follows:
+1. Timeline attach is called on the target pageserver. This asks the pageserver to download the latest checkpoint uploaded to S3.
+2. For now, it is necessary to manually initialize the replication stream via a callmemaybe call, so that the target pageserver initializes replication from the safekeeper. (It would be preferable to avoid this and initialize replication directly in the attach handler, but this requires some refactoring, probably [#997](https://github.com/zenithdb/zenith/issues/997)/[#1049](https://github.com/zenithdb/zenith/issues/1049).)
+3. Replication state can be tracked via the timeline detail pageserver call.
+4. The compute node should be restarted with the new pageserver connection string. The issue of multiple compute nodes for one timeline is handled at the safekeeper consensus level, so it is not a problem here. Currently, responsibility for rescheduling the compute with the updated config lies with the external coordinator (console).
+5. The timeline is detached from the old pageserver. On-disk data is removed.
+
+
+### Implementation details
+
+The safekeeper now needs to track which pageserver it is replicating to. This introduces complications into the replication code:
+* We need to distinguish different pageservers (currently this is done by connection string, which is imperfect and is covered here: https://github.com/zenithdb/zenith/issues/1105). Callmemaybe subscription management also needs to track that (this is already implemented).
+* We need to track which pageserver is the primary. This is needed to avoid reconnecting to non-primary pageservers, because we shouldn't reconnect to them when they decide to stop their walreceiver. For example, this can happen when there is load on the compute and we are trying to detach a timeline from the old pageserver. In this case callmemaybe will try to reconnect to it because the replication termination condition is not met (a pageserver with an active compute could never catch up to the latest LSN, so there is always some WAL tail).
--- a/docs/pageserver-thread-mgmt.md
+++ b/docs/pageserver-thread-mgmt.md
@@ -0,0 +1,26 @@
+## Thread management
+
+Each thread in the system is tracked by the `thread_mgr` module. It
+maintains a registry of threads, and which tenant or timeline they are
+operating on. This is used for safe shutdown of a tenant, or the whole
+system.
+
+### Handling shutdown
+
+When a tenant or timeline is deleted, we need to shut down all threads
+operating on it, before deleting the data on disk. A thread registered
+in the thread registry can check if it has been requested to shut down,
+by calling `is_shutdown_requested()`. For async operations, there's also
+a `shutdown_watcher()` async task that can be used to wake up on shutdown.
+
+### Sync vs async
+
+The primary programming model in the page server is synchronous,
+blocking code. However, there are some places where async code is
+used. Be very careful when mixing sync and async code.
+
+Async is primarily used to wait for incoming data on network
+connections. For example, all WAL receivers have a shared thread pool,
+with one async Task for each connection. Once a piece of WAL has been
+received from the network, the thread calls the blocking functions in
+the Repository to process the WAL.
--- a/docs/pageserver-walredo.md
+++ b/docs/pageserver-walredo.md
@@ -0,0 +1,77 @@
+# WAL Redo
+
+To reconstruct a particular page version from an image of the page and
+some WAL records, the pageserver needs to replay the WAL records. This
+happens on-demand, when a GetPage@LSN request comes in, or as part of
+background jobs that reorganize data for faster access.
+
+It's important that data cannot leak from one tenant to another, and
+that a corrupt WAL record on one timeline doesn't affect other tenants
+or timelines.
+
+## Multi-tenant security
+
+If you have direct access to the WAL directory, or if you have
+superuser access to a running PostgreSQL server, it's easy to
+construct a malicious or corrupt WAL record that causes the WAL redo
+functions to crash, or to execute arbitrary code. That is not a
+security problem for PostgreSQL; if you have superuser access, you
+have full access to the system anyway.
+
+The Neon pageserver, however, is multi-tenant. It needs to execute WAL
+belonging to different tenants in the same system, and malicious WAL
+in one tenant must not affect other tenants.
+
+A separate WAL redo process is launched for each tenant, and the
+process uses the seccomp(2) system call to restrict its access to the
+bare minimum needed to replay WAL records. The process does not have
+access to the filesystem or network. It can only communicate with the
+parent pageserver process through a pipe.
+
+If an attacker creates a malicious WAL record and injects it into the
+WAL stream of a timeline, they can take control of the WAL redo process
+in the pageserver. However, the WAL redo process cannot access the
+rest of the system. And because there is a separate WAL redo process
+for each tenant, the hijacked WAL redo process can only see WAL and
+data belonging to the same tenant, which the attacker would have
+access to anyway.
+
+## WAL-redo process communication
+
+The WAL redo process runs the 'postgres' executable, launched with a
+Neon-specific command-line option to put it into WAL-redo process
+mode.  The pageserver controls the lifetime of the WAL redo processes,
+launching them as needed. If a tenant is detached from the pageserver,
+any WAL redo processes for that tenant are killed.
+
+The pageserver communicates with each WAL redo process over its
+stdin/stdout/stderr. It uses a request-response model with a simple
+custom protocol, described in walredo.rs. To replay a set of WAL
+records for a page, the pageserver sends the "before" image of the
+page and the WAL records over 'stdin', followed by a command to
+perform the replay. The WAL redo process responds with an "after"
+image of the page.
+
+## Special handling of some records
+
+Some WAL record types are handled directly in the pageserver, by
+bespoke Rust code, and are not sent over to the WAL redo process.
+This includes SLRU-related WAL records, like commit records. SLRUs
+don't use the standard Postgres buffer manager, so dealing with them
+in the Neon WAL redo mode would require quite a few changes to
+Postgres code and special handling in the protocol anyway.
+
+Some record types that include a full-page image (e.g. XLOG_FPI) are
+also handled specially, already when incoming WAL is processed, and are
+stored as page images rather than WAL records.
+
+
+## Records that modify multiple pages
+
+Some Postgres WAL records modify multiple pages. Such WAL records are
+duplicated, so that a copy is stored for each affected page. This is
+somewhat wasteful, but because most WAL records only affect one page,
+the overhead is acceptable.
+
+The WAL redo always happens for one particular page. If the WAL record
+contains changes to other pages, they are ignored.
--- a/docs/pageserver.md
+++ b/docs/pageserver.md
@@ -0,0 +1,11 @@
+# Page server architecture
+
+The Page Server has a few different duties:
+
+- Respond to GetPage@LSN requests from the Compute Nodes
+- Receive WAL from the WAL safekeeper, and store it
+- Upload data to S3 to make it durable, download files from S3 as needed
+
+S3 is the main fault-tolerant storage of all data, as there are no Page Server
+replicas. We use a separate fault-tolerant WAL service to reduce latency. It
+keeps track of WAL records which are not synced to S3 yet.
--- a/docs/rfcs/002-storage.md
+++ b/docs/rfcs/002-storage.md
@@ -0,0 +1,186 @@
+# Zenith storage node — alternative
+
+## **Design considerations**
+
+Simplify storage operations for people => Gain adoption/installs on laptops and small private installation => Attract customers to DBaaS by seamless integration between our tooling and cloud.
+
+Proposed architecture addresses:
+
+- High availability -- tolerates n/2 - 1 failures
+- Multi-tenancy -- one storage for all databases
+- Elasticity -- increase storage size on the go by adding nodes
+- Snapshots / backups / PITR with S3 offload
+- Compression
+
+Minuses are:
+
+- Quite a lot of work
+- Single page access may touch a few disk pages
+- Some bloat in data — may slow down sequential scans
+
+## **Summary**
+
+Storage cluster is a sharded key-value store with ordered keys. Key (**page_key**) is a tuple of `(pg_id, db_id, timeline_id, rel_id, forkno, segno, pageno, lsn)`. Value is either a page or a page diff/wal. Each chunk (chunk == shard) stores approx. 50-100GB ~~and automatically splits in half when it grows bigger than the soft 100GB limit~~ by having a fixed range of pagenos it is responsible for. Chunk placement on storage nodes is stored in a separate metadata service, so a chunk can be freely moved around the cluster if needed. The chunk itself is a filesystem directory with the following subdirectories:
+
+```
+
+|-chunk_42/
+  |-store/ -- contains lsm with pages/pagediffs ranging from
+  |	      page_key_lo to page_key_hi
+  |-wal/
+  |  |- db_1234/ db-specific wal files with pages from page_key_lo
+  |		 to page_key_hi
+  |
+  |-chunk.meta -- small file with snapshot references
+		  (page_key_prefix+lsn+name)
+		  and PITR regions (page_key_start, page_key_end)
+```
+
+## **Chunk**
+
+A chunk is responsible for storing pages, potentially from different databases and relations. Each page is addressed by a lexicographically ordered tuple (**page_key**) with the following fields:
+
+- `pg_id` -- unique id of given postgres instance (or postgres cluster as it is called in postgres docs)
+- `db_id` -- database that was created by 'CREATE DATABASE' in a given postgres instance
+- `db_timeline` -- used to create Copy-on-Write instances from snapshots, described later
+- `rel_id` -- tuple of (relation_id, 0) for tables and (indexed_relation_id, rel_id) for indices. Done this way so that table indices are close to the table itself in our global key space.
+- `(forkno, segno, pageno)` -- page coordinates in postgres data files
+- `lsn_timeline` -- postgres feature, increments when PITR was done.
+- `lsn` -- lsn of current page version.
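To illustrate the effect of the `rel_id` encoding, here is a minimal Python sketch. The `PageKey` tuple follows the field list above, but the helper names and all numeric ids are made up for the example. Python compares tuples lexicographically, which matches the ordering assumed for **page_key**.

```python
from typing import NamedTuple, Tuple


class PageKey(NamedTuple):
    pg_id: int
    db_id: int
    db_timeline: int
    rel_id: Tuple[int, int]
    forkno: int
    segno: int
    pageno: int
    lsn_timeline: int
    lsn: int


def table_rel_id(relation_id: int) -> Tuple[int, int]:
    return (relation_id, 0)


def index_rel_id(indexed_relation_id: int, rel_id: int) -> Tuple[int, int]:
    return (indexed_relation_id, rel_id)


keys = [
    PageKey(1, 5, 0, table_rel_id(16385), 0, 0, 0, 1, 100),         # another table
    PageKey(1, 5, 0, index_rel_id(16384, 16390), 0, 0, 0, 1, 100),  # index on 16384
    PageKey(1, 5, 0, table_rel_id(16384), 0, 0, 1, 1, 120),         # table 16384, page 1
    PageKey(1, 5, 0, table_rel_id(16384), 0, 0, 0, 1, 100),         # table 16384, page 0
]

# Tuples sort lexicographically, so the index pages land directly after
# the pages of the table they belong to.
ordered = sorted(keys)
for key in ordered:
    print(key.rel_id, key.pageno, key.lsn)
```

The pages of table 16384 come first, its index follows immediately, and the unrelated table 16385 sorts after both.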
+
+A chunk stores pages and page diffs ranging from page_key_lo to page_key_hi. The processing node looks at the page in a wal record and sends the record to the chunk responsible for this page range. When a wal record arrives at a chunk it is initially stored in `chunk_id/wal/db_id/wal_segno.wal`. Then a background process moves records from those wal files to the lsm tree in `chunk_id/store`. Or, more precisely, wal records would be materialized into the lsm memtable, and when that memtable is flushed to an SSTable on disk we may trim the wal. That way some not durably (in the distributed sense) committed pages may enter the tree -- here we rely on processing node behavior: a page request from the processing node should contain proper lsn horizons so that the storage node may respond with the proper page version.
+
+LSM here is a usual LSM for variable-length values: at first data is stored in memory (we hold incoming wal records to be able to regenerate it after restart) in some balanced tree. When this tree grows big enough we dump it into a disk file (SSTable), sorting records by key. Then SSTables are merge-sorted in the background into different files. All file operations are sequential and do not require WAL for durability.
+
+The content of an SSTable can look like the following:
+
+```
+(pg_id, db_id, ... , pageno=42, lsn=100) (full 8k page data)
+(pg_id, db_id, ... , pageno=42, lsn=150) (per-page diff)
+(pg_id, db_id, ... , pageno=42, lsn=180) (per-page diff)
+(pg_id, db_id, ... , pageno=42, lsn=200) (per-page diff)
+(pg_id, db_id, ... , pageno=42, lsn=220) (full 8k page data)
+(pg_id, db_id, ... , pageno=42, lsn=250) (per-page diff)
+(pg_id, db_id, ... , pageno=42, lsn=270) (per-page diff)
+(pg_id, db_id, ... , pageno=5000, lsn=100) (full 8k page data)
+```
+
+So a query for `pageno=42 up to lsn=260` would need to find the closest entry less than this key, iterate back to the latest full page and iterate forward to apply diffs. How often a page is materialized in the lsn-version sequence is up to us -- let's say every 5th version should be a full page.
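The read path just described can be sketched in Python as follows. This is an illustration under assumptions, not a proposed implementation: versions of one page are held in a list sorted by lsn, pages are shortened to 8 bytes, and a diff is modeled as a hypothetical `(offset, bytes)` overwrite.

```python
import bisect

FULL, DIFF = "full", "diff"


def apply_diff(page: bytes, diff) -> bytes:
    """Overwrite len(data) bytes of `page` starting at `offset`."""
    offset, data = diff
    return page[:offset] + data + page[offset + len(data):]


def read_page(versions, lsn):
    """versions: list of (lsn, kind, payload) for one page, sorted by lsn."""
    lsns = [v[0] for v in versions]
    # Closest entry at or below the requested lsn.
    last = bisect.bisect_right(lsns, lsn) - 1
    if last < 0:
        raise KeyError(f"no page version at or before lsn {lsn}")
    # Iterate back to the latest full page image.
    base = last
    while base >= 0 and versions[base][1] != FULL:
        base -= 1
    if base < 0:
        raise ValueError("no full page image to start from")
    # Iterate forward, applying the diffs.
    page = versions[base][2]
    for _, _, diff in versions[base + 1:last + 1]:
        page = apply_diff(page, diff)
    return page


versions = [
    (100, FULL, b"AAAAAAAA"),
    (150, DIFF, (0, b"b")),
    (180, DIFF, (1, b"c")),
    (200, DIFF, (2, b"d")),
    (220, FULL, b"EEEEEEEE"),
    (250, DIFF, (0, b"f")),
    (270, DIFF, (1, b"g")),
]
print(read_page(versions, 260))
print(read_page(versions, 200))
```

For lsn=260 only the full page at lsn=220 and the diff at lsn=250 are touched, so the cost of a read is bounded by how often full pages are materialized.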
+
+### **Page deletion**
+
+To delete old pages we insert a blind deletion marker `(pg_id, db_id, #trim_lsn < 150)` into the lsm tree. During merges such a marker would indicate that all pages with a smaller lsn should be discarded. The delete marker will travel down the tree level hierarchy until it reaches the last level. In a non-PITR scenario where old page versions are not needed at all, such a deletion marker would (on average) prevent old page versions from propagating down the tree -- so all bloat would concentrate at the higher tree layers without affecting the bigger bottom layers.
+
+### **Recovery**
+
+Upon storage node restart recent WAL files are applied to appropriate pages and resulting pages stored in lsm memtable. So this should be fast since we are not writing anything to disk.
+
+### **Checkpointing**
+
+No such mechanism is needed. Or we may look at the storage node as a kind of continuous checkpointer.
+
+### **Full page writes (torn page protection)**
+
+The storage node never updates individual pages, it only merges SSTables, so torn pages are not an issue.
+
+### **Snapshot**
+
+That is the part that I like about this design -- snapshot creation is an instant and cheap operation that can have a flexible granularity level: whole instance, database, table. Snapshot creation inserts a record in the `chunk.meta` file with the lsn of this snapshot and a key prefix `(pg_id, db_id, db_timeline, rel_id, *)` that prohibits page deletion within this range. The storage node may not know anything about page internals, but by changing the number of fields in our prefix we may change snapshot granularity.
+
+It is again useful to remap `rel_id` to `(indexed_relation_id, rel_id)` so that a snapshot of a relation would include its indices. Also, a table snapshot would interact with the catalog in tricky ways. Probably all table snapshots should also hold a catalog snapshot. And when a node is started with such a snapshot it should check that only tables from the snapshot are queried. I assume here that for reading a snapshot one needs to start a new postgres instance.
+
+Storage consumed by a snapshot is proportional to the amount of data changed. We may have some heuristic (calculated based on the cost of different storages) about when to offload an old snapshot to s3. For example, if the current database has more than 40% of changed pages with respect to the previous snapshot then we may offload that snapshot to s3, and release this space.
+
+**Starting db from snapshot**
+
+When we are starting a database from a snapshot it can be done in two ways. First, we may create a new db_id, move all the data from the snapshot to a new db and start a database. The second option is to create a Copy-on-Write (CoW) instance out of the snapshot, read old pages from the old snapshot and store new pages separately. That is why there is a `db_timeline` key field near `db_id` -- a CoW (🐮) database should create a new `db_timeline` and remember the old `db_timeline`. Such a database can have a hashmap of pages that it has changed, to query pages from the proper snapshot on the first try. `db_timeline` is located near `db_id` so that new page versions generated by the new instance would not bloat the data of the initial snapshot. It is not clear whether it is possible to efficiently support "stacked" CoW snapshots, so we may disallow them. (Well, one way to support them is to move `db_timeline` close to `lsn` -- so we may scan neighboring pages and find the right one. But again, that way we bloat the snapshot with unrelated data and may slow down full scans that are happening in a different database.)
+
+**Snapshot export/import**
+
+Once we can start CoW instances it is easy to run an auxiliary postgres instance on this snapshot and run `COPY (...) TO stdout` or `pg_dump` to export data from the snapshot to some portable format. Also, we may start postgres on a new empty database and run `COPY FROM stdin`. This way we can initialize new non-CoW databases and transfer snapshots via the network.
+
+### **PITR area**
+
+In the described scheme PITR is just a prohibition to delete any versions within some key prefix, whether it is a database or a table key prefix. So PITR may have different settings for different tables, databases, etc.
+
+PITR is quite bloaty, so we may aggressively offload it to s3 -- we may push same (or bigger) SSTables to s3 and maintain lsm structure there.
+
+### **Compression**
+
+Since we are storing page diffs of variable sizes there is no structural dependency on a page size and we may compress it. Again that could be enabled only on pages with some key prefixes, so we may have this with db/table granularity.
+
+### **Chunk metadata**
+
+Chunk metadata is a file in the chunk directory that stores info about current snapshots and PITR regions. The chunk should always consult this data when merging SSTables and applying delete markers.
+
+### **Chunk splitting**
+
+*(NB: following paragraph is about how to avoid page splitting)*
+
+When a chunk hits some soft storage limit (let's say 100GB) it should be split in half and the global metadata about chunk boundaries should be updated. Here I assume that a chunk split is a local operation happening on a single node. The process of chunk splitting should look as follows:
+
+1. Find separation key and spawn two new chunks with [lo, mid) [mid, hi) boundaries.
+
+2. Prohibit WAL deletion and old SSTables deletion on original chunk.
+
+3. On each lsm layer we would need to split only one SSTable, all others would fit within the left or right range. Symlink/split those files to the new chunks.
+
+4. Start WAL replay on new chunks.
+
+5. Update global metadata about new chunk boundaries.
+
+6. Eventually (metadata update should be pushed to processing node by metadata service) storage node will start sending WAL and page requests to the new nodes.
+
+7. New chunk may start serving read queries when following conditions are met:
+
+a) it receives at least one WAL record from processing node
+
+b) it replayed all WAL up to the new received one
+
+c) checked by downlinks that there were no WAL gaps.
+
+Chunk split as it is described here is quite a fast operation when it is happening on the local disk -- the vast majority of files will be just moved without copying anything. I suggest keeping the split always local and not mixing it with moving chunks around the cluster. So if we want to split some chunk but there is only a small amount of free space left on the device, we should first move some chunks away from the node and then proceed with splitting.
+
+### Fixed chunks
+
+An alternative strategy is not to split at all and have pageno-fixed chunk boundaries. When a table is created we first materialize this chunk by storing only the first new pages, and the chunk is small. Then the chunk grows while the table is filled, but it can't grow substantially bigger than the allowed pageno range, so at most it would be 1GB or whatever limit we want + some bloat due to snapshots and old page versions.
+
+### **Chunk lsm internals**
+
+So how to implement chunk's lsm?
+
+- Write from scratch and use RocksDB to prototype/benchmark, then switch to own lsm implementation. RocksDB can provide some sanity check for performance of home-brewed implementation and it would be easier to prototype.
+- Use postgres as a lego constructor. We may model the memtable with a postgres B-tree referencing some in-memory log of incoming records. SSTable merging may reuse the postgres external merging algorithm, etc. One thing that would definitely not fit (or I didn't come up with an idea how to fit it) is multi-tenancy. If we are storing pages from different databases we can't use the postgres buffer pool, since there is no db_id in the page header. We can add a new field there but IMO it would be a no-go for committing to vanilla.
+
+Another possibility is not to try to fit several databases in one storage node. But that way it is a no-go for a multi-tenant cloud installation: we would need to run a lot of storage node instances on one physical storage node, each with its own local page cache. So that would be much closer to ordinary managed RDS.
+
+Multi-tenant storage makes sense even on a laptop, when you work with different databases, run tests with a temp database, etc. And when the installation grows bigger it starts to make more and more sense, so it seems important.
+
+# **Storage fleet**
+
+- When databases are smaller than the chunk size we can naturally store them in one chunk (since their page_keys would fit in some chunk's [lo, hi) range).
+
+<img width="937" alt="Screenshot_2021-02-22_at_16 49 17" src="https://user-images.githubusercontent.com/284219/108729836-ffcbd200-753b-11eb-9412-db802ec30021.png">
+
+A few databases are stored in one chunk, replicated three times
+
+- When a database can't fit into one storage node it can occupy lots of chunks that were split while the database was growing. Chunk placement on nodes is controlled by us with some automation, but we may always manually move chunks around the cluster.
+
+<img width="940" alt="Screenshot_2021-02-22_at_16 49 10" src="https://user-images.githubusercontent.com/284219/108729815-fb071e00-753b-11eb-86e0-be6703e47d82.png">
+
+Here one big database occupies two sets of nodes. Also, some chunks were moved around to restore the replication factor after a disk failure. In this case we also have "sharded" storage for a big database and issue wal writes to different chunks in parallel.
+
+## **Chunk placement strategies**
+
+There are a few scenarios where we may want to move chunks around the cluster:
+
+- disk usage on some node is big
+- some disk experienced a failure
+- some node experienced a failure or needs maintenance
+
+## **Chunk replication**
+
+Chunk replication may be done by cloning page ranges with respect to some lsn from peer nodes, updating global metadata, waiting for WAL to come, replaying previous WAL and becoming online -- more or less like during chunk split.
+
--- a/docs/rfcs/003-laptop-cli.md
+++ b/docs/rfcs/003-laptop-cli.md
@@ -0,0 +1,267 @@
+# Command line interface (end-user)
+
+Zenith CLI as it is described here mostly resides on the same conceptual level as pg_ctl/initdb/pg_recvxlog/etc and replaces some of them in an opinionated way. I would also suggest bundling our patched postgres inside zenith distribution at least at the start.
+
+This proposal is focused on managing local installations. For cluster operations, different tooling would be needed. The point of integration between the two is storage URL: no matter how complex cluster setup is it may provide an endpoint where the user may push snapshots.
+
+The most important concept here is a snapshot, which can be created/pushed/pulled/exported. Also, we may start temporary read-only postgres instance over any local snapshot. A more complex scenario would consist of several basic operations over snapshots.
+
+# Possible usage scenarios
+
+## Install zenith, run a postgres
+
+```
+> brew install pg-zenith 
+> zenith pg create # creates pgdata with default pattern pgdata$i
+> zenith pg list
+ID            PGDATA        USED    STORAGE            ENDPOINT
+primary1      pgdata1       0G      zenith-local       localhost:5432
+```
+
+## Import standalone postgres to zenith
+
+```
+> zenith snapshot import --from=basebackup://replication@localhost:5432/ oldpg
+[====================------------] 60% | 20MB/s
+> zenith snapshot list
+ID          SIZE        PARENT
+oldpg       5G          -
+
+> zenith pg create --snapshot oldpg
+Started postgres on localhost:5432
+
+> zenith pg list
+ID            PGDATA        USED    STORAGE            ENDPOINT
+primary1      pgdata1       5G      zenith-local       localhost:5432
+
+> zenith snapshot destroy oldpg
+Ok
+```
+
+Also, we may start snapshot import implicitly by looking at the URL scheme of the snapshot
+
+```
+> zenith pg create --snapshot basebackup://replication@localhost:5432/
+Downloading snapshot... Done.
+Started postgres on localhost:5432
+Destroying snapshot... Done.
+```
+
+## Pull snapshot with some publicly shared database
+
+Since we may export the whole snapshot as one big file (tar of basebackup, maybe with some manifest) it may be shared over conventional means: http, ssh, [git+lfs](https://docs.github.com/en/github/managing-large-files/about-git-large-file-storage).
+
+```
+> zenith pg create --snapshot http://learn-postgres.com/movies_db.zenith movies
+```
+
+## Create snapshot and push it to the cloud
+
+```
+> zenith snapshot create pgdata1@snap1
+> zenith snapshot push --to ssh://stas@zenith.tech pgdata1@snap1
+```
+
+## Rollback database to the snapshot
+
+One way to roll back the database is just to init a new database from the snapshot and destroy the old one. But creating a new database from a snapshot would require a copy of that snapshot, which is a time-consuming operation. Another option that would be cool to support is the ability to create a copy-on-write database from the snapshot without copying data, and store updated pages in a separate location; however, that way would have performance implications. So to properly roll back the database to an older state we have `zenith pg checkout`.
+
+```
+> zenith pg list
+ID            PGDATA        USED    STORAGE            ENDPOINT
+primary1      pgdata1       5G      zenith-local       localhost:5432
+
+> zenith snapshot create pgdata1@snap1
+
+> zenith snapshot list
+ID                    SIZE        PARENT
+oldpg                 5G          -
+pgdata1@snap1         6G          -
+pgdata1@CURRENT       6G          -
+
+> zenith pg checkout pgdata1@snap1
+Stopping postgres on pgdata1.
+Rolling back pgdata1@CURRENT to pgdata1@snap1.
+Starting postgres on pgdata1.
+
+> zenith snapshot list
+ID                    SIZE        PARENT
+oldpg                 5G          -
+pgdata1@snap1         6G          -
+pgdata1@HEAD{0}       6G          -
+pgdata1@CURRENT       6G          -
+```
+
+Some notes: pgdata1@CURRENT -- implicit snapshot representing the current state of the database in the data directory. When we are checking out some snapshot CURRENT will be set to this snapshot and the old CURRENT state will be named HEAD{0} (0 is the number of postgres timeline, it would be incremented after each such checkout).
+
+## Configure PITR area (Point In Time Recovery).
+
+PITR area acts like a continuous snapshot where you can reset the database to any point in time within this area (by area I mean some TTL period or some size limit, both possibly infinite).
+
+```
+> zenith pitr create --storage s3tank --ttl 30d --name pitr_last_month
+```
+
+Resetting the database to some state in the past would require creating a snapshot at some lsn / time in this PITR area.
+
+# Manual
+
+## storage
+
+Storage is either zenith pagestore or s3. Users may create a database in a pagestore and create/move *snapshots* and *pitr regions* in both pagestore and s3. Storage is a concept similar to `git remote`. After installation, I imagine one local storage is available by default.
+
+**zenith storage attach** -t [native|s3] -c key=value -n name
+
+Attaches/initializes storage. For --type=s3, user credentials and a path should be provided. For --type=native we may support --path=/local/path and --url=zenith.tech/stas/mystore. Another possible term for native is 'zstore'.
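
As a rough illustration of how the `-t`, `-c key=value` and `-n` arguments could be parsed, here is a sketch using Python's `argparse`. Only the flags shown in the synopsis are used; the option keys in the example invocation (`path`, `region`) and their values are made up, and nothing here is meant to describe the real CLI.

```python
import argparse


def parse_kv(text: str):
    """Split a 'key=value' argument into a (key, value) pair."""
    key, sep, value = text.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected key=value, got {text!r}")
    return key, value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="zenith storage attach")
    parser.add_argument("-t", "--type", choices=["native", "s3"], required=True)
    # -c may be repeated; each occurrence contributes one (key, value) pair
    parser.add_argument("-c", dest="options", type=parse_kv,
                        action="append", default=[])
    parser.add_argument("-n", dest="name", required=True)
    return parser


# Hypothetical invocation; the keys and values are placeholders.
args = build_parser().parse_args(
    ["-t", "s3", "-c", "path=bucket/prefix", "-c", "region=example", "-n", "s3tank"]
)
options = dict(args.options)
print(args.type, args.name, options)
```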
+
+
+**zenith storage list**
+
+Shows currently attached storages. For example:
+
+```
+> zenith storage list
+NAME            USED    TYPE                OPTIONS           PATH
+local           5.1G    zenith-local                          /opt/zenith/store/local
+local.compr     20.4G   zenith-local        compression=on    /opt/zenith/store/local.compr
+zcloud          60G     zenith-remote                         zenith.tech/stas/mystore
+s3tank          80G     S3
+```
+
+**zenith storage detach**
+
+**zenith storage show**
+
+
+
+## pg
+
+Manages postgres data directories and can start postgres instances with the proper configuration. Experienced users may skip these commands (except `pg create`) and configure/run postgres themselves.
+
+Pg is a term for a single postgres running on some data. I'm trying to avoid separating datadir management from postgres instance management, so both concepts are bundled together here.
+
+**zenith pg create** [--no-start --snapshot --cow] -s storage-name -n pgdata
+
+Creates (initializes) a new data directory in the given storage and starts postgres. I imagine that the storage for this operation may only be local, and that data movement to a remote location happens through snapshots/pitr.
+
+--no-start: just init the datadir without starting postgres
+
+--snapshot snap: init from the snapshot. Snap is a name or URL (zenith.tech/stas/mystore/snap1)
+
+--cow: initialize a Copy-on-Write data directory on top of some snapshot (makes sense if it is a snapshot of a currently running database)
+
+**zenith pg destroy**
+
+**zenith pg start** [--replica] pgdata
+
+Start postgres with proper extensions preloaded/installed.
+
+**zenith pg checkout**
+
+Rolls back the data directory to some previous snapshot.
+
+**zenith pg stop** pg_id
+
+**zenith pg list**
+
+```
+ROLE                 PGDATA        USED    STORAGE            ENDPOINT
+primary              my_pg         5.1G    local              localhost:5432
+replica-1                                                     localhost:5433
+replica-2                                                     localhost:5434
+primary              my_pg2        3.2G    local.compr        localhost:5435
+-                    my_pg3        9.2G    local.compr        -
+```
+
+**zenith pg show**
+
+```
+my_pg:
+    storage: local
+    space used on local: 5.1G
+    space used on all storages: 15.1G
+    snapshots:
+        on local:
+            snap1: 1G
+            snap2: 1G
+        on zcloud:
+            snap2: 1G
+        on s3tank:
+            snap5: 2G
+    pitr:
+        on s3tank:
+            pitr_one_month: 45G
+
+```
+
+**zenith pg start-rest/graphql** pgdata
+
+Starts a REST/GraphQL proxy on top of the postgres master. Not sure we should do that, just an idea.
+
+
+## snapshot
+
+Snapshot creation is cheap: no actual data is copied; we just start retaining old pages. Snapshot size means the amount of retained data, not all data. A snapshot name looks like `pgdata_name@tag_name`, where `tag_name` is set by the user during snapshot creation. There are some reserved tag names: CURRENT represents the current state of the data directory; HEAD{i} represents the data directory state that resided in the database before the i-th checkout.
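
The naming rule above is simple enough to sketch as a parser. `SnapshotName` and `parse_snapshot_name` are hypothetical helpers; rejecting IDs that contain no `@` is an assumption of this sketch, not something the text specifies.

```python
import re
from typing import NamedTuple, Optional


class SnapshotName(NamedTuple):
    pgdata: str
    tag: str
    reserved: bool             # True for CURRENT and HEAD{i}
    head_index: Optional[int]  # i for HEAD{i}, otherwise None


_HEAD_RE = re.compile(r"^HEAD\{(\d+)\}$")


def parse_snapshot_name(name: str) -> SnapshotName:
    """Split 'pgdata_name@tag_name' and classify reserved tags."""
    pgdata, sep, tag = name.partition("@")
    if not sep or not pgdata or not tag:
        raise ValueError(f"expected pgdata_name@tag_name, got {name!r}")
    head = _HEAD_RE.match(tag)
    if head:
        return SnapshotName(pgdata, tag, True, int(head.group(1)))
    return SnapshotName(pgdata, tag, tag == "CURRENT", None)


for example in ("pgdata1@snap1", "pgdata1@CURRENT", "pgdata1@HEAD{0}"):
    print(parse_snapshot_name(example))
```

A real implementation would presumably also refuse user-supplied tags that collide with the reserved names at creation time.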
+
+**zenith snapshot create** pgdata_name@snap_name
+
+Creates a new snapshot in the same storage where pgdata_name exists.
+
+**zenith snapshot push** --to url pgdata_name@snap_name
+
+Produces a binary stream of a given snapshot. Under the hood, it starts a temporary read-only postgres over this snapshot and sends a basebackup stream. The receiving side should start `zenith snapshot recv` before the push happens. If the url has some special scheme like zenith://, the receiving side may require auth and start `zenith snapshot recv` on the fly.
+
+**zenith snapshot recv**
+
+Starts listening on a port for a basebackup stream, prints connection info to stdout (so that the user may use it in the push command), and expects data on that socket.
+
+**zenith snapshot pull** --from url or path
+
+Connects to a remote zenith/s3/file and pulls a snapshot. The remote site should be a zenith service or files in our format.
+
+**zenith snapshot import** --from basebackup://<...>  or path
+
+Creates a new snapshot from a running postgres via the basebackup protocol, or from basebackup files.
+
+**zenith snapshot export**
+
+Starts a read-only postgres over this snapshot and exports data in some format (pg_dump, or COPY TO on some/all tables). One of the options may be zenith's own format, which is handy for us (but I think just a tar of the basebackup would be okay).
+
+**zenith snapshot diff** snap1 snap2
+
+Shows the size of data changed between two snapshots. We may also provide options to diff schema/data in tables. To do that, we start temporary read-only postgres instances.
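
One way to think about "size of data changed" is to count the pages whose retained version differs between the two snapshots. The sketch below models a snapshot as a mapping from page number to a version id, which is purely an illustrative assumption; `changed_bytes` is a hypothetical helper. The 8192-byte page is PostgreSQL's default block size.

```python
PAGE_SIZE = 8192  # PostgreSQL's default block size in bytes


def changed_bytes(snap_a: dict, snap_b: dict) -> int:
    """Bytes in pages that were added, removed, or modified between snapshots."""
    pages = snap_a.keys() | snap_b.keys()
    changed = sum(1 for page in pages if snap_a.get(page) != snap_b.get(page))
    return changed * PAGE_SIZE


# Page 2 was modified, page 3 was dropped, page 4 was added.
snap1 = {1: "v1", 2: "v1", 3: "v1"}
snap2 = {1: "v1", 2: "v2", 4: "v1"}
diff_size = changed_bytes(snap1, snap2)
print(diff_size)
```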
+
+**zenith snapshot destroy**
+
+## pitr
+
+Pitr represents a WAL stream and a TTL policy for that stream.
+
+XXX: any suggestions on a better name?
+
+**zenith pitr create** name
+
+--ttl = inf | period
+
+--size-limit = inf | limit
+
+--storage = storage_name
+
+**zenith pitr extract-snapshot** pitr_name --lsn xxx
+
+Creates a snapshot out of some LSN in a PITR area. The obtained snapshot may be managed with the snapshot routines (move/send/export).
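
Before extracting a snapshot, the requested LSN has to be compared against the range the PITR area still retains. PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash (the high and low 32 bits of a 64-bit position), so a comparison can be sketched as below. `parse_lsn` and `lsn_in_area` are hypothetical helpers, and the idea that an area exposes an oldest and newest LSN is an assumption.

```python
import re

_LSN_RE = re.compile(r"([0-9A-Fa-f]{1,8})/([0-9A-Fa-f]{1,8})")


def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's textual LSN form 'hi/lo' (hex) to a 64-bit integer."""
    match = _LSN_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"malformed LSN: {text!r}")
    hi, lo = match.groups()
    return (int(hi, 16) << 32) | int(lo, 16)


def lsn_in_area(lsn: str, oldest: str, newest: str) -> bool:
    """True if `lsn` lies within the retained range [oldest, newest]."""
    return parse_lsn(oldest) <= parse_lsn(lsn) <= parse_lsn(newest)


print(parse_lsn("0/16B3748"), lsn_in_area("0/2000000", "0/1000000", "1/0"))
```

Comparing the parsed integers rather than the strings matters because the textual form is not zero-padded, so lexical order does not match position order.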
+
+**zenith pitr gc** pitr_name
+
+Force garbage collection on some PITR area.
+
+**zenith pitr list**
+
+**zenith pitr destroy**
+
+
+## console
+
+**zenith console**
+
+Opens a browser pointed at the web console, which offers more or less the same functionality as described here.