Compare commits

...

35 Commits

Author SHA1 Message Date
Konstantin Knizhnik
cca517fb94 Add test for prewarm under workload 2024-12-23 11:36:47 +02:00
Konstantin Knizhnik
5f74ff1307 Make ruffhappy 2024-12-15 08:02:35 +02:00
Konstantin Knizhnik
1d77fb0dea Eliminate stale reads from LFC in case of prewarm conflict 2024-12-14 21:58:26 +02:00
Konstantin Knizhnik
07027bde7d Do not run test_lfc_prewarm test without LFC 2024-12-14 21:11:20 +02:00
Konstantin Knizhnik
1f2b47c70f Set LFC path in test-lfc_prewarm test 2024-12-14 21:09:58 +02:00
Konstantin Knizhnik
7b80ad4950 Fix format warning 2024-12-14 21:09:58 +02:00
Konstantin Knizhnik
e07eedca5d correctly handle PS disconect duriug prewarm 2024-12-14 21:09:58 +02:00
Konstantin Knizhnik
7e2fb10cca Fix handling zero neon.file_cache_prewarm_limit 2024-12-14 21:09:57 +02:00
Konstantin Knizhnik
dc1684efcc Add delay between upgrade of extension version 2024-12-14 21:09:57 +02:00
Konstantin Knizhnik
ec8b8b941d Add functions to get LFC state, prewarm LFC and monitor prewarm process 2024-12-14 21:09:57 +02:00
Mikhail Kot
cf161e1556 fix(adapter): password not set in role drop (#10130)
## Problem

When entry was dropped and password wasn't set, new entry
had uninitialized memory in controlplane adapter

Resolves: https://github.com/neondatabase/cloud/issues/14914

## Summary of changes

Initialize password in all cases, add tests.
Minor formatting for less indentation
2024-12-14 17:37:13 +00:00
Konstantin Knizhnik
2521eba674 Check for invalid down link while prefetching B-Tree leave pages for index-only scan (#9867)
## Problem

See #9866

Index-only scan prefetch implementation doesn't take in account that
down link may be invalid

## Summary of changes

Check that downlink is valid block number


Correspondent Postgres PRs:
https://github.com/neondatabase/postgres/pull/534
https://github.com/neondatabase/postgres/pull/535
https://github.com/neondatabase/postgres/pull/536
https://github.com/neondatabase/postgres/pull/537

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-13 20:46:41 +00:00
Alexander Bayandin
d56fea680e CI: always require aws-oicd-role-arn input to be set (#10145)
## Problem
`benchmarking` job fails because `aws-oicd-role-arn` input is not set

## Summary of changes:
- Set `aws-oicd-role-arn` for `benchmarking job
- Always require `aws-oicd-role-arn` to be set
- Rename `aws_oicd_role_arn` to `aws-oicd-role-arn` for consistency
2024-12-13 19:56:32 +00:00
Alex Chi Z.
7ee5dca752 fix(pageserver): race between gc-compaction and repartition (#10127)
## Problem

close https://github.com/neondatabase/neon/issues/10124

gc-compaction split_gc_jobs is holding the repartition lock for too long
time.

## Summary of changes

* Ensure split_gc_compaction_jobs drops the repartition lock once it
finishes cloning the structures.
* Update comments.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-13 18:22:25 +00:00
Tristan Partin
07d1db54b3 Improve comments and log messages in the logical replication monitor (#9974)
Improved comments will help others when they read the code, and the log
messages will help others understand why the logical replication monitor
works the way it does.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-13 18:10:42 +00:00
Konstantin Knizhnik
eeabecd89f Correctly update LFC used_pages in case of LFC resize (#10128)
## Problem

LFC used_pages statistic is not updated in case of LFC resize (shrinking
`neon.file_cache_size_limit`)

## Summary of changes

Update `lfc_ctl->used_pages` in `lfc_change_limit_hook`

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-13 17:40:26 +00:00
Christian Schwarz
fcff752851 fix(test_timeline_archival_chaos): flakiness caused by orphan layers (#10083)
The test was failing with the scary but generic message `Remote storage
metadata corrupted`.

The underlying scrubber error is `Orphan layer detected: ...`.

The test kills pageserver at random points, hence it's expected that we
leak layers if we're killed in the window after layer upload but before
it's referenced from index part.

Refer to generation numbers RFC for details.

Refs:
- fixes https://github.com/neondatabase/neon/issues/9988
- root-cause analysis
https://github.com/neondatabase/neon/issues/9988#issuecomment-2520673167
2024-12-13 16:28:21 +00:00
Alexander Bayandin
2c91062828 test_prefetch: reduce timeout to default 5m from 10m (#10105)
## Problem

`test_prefetch` is flaky
(https://github.com/neondatabase/neon/issues/9961), but if it passes,
the run time is less than 30 seconds — we don't need an extended timeout
for it.

## Summary of changes
- Remove extended test timeout for `test_prefetch`
2024-12-13 14:52:54 +00:00
Arseny Sher
ce8eb089f3 Extract public sk types to safekeeper_api (#10137)
## Problem

We want to extract safekeeper http client to separate crate for use in
storage controller and neon_local. However, many types used in the API
are internal to safekeeper.

## Summary of changes

Move them to safekeeper_api crate. No functional changes.

ref https://github.com/neondatabase/neon/issues/9011
2024-12-13 14:06:27 +00:00
a-masterov
7dc382601c Fix pg_regress tests on a cloud staging instance (#10134)
## Problem
pg_regress tests start failing due to unique ids added to Neon error
messages
## Summary of changes
Patches updated
2024-12-13 13:59:04 +00:00
Rahul Patil
2451969d5c fix(ci): Allow github-action-script to post reports (#10136)
Allow github-action-script to post reports.

Failed CI:
https://github.com/neondatabase/neon/actions/runs/12304655364/job/34342554049#step:13:514
2024-12-13 12:22:15 +00:00
JC Grünhage
59ef701925 CI(deploy): fix git tag/release creation (#10119)
## Problem

When moving the comment on proxy-releases from the yaml doc into a
javascript code block, I missed converting the comment marker from `#`
to `//`.

## Summary of changes

Correctly convert comment marker.
2024-12-12 23:38:20 +00:00
Alexander Bayandin
ac04bad457 CI: don't run debug builds with LFC (#10123)
## Problem

I've noticed that debug builds with LFC fail more frequently and for
some reason ,their failure do block merging (but it should not)

## Summary of changes
- Do not run Debug builds with LFC
2024-12-12 22:55:38 +00:00
Peter Bendel
2f3f98a319 use OIDC role instead of AWS access keys for managing test runner (#10117)
in periodic pagebench workflow

## Problem

for background see https://github.com/neondatabase/cloud/issues/21545

## Summary of changes

use OIDC role to manage runners instead of AWS access key which needs to
be periodically rotated

## logs

seems to work in
https://github.com/neondatabase/neon/actions/runs/12298575888/job/34322306127#step:6:1
2024-12-12 20:25:39 +00:00
Alex Chi Z.
5ff4b991c7 feat(pageserver): gc-compaction split over LSN (#9900)
## Problem

part of https://github.com/neondatabase/neon/issues/9114, stacked PR
over https://github.com/neondatabase/neon/pull/9897, partially
refactored to help with
https://github.com/neondatabase/neon/issues/10031

## Summary of changes

* gc-compaction takes `above_lsn` parameter. We only compact the layers
above this LSN, and all data below the LSN are treated as if they are on
the ancestor branch.
* refactored gc-compaction to take `GcCompactJob` that describes the
rectangular range to be compacted.
* Added unit test for this case.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-12-12 20:23:24 +00:00
John Spray
a93e3d31cc storcon: refine logic for choosing AZ on tenant creation (#10054)
## Problem

When we update our scheduler/optimization code to respect AZs properly
(https://github.com/neondatabase/neon/pull/9916), the choice of AZ
becomes a much higher-stakes decision. We will pretty much always run a
tenant in its preferred AZ, and that AZ is fixed for the lifetime of the
tenant (unless a human intervenes)

Eventually, when we do auto-balancing based on utilization, I anticipate
that part of that will be to automatically change the AZ of tenants if
our original scheduling decisions have caused imbalance, but as an
interim measure, we can at least avoid making this scheduling decision
based purely on which AZ contains the emptiest node.

This is a precursor to https://github.com/neondatabase/neon/pull/9947

## Summary of changes

- When creating a tenant, instead of scheduling a shard and then reading
its preferred AZ back, make the AZ decision first.
- Instead of choosing AZ based on which node is emptiest, use the median
utilization of nodes in each AZ to pick the AZ to use. This avoids bad
AZ decisions during periods when some node has very low utilization
(such as after replacing a dead node)

I considered also making the selection a weighted pseudo-random choice
based on utilization, but wanted to avoid destabilising tests with that
for now.
2024-12-12 19:35:38 +00:00
Rahul Patil
6d5687521b fix(ci): Allow github-script to post test reports (#10120)
Allow github-script to post test reports
2024-12-12 18:53:35 +00:00
Heikki Linnakangas
53721266f1 Disable connection logging in pgbouncer by default (#10118)
It can produce a lot of logs, making pgbouncer itself consume all CPU in
extreme cases. We saw that happen in stress testing.
2024-12-12 17:05:58 +00:00
a-masterov
2f3433876f Change the channel for notification. (#10112)
## Problem
Now notifications about failures in `pg_regress` tests run on the
staging cloud instance, reach the channel `on-call-staging-stream`,
while they should reach `on-call-qa-staging-stream`
## Summary of changes
The channel changed.
2024-12-12 16:34:07 +00:00
Rahul Patil
58d45c6e86 ci(fix): Use OIDC auth to login on ECR (#10055)
## Problem

CI currently uses static credentials in some places. These are less
secure and hard to maintain, so we are going to deprecate them and use
OIDC auth.

## Summary of changes
- ci(fix): Use OIDC auth to upload artifact on s3
- ci(fix): Use OIDC auth to login on ECR
2024-12-12 15:13:08 +00:00
Conrad Ludgate
e502e880b5 chore(proxy): remove code for old API (#10109)
## Problem

Now that https://github.com/neondatabase/cloud/issues/15245 is done, we
can remove the old code.

## Summary of changes

Removes support for the ManagementV2 API, in favour of the ProxyV1 API.
2024-12-12 13:42:50 +00:00
Arseny Sher
c9a773af37 Fix test_subscriber_synchronous_commit flakiness. (#10057)
6f7aeaa configured LFC for USE_LFC case, but omitted setting
shared_buffers for non USE_LFC, causing flakiness.

ref https://github.com/neondatabase/neon/issues/9989
2024-12-12 11:57:00 +00:00
Vlad Lazar
ec0ce06c16 tests: default interpreted proto in tests (#10079)
## Problem

We aren't using the sharded interpreted wal receiver protocol in all
tests.

## Summary of changes

Default to the interpreted protocol.
2024-12-12 10:53:10 +00:00
Alexander Bayandin
0bd8eca9ca Storage: create release PRs On Fridays (#10017)
## Problem

To give Storage more time on preprod — create a release branch on Friday

## Summary of changes
- Automatically create Storage release PR on Friday instead of Monday
2024-12-12 09:18:50 +00:00
Misha Sakhnov
739f627b96 Bump vm-builder v0.35.0 -> v0.37.1 (#10015)
Bump version to pick up changes introduced in the neonvm-daemon to
support sys fs based CPU scaling
(https://github.com/neondatabase/autoscaling/issues/1082).

Previous update: https://github.com/neondatabase/neon/pull/9208
2024-12-12 08:45:52 +00:00
79 changed files with 2428 additions and 1413 deletions

View File

@@ -23,3 +23,5 @@ config-variables:
- BENCHMARK_INGEST_TARGET_PROJECTID
- PGREGRESS_PG16_PROJECT_ID
- PGREGRESS_PG17_PROJECT_ID
- SLACK_ON_CALL_QA_STAGING_STREAM
- DEV_AWS_OIDC_ROLE_MANAGE_BENCHMARK_EC2_VMS_ARN

View File

@@ -7,10 +7,9 @@ inputs:
type: boolean
required: false
default: false
aws_oicd_role_arn:
description: 'the OIDC role arn to (re-)acquire for allure report upload - if not set call must acquire OIDC role'
required: false
default: ''
aws-oicd-role-arn:
description: 'OIDC role arn to interract with S3'
required: true
outputs:
base-url:
@@ -84,12 +83,11 @@ runs:
ALLURE_VERSION: 2.27.0
ALLURE_ZIP_SHA256: b071858fb2fa542c65d8f152c5c40d26267b2dfb74df1f1608a589ecca38e777
- name: (Re-)configure AWS credentials # necessary to upload reports to S3 after a long-running test
if: ${{ !cancelled() && (inputs.aws_oicd_role_arn != '') }}
uses: aws-actions/configure-aws-credentials@v4
- uses: aws-actions/configure-aws-credentials@v4
if: ${{ !cancelled() }}
with:
aws-region: eu-central-1
role-to-assume: ${{ inputs.aws_oicd_role_arn }}
role-to-assume: ${{ inputs.aws-oicd-role-arn }}
role-duration-seconds: 3600 # 1 hour should be more than enough to upload report
# Potentially we could have several running build for the same key (for example, for the main branch), so we use improvised lock for this

View File

@@ -8,10 +8,9 @@ inputs:
unique-key:
description: 'string to distinguish different results in the same run'
required: true
aws_oicd_role_arn:
description: 'the OIDC role arn to (re-)acquire for allure report upload - if not set call must acquire OIDC role'
required: false
default: ''
aws-oicd-role-arn:
description: 'OIDC role arn to interract with S3'
required: true
runs:
using: "composite"
@@ -36,12 +35,11 @@ runs:
env:
REPORT_DIR: ${{ inputs.report-dir }}
- name: (Re-)configure AWS credentials # necessary to upload reports to S3 after a long-running test
if: ${{ !cancelled() && (inputs.aws_oicd_role_arn != '') }}
uses: aws-actions/configure-aws-credentials@v4
- uses: aws-actions/configure-aws-credentials@v4
if: ${{ !cancelled() }}
with:
aws-region: eu-central-1
role-to-assume: ${{ inputs.aws_oicd_role_arn }}
role-to-assume: ${{ inputs.aws-oicd-role-arn }}
role-duration-seconds: 3600 # 1 hour should be more than enough to upload report
- name: Upload test results

View File

@@ -15,10 +15,19 @@ inputs:
prefix:
description: "S3 prefix. Default is '${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}'"
required: false
aws-oicd-role-arn:
description: 'OIDC role arn to interract with S3'
required: true
runs:
using: "composite"
steps:
- uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ inputs.aws-oicd-role-arn }}
role-duration-seconds: 3600
- name: Download artifact
id: download-artifact
shell: bash -euxo pipefail {0}

View File

@@ -48,10 +48,9 @@ inputs:
description: 'benchmark durations JSON'
required: false
default: '{}'
aws_oicd_role_arn:
description: 'the OIDC role arn to (re-)acquire for allure report upload - if not set call must acquire OIDC role'
required: false
default: ''
aws-oicd-role-arn:
description: 'OIDC role arn to interract with S3'
required: true
runs:
using: "composite"
@@ -62,6 +61,7 @@ runs:
with:
name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build_type }}-artifact
path: /tmp/neon
aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
- name: Download Neon binaries for the previous release
if: inputs.build_type != 'remote'
@@ -70,6 +70,7 @@ runs:
name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build_type }}-artifact
path: /tmp/neon-previous
prefix: latest
aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
- name: Download compatibility snapshot
if: inputs.build_type != 'remote'
@@ -81,6 +82,7 @@ runs:
# The lack of compatibility snapshot (for example, for the new Postgres version)
# shouldn't fail the whole job. Only relevant test should fail.
skip-if-does-not-exist: true
aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
- name: Checkout
if: inputs.needs_postgres_source == 'true'
@@ -218,17 +220,19 @@ runs:
# The lack of compatibility snapshot shouldn't fail the job
# (for example if we didn't run the test for non build-and-test workflow)
skip-if-does-not-exist: true
aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
- name: (Re-)configure AWS credentials # necessary to upload reports to S3 after a long-running test
if: ${{ !cancelled() && (inputs.aws_oicd_role_arn != '') }}
uses: aws-actions/configure-aws-credentials@v4
- uses: aws-actions/configure-aws-credentials@v4
if: ${{ !cancelled() }}
with:
aws-region: eu-central-1
role-to-assume: ${{ inputs.aws_oicd_role_arn }}
role-to-assume: ${{ inputs.aws-oicd-role-arn }}
role-duration-seconds: 3600 # 1 hour should be more than enough to upload report
- name: Upload test results
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-store
with:
report-dir: /tmp/test_output/allure/results
unique-key: ${{ inputs.build_type }}-${{ inputs.pg_version }}
aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}

View File

@@ -14,9 +14,11 @@ runs:
name: coverage-data-artifact
path: /tmp/coverage
skip-if-does-not-exist: true # skip if there's no previous coverage to download
aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
- name: Upload coverage data
uses: ./.github/actions/upload
with:
name: coverage-data-artifact
path: /tmp/coverage
aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}

View File

@@ -14,6 +14,10 @@ inputs:
prefix:
description: "S3 prefix. Default is '${GITHUB_SHA}/${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}'"
required: false
aws-oicd-role-arn:
description: "the OIDC role arn for aws auth"
required: false
default: ""
runs:
using: "composite"
@@ -53,6 +57,13 @@ runs:
echo 'SKIPPED=false' >> $GITHUB_OUTPUT
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ inputs.aws-oicd-role-arn }}
role-duration-seconds: 3600
- name: Upload artifact
if: ${{ steps.prepare-artifact.outputs.SKIPPED == 'false' }}
shell: bash -euxo pipefail {0}

View File

@@ -70,6 +70,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
# we create a table that has one row for each database that we want to restore with the status whether the restore is done
- name: Create benchmark_restore_status table if it does not exist

View File

@@ -31,12 +31,13 @@ defaults:
env:
RUST_BACKTRACE: 1
COPT: '-Werror'
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
jobs:
build-neon:
runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', inputs.arch == 'arm64' && 'large-arm64' || 'large')) }}
permissions:
id-token: write # aws-actions/configure-aws-credentials
contents: read
container:
image: ${{ inputs.build-tools-image }}
credentials:
@@ -205,6 +206,13 @@ jobs:
done
fi
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 18000 # 5 hours
- name: Run rust tests
env:
NEXTEST_RETRIES: 3
@@ -256,6 +264,7 @@ jobs:
with:
name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-artifact
path: /tmp/neon
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
# XXX: keep this after the binaries.list is formed, so the coverage can properly work later
- name: Merge and upload coverage data
@@ -265,6 +274,10 @@ jobs:
regress-tests:
# Don't run regression tests on debug arm64 builds
if: inputs.build-type != 'debug' || inputs.arch != 'arm64'
permissions:
id-token: write # aws-actions/configure-aws-credentials
contents: read
statuses: write
needs: [ build-neon ]
runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', inputs.arch == 'arm64' && 'large-arm64' || 'large')) }}
container:
@@ -295,6 +308,7 @@ jobs:
real_s3_region: eu-central-1
rerun_failed: true
pg_version: ${{ matrix.pg_version }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
TEST_RESULT_CONNSTR: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}
CHECK_ONDISK_DATA_COMPATIBILITY: nonempty

View File

@@ -105,6 +105,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create Neon Project
id: create-neon-project
@@ -122,7 +123,7 @@ jobs:
run_in_parallel: false
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
# Set --sparse-ordering option of pytest-order plugin
# to ensure tests are running in order of appears in the file.
# It's important for test_perf_pgbench.py::test_pgbench_remote_* tests
@@ -152,7 +153,7 @@ jobs:
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -204,6 +205,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Run Logical Replication benchmarks
uses: ./.github/actions/run-python-test-set
@@ -214,7 +216,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 5400
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -231,7 +233,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 5400
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -243,7 +245,7 @@ jobs:
uses: ./.github/actions/allure-report-generate
with:
store-test-results-into-db: true
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}
@@ -405,6 +407,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create Neon Project
if: contains(fromJson('["neonvm-captest-new", "neonvm-captest-freetier", "neonvm-azure-captest-freetier", "neonvm-azure-captest-new"]'), matrix.platform)
@@ -452,7 +455,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_pgbench_remote_init
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
@@ -467,7 +470,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_pgbench_remote_simple_update
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
@@ -482,7 +485,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_pgbench_remote_select_only
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
@@ -500,7 +503,7 @@ jobs:
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -611,7 +614,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_pgvector_indexing
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -626,7 +629,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
@@ -637,7 +640,7 @@ jobs:
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -708,6 +711,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Set up Connection String
id: set-up-connstr
@@ -739,7 +743,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 43200 -k test_clickbench
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -753,7 +757,7 @@ jobs:
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -818,6 +822,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Get Connstring Secret Name
run: |
@@ -856,7 +861,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_tpch
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -868,7 +873,7 @@ jobs:
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -926,6 +931,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Set up Connection String
id: set-up-connstr
@@ -957,7 +963,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_user_examples
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -968,7 +974,7 @@ jobs:
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}

View File

@@ -21,8 +21,6 @@ concurrency:
env:
RUST_BACKTRACE: 1
COPT: '-Werror'
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
# A concurrency group that we use for e2e-tests runs, matches `concurrency.group` above with `github.repository` as a prefix
E2E_CONCURRENCY_GROUP: ${{ github.repository }}-e2e-tests-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}
@@ -256,16 +254,14 @@ jobs:
build-tag: ${{ needs.tag.outputs.build-tag }}
build-type: ${{ matrix.build-type }}
# Run tests on all Postgres versions in release builds and only on the latest version in debug builds.
# Run without LFC on v17 release and debug builds only. For all the other cases LFC is enabled. Failure on the
# debug build with LFC enabled doesn't block merging.
# Run without LFC on v17 release and debug builds only. For all the other cases LFC is enabled.
test-cfg: |
${{ matrix.build-type == 'release' && '[{"pg_version":"v14", "lfc_state": "with-lfc"},
{"pg_version":"v15", "lfc_state": "with-lfc"},
{"pg_version":"v16", "lfc_state": "with-lfc"},
{"pg_version":"v17", "lfc_state": "with-lfc"},
{"pg_version":"v17", "lfc_state": "without-lfc"}]'
|| '[{"pg_version":"v17", "lfc_state": "without-lfc"},
{"pg_version":"v17", "lfc_state": "with-lfc" }]' }}
|| '[{"pg_version":"v17", "lfc_state": "without-lfc" }]' }}
secrets: inherit
# Keep `benchmarks` job outside of `build-and-test-locally` workflow to make job failures non-blocking
@@ -307,6 +303,11 @@ jobs:
benchmarks:
if: github.ref_name == 'main' || contains(github.event.pull_request.labels.*.name, 'run-benchmarks')
needs: [ check-permissions, build-and-test-locally, build-build-tools-image, get-benchmarks-durations ]
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
pull-requests: write
runs-on: [ self-hosted, small ]
container:
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
@@ -335,6 +336,7 @@ jobs:
extra_params: --splits 5 --group ${{ matrix.pytest_split_group }}
benchmark_durations: ${{ needs.get-benchmarks-durations.outputs.json }}
pg_version: v16
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -347,6 +349,11 @@ jobs:
report-benchmarks-failures:
needs: [ benchmarks, create-test-report ]
if: github.ref_name == 'main' && failure() && needs.benchmarks.result == 'failure'
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
pull-requests: write
runs-on: ubuntu-22.04
steps:
@@ -362,6 +369,11 @@ jobs:
create-test-report:
needs: [ check-permissions, build-and-test-locally, coverage-report, build-build-tools-image, benchmarks ]
if: ${{ !cancelled() && contains(fromJSON('["skipped", "success"]'), needs.check-permissions.result) }}
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
pull-requests: write
outputs:
report-url: ${{ steps.create-allure-report.outputs.report-url }}
@@ -382,6 +394,7 @@ jobs:
uses: ./.github/actions/allure-report-generate
with:
store-test-results-into-db: true
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}
@@ -413,6 +426,10 @@ jobs:
coverage-report:
if: ${{ !startsWith(github.ref_name, 'release') }}
needs: [ check-permissions, build-build-tools-image, build-and-test-locally ]
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
runs-on: [ self-hosted, small ]
container:
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
@@ -439,12 +456,14 @@ jobs:
with:
name: neon-${{ runner.os }}-${{ runner.arch }}-${{ matrix.build_type }}-artifact
path: /tmp/neon
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Get coverage artifact
uses: ./.github/actions/download
with:
name: coverage-data-artifact
path: /tmp/coverage
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Merge coverage data
run: scripts/coverage "--profraw-prefix=$GITHUB_JOB" --dir=/tmp/coverage merge
@@ -575,6 +594,10 @@ jobs:
neon-image:
needs: [ neon-image-arch, tag ]
runs-on: ubuntu-22.04
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: read
steps:
- uses: docker/login-action@v3
@@ -589,11 +612,15 @@ jobs:
neondatabase/neon:${{ needs.tag.outputs.build-tag }}-bookworm-x64 \
neondatabase/neon:${{ needs.tag.outputs.build-tag }}-bookworm-arm64
- uses: docker/login-action@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
registry: 369495373322.dkr.ecr.eu-central-1.amazonaws.com
username: ${{ secrets.AWS_ACCESS_KEY_DEV }}
password: ${{ secrets.AWS_SECRET_KEY_DEV }}
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Login to Amazon Dev ECR
uses: aws-actions/amazon-ecr-login@v2
- name: Push multi-arch image to ECR
run: |
@@ -602,6 +629,10 @@ jobs:
compute-node-image-arch:
needs: [ check-permissions, build-build-tools-image, tag ]
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: read
strategy:
fail-fast: false
matrix:
@@ -642,11 +673,15 @@ jobs:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
- uses: docker/login-action@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
registry: 369495373322.dkr.ecr.eu-central-1.amazonaws.com
username: ${{ secrets.AWS_ACCESS_KEY_DEV }}
password: ${{ secrets.AWS_SECRET_KEY_DEV }}
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Login to Amazon Dev ECR
uses: aws-actions/amazon-ecr-login@v2
- uses: docker/login-action@v3
with:
@@ -719,6 +754,10 @@ jobs:
compute-node-image:
needs: [ compute-node-image-arch, tag ]
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: read
runs-on: ubuntu-22.04
strategy:
@@ -763,11 +802,15 @@ jobs:
neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-x64 \
neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-arm64
- uses: docker/login-action@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
registry: 369495373322.dkr.ecr.eu-central-1.amazonaws.com
username: ${{ secrets.AWS_ACCESS_KEY_DEV }}
password: ${{ secrets.AWS_SECRET_KEY_DEV }}
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Login to Amazon Dev ECR
uses: aws-actions/amazon-ecr-login@v2
- name: Push multi-arch compute-node-${{ matrix.version.pg }} image to ECR
run: |
@@ -797,7 +840,7 @@ jobs:
- pg: v17
debian: bookworm
env:
VM_BUILDER_VERSION: v0.35.0
VM_BUILDER_VERSION: v0.37.1
steps:
- uses: actions/checkout@v4
@@ -892,7 +935,9 @@ jobs:
runs-on: ubuntu-22.04
permissions:
id-token: write # for `aws-actions/configure-aws-credentials`
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: read
env:
VERSIONS: v14 v15 v16 v17
@@ -903,12 +948,15 @@ jobs:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
- name: Login to dev ECR
uses: docker/login-action@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
registry: 369495373322.dkr.ecr.eu-central-1.amazonaws.com
username: ${{ secrets.AWS_ACCESS_KEY_DEV }}
password: ${{ secrets.AWS_SECRET_KEY_DEV }}
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Login to Amazon Dev ECR
uses: aws-actions/amazon-ecr-login@v2
- name: Copy vm-compute-node images to ECR
run: |
@@ -987,6 +1035,11 @@ jobs:
trigger-custom-extensions-build-and-wait:
needs: [ check-permissions, tag ]
runs-on: ubuntu-22.04
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
pull-requests: write
steps:
- name: Set PR's status to pending and request a remote CI test
run: |
@@ -1062,7 +1115,10 @@ jobs:
needs: [ check-permissions, promote-images, tag, build-and-test-locally, trigger-custom-extensions-build-and-wait, push-to-acr-dev, push-to-acr-prod ]
# `!failure() && !cancelled()` is required because the workflow depends on the job that can be skipped: `push-to-acr-dev` and `push-to-acr-prod`
if: (github.ref_name == 'main' || github.ref_name == 'release' || github.ref_name == 'release-proxy' || github.ref_name == 'release-compute') && !failure() && !cancelled()
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
runs-on: [ self-hosted, small ]
container: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/ansible:latest
steps:
@@ -1103,7 +1159,7 @@ jobs:
console.log(`Tag ${tag} created successfully.`);
}
# TODO: check how GitHub releases looks for proxy/compute releases and enable them if they're ok
// TODO: check how GitHub releases looks for proxy/compute releases and enable them if they're ok
if (context.ref !== 'refs/heads/release') {
console.log(`GitHub release skipped for ${context.ref}.`);
return;
@@ -1184,6 +1240,10 @@ jobs:
# The job runs on `release` branch and copies compatibility data and Neon artifact from the last *release PR* to the latest directory
promote-compatibility-data:
needs: [ deploy ]
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: read
# `!failure() && !cancelled()` is required because the workflow transitively depends on the job that can be skipped: `push-to-acr-dev` and `push-to-acr-prod`
if: github.ref_name == 'release' && !failure() && !cancelled()
@@ -1220,6 +1280,12 @@ jobs:
echo "run-id=${run_id}" | tee -a ${GITHUB_OUTPUT}
echo "commit-sha=${last_commit_sha}" | tee -a ${GITHUB_OUTPUT}
- uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Promote compatibility snapshot and Neon artifact
env:
BUCKET: neon-github-public-dev

View File

@@ -19,14 +19,15 @@ concurrency:
group: ${{ github.workflow }}
cancel-in-progress: true
permissions:
id-token: write # aws-actions/configure-aws-credentials
jobs:
regress:
env:
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
strategy:
fail-fast: false
matrix:
@@ -78,6 +79,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create a new branch
id: create-branch
@@ -93,10 +95,12 @@ jobs:
test_selection: cloud_regress
pg_version: ${{matrix.pg-version}}
extra_params: -m remote_cluster
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{steps.create-branch.outputs.dsn}}
- name: Delete branch
if: always()
uses: ./.github/actions/neon-branch-delete
with:
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
@@ -107,12 +111,14 @@ jobs:
id: create-allure-report
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
uses: slackapi/slack-github-action@v1
with:
channel-id: "C033QLM5P7D" # on-call-staging-stream
channel-id: ${{ vars.SLACK_ON_CALL_QA_STAGING_STREAM }}
slack-message: |
Periodic pg_regress on staging: ${{ job.status }}
<${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>

View File

@@ -13,7 +13,7 @@ on:
# │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
- cron: '0 9 * * *' # run once a day, timezone is utc
workflow_dispatch: # adds ability to run this manually
defaults:
run:
shell: bash -euxo pipefail {0}
@@ -28,7 +28,7 @@ jobs:
strategy:
fail-fast: false # allow other variants to continue even if one fails
matrix:
target_project: [new_empty_project, large_existing_project]
target_project: [new_empty_project, large_existing_project]
permissions:
contents: write
statuses: write
@@ -56,7 +56,7 @@ jobs:
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 18000 # 5 hours is currently max associated with IAM role
role-duration-seconds: 18000 # 5 hours is currently max associated with IAM role
- name: Download Neon artifact
uses: ./.github/actions/download
@@ -64,6 +64,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create Neon Project
if: ${{ matrix.target_project == 'new_empty_project' }}
@@ -94,7 +95,7 @@ jobs:
project_id: ${{ vars.BENCHMARK_INGEST_TARGET_PROJECTID }}
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
- name: Initialize Neon project
- name: Initialize Neon project
if: ${{ matrix.target_project == 'large_existing_project' }}
env:
BENCHMARK_INGEST_TARGET_CONNSTR: ${{ steps.create-neon-branch-ingest-target.outputs.dsn }}
@@ -122,7 +123,7 @@ jobs:
${PSQL} "${BENCHMARK_INGEST_TARGET_CONNSTR}" -c "CREATE EXTENSION IF NOT EXISTS neon; CREATE EXTENSION IF NOT EXISTS neon_utils;"
echo "BENCHMARK_INGEST_TARGET_CONNSTR=${BENCHMARK_INGEST_TARGET_CONNSTR}" >> $GITHUB_ENV
- name: Invoke pgcopydb
- name: Invoke pgcopydb
uses: ./.github/actions/run-python-test-set
with:
build_type: remote
@@ -131,7 +132,7 @@ jobs:
extra_params: -s -m remote_cluster --timeout 86400 -k test_ingest_performance_using_pgcopydb
pg_version: v16
save_perf_report: true
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_INGEST_SOURCE_CONNSTR: ${{ secrets.BENCHMARK_INGEST_SOURCE_CONNSTR }}
TARGET_PROJECT_TYPE: ${{ matrix.target_project }}
@@ -143,7 +144,7 @@ jobs:
run: |
export LD_LIBRARY_PATH=${PG_16_LIB_PATH}
${PSQL} "${BENCHMARK_INGEST_TARGET_CONNSTR}" -c "\dt+"
- name: Delete Neon Project
if: ${{ always() && matrix.target_project == 'new_empty_project' }}
uses: ./.github/actions/neon-project-delete

View File

@@ -143,6 +143,10 @@ jobs:
gather-rust-build-stats:
needs: [ check-permissions, build-build-tools-image ]
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
if: |
contains(github.event.pull_request.labels.*.name, 'run-extra-build-stats') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
@@ -177,13 +181,18 @@ jobs:
- name: Produce the build stats
run: PQ_LIB_DIR=$(pwd)/pg_install/v17/lib cargo build --all --release --timings -j$(nproc)
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Upload the build stats
id: upload-stats
env:
BUCKET: neon-github-public-dev
SHA: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
run: |
REPORT_URL=https://${BUCKET}.s3.amazonaws.com/build-stats/${SHA}/${GITHUB_RUN_ID}/cargo-timing.html
aws s3 cp --only-show-errors ./target/cargo-timings/cargo-timing.html "s3://${BUCKET}/build-stats/${SHA}/${GITHUB_RUN_ID}/"

View File

@@ -27,6 +27,11 @@ concurrency:
jobs:
trigger_bench_on_ec2_machine_in_eu_central_1:
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
pull-requests: write
runs-on: [ self-hosted, small ]
container:
image: neondatabase/build-tools:pinned-bookworm
@@ -38,8 +43,6 @@ jobs:
env:
API_KEY: ${{ secrets.PERIODIC_PAGEBENCH_EC2_RUNNER_API_KEY }}
RUN_ID: ${{ github.run_id }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_EC2_US_TEST_RUNNER_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY : ${{ secrets.AWS_EC2_US_TEST_RUNNER_ACCESS_KEY_SECRET }}
AWS_DEFAULT_REGION : "eu-central-1"
AWS_INSTANCE_ID : "i-02a59a3bf86bc7e74"
steps:
@@ -50,6 +53,13 @@ jobs:
- name: Show my own (github runner) external IP address - usefull for IP allowlisting
run: curl https://ifconfig.me
- name: Assume AWS OIDC role that allows to manage (start/stop/describe... EC machine)
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_MANAGE_BENCHMARK_EC2_VMS_ARN }}
role-duration-seconds: 3600
- name: Start EC2 instance and wait for the instance to boot up
run: |
aws ec2 start-instances --instance-ids $AWS_INSTANCE_ID
@@ -124,11 +134,10 @@ jobs:
cat "test_log_${GITHUB_RUN_ID}"
- name: Create Allure report
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -148,6 +157,14 @@ jobs:
-H "Authorization: Bearer $API_KEY" \
-d ''
- name: Assume AWS OIDC role that allows to manage (start/stop/describe... EC machine)
if: always() && steps.poll_step.outputs.too_many_runs != 'true'
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_MANAGE_BENCHMARK_EC2_VMS_ARN }}
role-duration-seconds: 3600
- name: Stop EC2 instance and wait for the instance to be stopped
if: always() && steps.poll_step.outputs.too_many_runs != 'true'
run: |

View File

@@ -25,11 +25,13 @@ defaults:
run:
shell: bash -euxo pipefail {0}
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write # require for posting a status update
env:
DEFAULT_PG_VERSION: 16
PLATFORM: neon-captest-new
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
AWS_DEFAULT_REGION: eu-central-1
jobs:
@@ -94,6 +96,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create Neon Project
id: create-neon-project
@@ -110,6 +113,7 @@ jobs:
run_in_parallel: false
extra_params: -m remote_cluster
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}
@@ -126,6 +130,7 @@ jobs:
uses: ./.github/actions/allure-report-generate
with:
store-test-results-into-db: true
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}
@@ -159,6 +164,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create Neon Project
id: create-neon-project
@@ -175,6 +181,7 @@ jobs:
run_in_parallel: false
extra_params: -m remote_cluster
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}
@@ -191,6 +198,7 @@ jobs:
uses: ./.github/actions/allure-report-generate
with:
store-test-results-into-db: true
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}

View File

@@ -67,7 +67,7 @@ jobs:
runs-on: ubuntu-22.04
permissions:
id-token: write # for `azure/login`
id-token: write # for `azure/login` and aws auth
steps:
- uses: docker/login-action@v3
@@ -75,11 +75,15 @@ jobs:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
- uses: docker/login-action@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
registry: 369495373322.dkr.ecr.eu-central-1.amazonaws.com
username: ${{ secrets.AWS_ACCESS_KEY_DEV }}
password: ${{ secrets.AWS_SECRET_KEY_DEV }}
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Login to Amazon Dev ECR
uses: aws-actions/amazon-ecr-login@v2
- name: Azure login
uses: azure/login@6c251865b4e6290e7b78be643ea2d005bc51f69a # @v2.1.1

View File

@@ -63,6 +63,7 @@ jobs:
if: always()
permissions:
statuses: write # for `github.repos.createCommitStatus(...)`
contents: write
needs:
- get-changed-files
- check-codestyle-python

View File

@@ -3,7 +3,7 @@ name: Create Release Branch
on:
schedule:
# It should be kept in sync with if-condition in jobs
- cron: '0 6 * * MON' # Storage release
- cron: '0 6 * * FRI' # Storage release
- cron: '0 6 * * THU' # Proxy release
workflow_dispatch:
inputs:
@@ -29,7 +29,7 @@ defaults:
jobs:
create-storage-release-branch:
if: ${{ github.event.schedule == '0 6 * * MON' || inputs.create-storage-release-branch }}
if: ${{ github.event.schedule == '0 6 * * FRI' || inputs.create-storage-release-branch }}
permissions:
contents: write

3
Cargo.lock generated
View File

@@ -5565,7 +5565,10 @@ name = "safekeeper_api"
version = "0.1.0"
dependencies = [
"const_format",
"postgres_ffi",
"pq_proto",
"serde",
"tokio",
"utils",
]

View File

@@ -19,3 +19,10 @@ max_prepared_statements=0
admin_users=postgres
unix_socket_dir=/tmp/
unix_socket_mode=0777
;; Disable connection logging. It produces a lot of logs that no one looks at,
;; and we can get similar log entries from the proxy too. We had incidents in
;; the past where the logging significantly stressed the log device or pgbouncer
;; itself.
log_connections=0
log_disconnections=0

View File

@@ -981,7 +981,7 @@ index fc42d418bf..e38f517574 100644
CREATE SCHEMA addr_nsp;
SET search_path TO 'addr_nsp';
diff --git a/src/test/regress/expected/password.out b/src/test/regress/expected/password.out
index 8475231735..1afae5395f 100644
index 8475231735..0653946337 100644
--- a/src/test/regress/expected/password.out
+++ b/src/test/regress/expected/password.out
@@ -12,11 +12,11 @@ SET password_encryption = 'md5'; -- ok
@@ -1006,65 +1006,63 @@ index 8475231735..1afae5395f 100644
-----------------+---------------------------------------------------
- regress_passwd1 | md5783277baca28003b33453252be4dbb34
- regress_passwd2 | md54044304ba511dd062133eb5b4b84a2a3
+ regress_passwd1 | NEON_MD5_PLACEHOLDER_regress_passwd1
+ regress_passwd2 | NEON_MD5_PLACEHOLDER_regress_passwd2
+ regress_passwd1 | NEON_MD5_PLACEHOLDER:regress_passwd1
+ regress_passwd2 | NEON_MD5_PLACEHOLDER:regress_passwd2
regress_passwd3 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
- regress_passwd4 |
+ regress_passwd4 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
(4 rows)
-- Rename a role
@@ -54,24 +54,30 @@ ALTER ROLE regress_passwd2_new RENAME TO regress_passwd2;
@@ -54,24 +54,16 @@ ALTER ROLE regress_passwd2_new RENAME TO regress_passwd2;
-- passwords.
SET password_encryption = 'md5';
-- encrypt with MD5
-ALTER ROLE regress_passwd2 PASSWORD 'foo';
--- already encrypted, use as they are
-ALTER ROLE regress_passwd1 PASSWORD 'md5cd3578025fe2c3d7ed1b9a9b26238b70';
-ALTER ROLE regress_passwd3 PASSWORD 'SCRAM-SHA-256$4096:VLK4RMaQLCvNtQ==$6YtlR4t69SguDiwFvbVgVZtuz6gpJQQqUMZ7IQJK5yI=:ps75jrHeYU4lXCcXI4O8oIdJ3eO8o2jirjruw9phBTo=';
+ALTER ROLE regress_passwd2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- already encrypted, use as they are
ALTER ROLE regress_passwd1 PASSWORD 'md5cd3578025fe2c3d7ed1b9a9b26238b70';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
ALTER ROLE regress_passwd3 PASSWORD 'SCRAM-SHA-256$4096:VLK4RMaQLCvNtQ==$6YtlR4t69SguDiwFvbVgVZtuz6gpJQQqUMZ7IQJK5yI=:ps75jrHeYU4lXCcXI4O8oIdJ3eO8o2jirjruw9phBTo=';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
SET password_encryption = 'scram-sha-256';
-- create SCRAM secret
-ALTER ROLE regress_passwd4 PASSWORD 'foo';
--- already encrypted with MD5, use as it is
-CREATE ROLE regress_passwd5 PASSWORD 'md5e73a4b11df52a6068f8b39f90be36023';
--- This looks like a valid SCRAM-SHA-256 secret, but it is not
--- so it should be hashed with SCRAM-SHA-256.
-CREATE ROLE regress_passwd6 PASSWORD 'SCRAM-SHA-256$1234';
--- These may look like valid MD5 secrets, but they are not, so they
--- should be hashed with SCRAM-SHA-256.
--- trailing garbage at the end
-CREATE ROLE regress_passwd7 PASSWORD 'md5012345678901234567890123456789zz';
--- invalid length
-CREATE ROLE regress_passwd8 PASSWORD 'md501234567890123456789012345678901zz';
+ALTER ROLE regress_passwd4 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- already encrypted with MD5, use as it is
CREATE ROLE regress_passwd5 PASSWORD 'md5e73a4b11df52a6068f8b39f90be36023';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
-- This looks like a valid SCRAM-SHA-256 secret, but it is not
-- so it should be hashed with SCRAM-SHA-256.
CREATE ROLE regress_passwd6 PASSWORD 'SCRAM-SHA-256$1234';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
-- These may look like valid MD5 secrets, but they are not, so they
-- should be hashed with SCRAM-SHA-256.
-- trailing garbage at the end
CREATE ROLE regress_passwd7 PASSWORD 'md5012345678901234567890123456789zz';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
-- invalid length
CREATE ROLE regress_passwd8 PASSWORD 'md501234567890123456789012345678901zz';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd5 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd6 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd7 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd8 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- Changing the SCRAM iteration count
SET scram_iterations = 1024;
CREATE ROLE regress_passwd9 PASSWORD 'alterediterationcount';
@@ -81,63 +87,67 @@ SELECT rolname, regexp_replace(rolpassword, '(SCRAM-SHA-256)\$(\d+):([a-zA-Z0-9+
@@ -81,11 +73,11 @@ SELECT rolname, regexp_replace(rolpassword, '(SCRAM-SHA-256)\$(\d+):([a-zA-Z0-9+
ORDER BY rolname, rolpassword;
rolname | rolpassword_masked
-----------------+---------------------------------------------------
- regress_passwd1 | md5cd3578025fe2c3d7ed1b9a9b26238b70
- regress_passwd2 | md5dfa155cadd5f4ad57860162f3fab9cdb
+ regress_passwd1 | NEON_MD5_PLACEHOLDER_regress_passwd1
+ regress_passwd2 | NEON_MD5_PLACEHOLDER_regress_passwd2
+ regress_passwd1 | NEON_MD5_PLACEHOLDER:regress_passwd1
+ regress_passwd2 | NEON_MD5_PLACEHOLDER:regress_passwd2
regress_passwd3 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
regress_passwd4 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
- regress_passwd5 | md5e73a4b11df52a6068f8b39f90be36023
- regress_passwd6 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
- regress_passwd7 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
- regress_passwd8 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
regress_passwd9 | SCRAM-SHA-256$1024:<salt>$<storedkey>:<serverkey>
-(9 rows)
+(5 rows)
+ regress_passwd5 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
regress_passwd6 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
regress_passwd7 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
regress_passwd8 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
@@ -95,23 +87,20 @@ SELECT rolname, regexp_replace(rolpassword, '(SCRAM-SHA-256)\$(\d+):([a-zA-Z0-9+
-- An empty password is not allowed, in any form
CREATE ROLE regress_passwd_empty PASSWORD '';
NOTICE: empty string is not a valid password, clearing password
@@ -1082,56 +1080,37 @@ index 8475231735..1afae5395f 100644
-(1 row)
+(0 rows)
-- Test with invalid stored and server keys.
--
-- The first is valid, to act as a control. The others have too long
-- stored/server keys. They will be re-hashed.
CREATE ROLE regress_passwd_sha_len0 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
CREATE ROLE regress_passwd_sha_len1 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96RqwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
CREATE ROLE regress_passwd_sha_len2 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
--- Test with invalid stored and server keys.
---
--- The first is valid, to act as a control. The others have too long
--- stored/server keys. They will be re-hashed.
-CREATE ROLE regress_passwd_sha_len0 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
-CREATE ROLE regress_passwd_sha_len1 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96RqwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
-CREATE ROLE regress_passwd_sha_len2 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=';
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd_sha_len0 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd_sha_len1 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd_sha_len2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- Check that the invalid secrets were re-hashed. A re-hashed secret
-- should not contain the original salt.
SELECT rolname, rolpassword not like '%A6xHKoH/494E941doaPOYg==%' as is_rolpassword_rehashed
FROM pg_authid
WHERE rolname LIKE 'regress_passwd_sha_len%'
@@ -120,7 +109,7 @@ SELECT rolname, rolpassword not like '%A6xHKoH/494E941doaPOYg==%' as is_rolpassw
ORDER BY rolname;
- rolname | is_rolpassword_rehashed
--------------------------+-------------------------
rolname | is_rolpassword_rehashed
-------------------------+-------------------------
- regress_passwd_sha_len0 | f
- regress_passwd_sha_len1 | t
- regress_passwd_sha_len2 | t
-(3 rows)
+ rolname | is_rolpassword_rehashed
+---------+-------------------------
+(0 rows)
DROP ROLE regress_passwd1;
DROP ROLE regress_passwd2;
DROP ROLE regress_passwd3;
DROP ROLE regress_passwd4;
DROP ROLE regress_passwd5;
+ERROR: role "regress_passwd5" does not exist
DROP ROLE regress_passwd6;
+ERROR: role "regress_passwd6" does not exist
DROP ROLE regress_passwd7;
+ERROR: role "regress_passwd7" does not exist
+ regress_passwd_sha_len0 | t
regress_passwd_sha_len1 | t
regress_passwd_sha_len2 | t
(3 rows)
@@ -135,6 +124,7 @@ DROP ROLE regress_passwd7;
DROP ROLE regress_passwd8;
+ERROR: role "regress_passwd8" does not exist
DROP ROLE regress_passwd9;
DROP ROLE regress_passwd_empty;
+ERROR: role "regress_passwd_empty" does not exist
DROP ROLE regress_passwd_sha_len0;
+ERROR: role "regress_passwd_sha_len0" does not exist
DROP ROLE regress_passwd_sha_len1;
+ERROR: role "regress_passwd_sha_len1" does not exist
DROP ROLE regress_passwd_sha_len2;
+ERROR: role "regress_passwd_sha_len2" does not exist
-- all entries should have been removed
SELECT rolname, rolpassword
FROM pg_authid
diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out
index 5b9dba7b32..cc408dad42 100644
--- a/src/test/regress/expected/privileges.out
@@ -3194,7 +3173,7 @@ index 1a6c61f49d..1c31ac6a53 100644
-- Test generic object addressing/identification functions
CREATE SCHEMA addr_nsp;
diff --git a/src/test/regress/sql/password.sql b/src/test/regress/sql/password.sql
index 53e86b0b6c..f07cf1ec54 100644
index 53e86b0b6c..0303fdfe96 100644
--- a/src/test/regress/sql/password.sql
+++ b/src/test/regress/sql/password.sql
@@ -10,11 +10,11 @@ SET password_encryption = 'scram-sha-256'; -- ok
@@ -3213,23 +3192,59 @@ index 53e86b0b6c..f07cf1ec54 100644
-- check list of created entries
--
@@ -42,14 +42,14 @@ ALTER ROLE regress_passwd2_new RENAME TO regress_passwd2;
@@ -42,26 +42,18 @@ ALTER ROLE regress_passwd2_new RENAME TO regress_passwd2;
SET password_encryption = 'md5';
-- encrypt with MD5
-ALTER ROLE regress_passwd2 PASSWORD 'foo';
--- already encrypted, use as they are
-ALTER ROLE regress_passwd1 PASSWORD 'md5cd3578025fe2c3d7ed1b9a9b26238b70';
-ALTER ROLE regress_passwd3 PASSWORD 'SCRAM-SHA-256$4096:VLK4RMaQLCvNtQ==$6YtlR4t69SguDiwFvbVgVZtuz6gpJQQqUMZ7IQJK5yI=:ps75jrHeYU4lXCcXI4O8oIdJ3eO8o2jirjruw9phBTo=';
+ALTER ROLE regress_passwd2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- already encrypted, use as they are
ALTER ROLE regress_passwd1 PASSWORD 'md5cd3578025fe2c3d7ed1b9a9b26238b70';
ALTER ROLE regress_passwd3 PASSWORD 'SCRAM-SHA-256$4096:VLK4RMaQLCvNtQ==$6YtlR4t69SguDiwFvbVgVZtuz6gpJQQqUMZ7IQJK5yI=:ps75jrHeYU4lXCcXI4O8oIdJ3eO8o2jirjruw9phBTo=';
SET password_encryption = 'scram-sha-256';
-- create SCRAM secret
-ALTER ROLE regress_passwd4 PASSWORD 'foo';
--- already encrypted with MD5, use as it is
-CREATE ROLE regress_passwd5 PASSWORD 'md5e73a4b11df52a6068f8b39f90be36023';
+ALTER ROLE regress_passwd4 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- already encrypted with MD5, use as it is
CREATE ROLE regress_passwd5 PASSWORD 'md5e73a4b11df52a6068f8b39f90be36023';
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd5 PASSWORD NEON_PASSWORD_PLACEHOLDER;
--- This looks like a valid SCRAM-SHA-256 secret, but it is not
--- so it should be hashed with SCRAM-SHA-256.
-CREATE ROLE regress_passwd6 PASSWORD 'SCRAM-SHA-256$1234';
--- These may look like valid MD5 secrets, but they are not, so they
--- should be hashed with SCRAM-SHA-256.
--- trailing garbage at the end
-CREATE ROLE regress_passwd7 PASSWORD 'md5012345678901234567890123456789zz';
--- invalid length
-CREATE ROLE regress_passwd8 PASSWORD 'md501234567890123456789012345678901zz';
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd6 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd7 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd8 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- Changing the SCRAM iteration count
SET scram_iterations = 1024;
@@ -78,13 +70,10 @@ ALTER ROLE regress_passwd_empty PASSWORD 'md585939a5ce845f1a1b620742e3c659e0a';
ALTER ROLE regress_passwd_empty PASSWORD 'SCRAM-SHA-256$4096:hpFyHTUsSWcR7O9P$LgZFIt6Oqdo27ZFKbZ2nV+vtnYM995pDh9ca6WSi120=:qVV5NeluNfUPkwm7Vqat25RjSPLkGeoZBQs6wVv+um4=';
SELECT rolpassword FROM pg_authid WHERE rolname='regress_passwd_empty';
--- Test with invalid stored and server keys.
---
--- The first is valid, to act as a control. The others have too long
--- stored/server keys. They will be re-hashed.
-CREATE ROLE regress_passwd_sha_len0 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
-CREATE ROLE regress_passwd_sha_len1 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96RqwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
-CREATE ROLE regress_passwd_sha_len2 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=';
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd_sha_len0 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd_sha_len1 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd_sha_len2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- Check that the invalid secrets were re-hashed. A re-hashed secret
-- should not contain the original salt.
diff --git a/src/test/regress/sql/privileges.sql b/src/test/regress/sql/privileges.sql
index 249df17a58..b258e7f26a 100644
--- a/src/test/regress/sql/privileges.sql

View File

@@ -1014,10 +1014,10 @@ index fc42d418bf..e38f517574 100644
CREATE SCHEMA addr_nsp;
SET search_path TO 'addr_nsp';
diff --git a/src/test/regress/expected/password.out b/src/test/regress/expected/password.out
index 924d6e001d..5966531db6 100644
index 924d6e001d..7fdda73439 100644
--- a/src/test/regress/expected/password.out
+++ b/src/test/regress/expected/password.out
@@ -12,13 +12,13 @@ SET password_encryption = 'md5'; -- ok
@@ -12,13 +12,11 @@ SET password_encryption = 'md5'; -- ok
SET password_encryption = 'scram-sha-256'; -- ok
-- consistency of password entries
SET password_encryption = 'md5';
@@ -1026,9 +1026,7 @@ index 924d6e001d..5966531db6 100644
-CREATE ROLE regress_passwd2;
-ALTER ROLE regress_passwd2 PASSWORD 'role_pwd2';
+CREATE ROLE regress_passwd1 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+ALTER ROLE regress_passwd1 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+ALTER ROLE regress_passwd2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
SET password_encryption = 'scram-sha-256';
-CREATE ROLE regress_passwd3 PASSWORD 'role_pwd3';
-CREATE ROLE regress_passwd4 PASSWORD NULL;
@@ -1037,71 +1035,69 @@ index 924d6e001d..5966531db6 100644
-- check list of created entries
--
-- The scram secret will look something like:
@@ -32,10 +32,10 @@ SELECT rolname, regexp_replace(rolpassword, '(SCRAM-SHA-256)\$(\d+):([a-zA-Z0-9+
@@ -32,10 +30,10 @@ SELECT rolname, regexp_replace(rolpassword, '(SCRAM-SHA-256)\$(\d+):([a-zA-Z0-9+
ORDER BY rolname, rolpassword;
rolname | rolpassword_masked
-----------------+---------------------------------------------------
- regress_passwd1 | md5783277baca28003b33453252be4dbb34
- regress_passwd2 | md54044304ba511dd062133eb5b4b84a2a3
+ regress_passwd1 | NEON_MD5_PLACEHOLDER_regress_passwd1
+ regress_passwd2 | NEON_MD5_PLACEHOLDER_regress_passwd2
+ regress_passwd1 | NEON_MD5_PLACEHOLDER:regress_passwd1
+ regress_passwd2 | NEON_MD5_PLACEHOLDER:regress_passwd2
regress_passwd3 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
- regress_passwd4 |
+ regress_passwd4 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
(4 rows)
-- Rename a role
@@ -56,24 +56,30 @@ ALTER ROLE regress_passwd2_new RENAME TO regress_passwd2;
@@ -56,24 +54,17 @@ ALTER ROLE regress_passwd2_new RENAME TO regress_passwd2;
-- passwords.
SET password_encryption = 'md5';
-- encrypt with MD5
-ALTER ROLE regress_passwd2 PASSWORD 'foo';
--- already encrypted, use as they are
-ALTER ROLE regress_passwd1 PASSWORD 'md5cd3578025fe2c3d7ed1b9a9b26238b70';
-ALTER ROLE regress_passwd3 PASSWORD 'SCRAM-SHA-256$4096:VLK4RMaQLCvNtQ==$6YtlR4t69SguDiwFvbVgVZtuz6gpJQQqUMZ7IQJK5yI=:ps75jrHeYU4lXCcXI4O8oIdJ3eO8o2jirjruw9phBTo=';
+ALTER ROLE regress_passwd2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- already encrypted, use as they are
ALTER ROLE regress_passwd1 PASSWORD 'md5cd3578025fe2c3d7ed1b9a9b26238b70';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
ALTER ROLE regress_passwd3 PASSWORD 'SCRAM-SHA-256$4096:VLK4RMaQLCvNtQ==$6YtlR4t69SguDiwFvbVgVZtuz6gpJQQqUMZ7IQJK5yI=:ps75jrHeYU4lXCcXI4O8oIdJ3eO8o2jirjruw9phBTo=';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
SET password_encryption = 'scram-sha-256';
-- create SCRAM secret
-ALTER ROLE regress_passwd4 PASSWORD 'foo';
+ALTER ROLE regress_passwd4 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- already encrypted with MD5, use as it is
CREATE ROLE regress_passwd5 PASSWORD 'md5e73a4b11df52a6068f8b39f90be36023';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
-- This looks like a valid SCRAM-SHA-256 secret, but it is not
-- so it should be hashed with SCRAM-SHA-256.
CREATE ROLE regress_passwd6 PASSWORD 'SCRAM-SHA-256$1234';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
-- These may look like valid MD5 secrets, but they are not, so they
-- should be hashed with SCRAM-SHA-256.
-- trailing garbage at the end
CREATE ROLE regress_passwd7 PASSWORD 'md5012345678901234567890123456789zz';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
-- invalid length
CREATE ROLE regress_passwd8 PASSWORD 'md501234567890123456789012345678901zz';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
-CREATE ROLE regress_passwd5 PASSWORD 'md5e73a4b11df52a6068f8b39f90be36023';
--- This looks like a valid SCRAM-SHA-256 secret, but it is not
--- so it should be hashed with SCRAM-SHA-256.
-CREATE ROLE regress_passwd6 PASSWORD 'SCRAM-SHA-256$1234';
--- These may look like valid MD5 secrets, but they are not, so they
--- should be hashed with SCRAM-SHA-256.
--- trailing garbage at the end
-CREATE ROLE regress_passwd7 PASSWORD 'md5012345678901234567890123456789zz';
--- invalid length
-CREATE ROLE regress_passwd8 PASSWORD 'md501234567890123456789012345678901zz';
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd5 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd6 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd7 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd8 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- Changing the SCRAM iteration count
SET scram_iterations = 1024;
CREATE ROLE regress_passwd9 PASSWORD 'alterediterationcount';
@@ -83,63 +89,67 @@ SELECT rolname, regexp_replace(rolpassword, '(SCRAM-SHA-256)\$(\d+):([a-zA-Z0-9+
@@ -83,11 +74,11 @@ SELECT rolname, regexp_replace(rolpassword, '(SCRAM-SHA-256)\$(\d+):([a-zA-Z0-9+
ORDER BY rolname, rolpassword;
rolname | rolpassword_masked
-----------------+---------------------------------------------------
- regress_passwd1 | md5cd3578025fe2c3d7ed1b9a9b26238b70
- regress_passwd2 | md5dfa155cadd5f4ad57860162f3fab9cdb
+ regress_passwd1 | NEON_MD5_PLACEHOLDER_regress_passwd1
+ regress_passwd2 | NEON_MD5_PLACEHOLDER_regress_passwd2
+ regress_passwd1 | NEON_MD5_PLACEHOLDER:regress_passwd1
+ regress_passwd2 | NEON_MD5_PLACEHOLDER:regress_passwd2
regress_passwd3 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
regress_passwd4 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
- regress_passwd5 | md5e73a4b11df52a6068f8b39f90be36023
- regress_passwd6 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
- regress_passwd7 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
- regress_passwd8 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
regress_passwd9 | SCRAM-SHA-256$1024:<salt>$<storedkey>:<serverkey>
-(9 rows)
+(5 rows)
+ regress_passwd5 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
regress_passwd6 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
regress_passwd7 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
regress_passwd8 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
@@ -97,23 +88,20 @@ SELECT rolname, regexp_replace(rolpassword, '(SCRAM-SHA-256)\$(\d+):([a-zA-Z0-9+
-- An empty password is not allowed, in any form
CREATE ROLE regress_passwd_empty PASSWORD '';
NOTICE: empty string is not a valid password, clearing password
@@ -1119,56 +1115,37 @@ index 924d6e001d..5966531db6 100644
-(1 row)
+(0 rows)
-- Test with invalid stored and server keys.
--
-- The first is valid, to act as a control. The others have too long
-- stored/server keys. They will be re-hashed.
CREATE ROLE regress_passwd_sha_len0 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
CREATE ROLE regress_passwd_sha_len1 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96RqwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
CREATE ROLE regress_passwd_sha_len2 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
--- Test with invalid stored and server keys.
---
--- The first is valid, to act as a control. The others have too long
--- stored/server keys. They will be re-hashed.
-CREATE ROLE regress_passwd_sha_len0 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
-CREATE ROLE regress_passwd_sha_len1 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96RqwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
-CREATE ROLE regress_passwd_sha_len2 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=';
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd_sha_len0 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd_sha_len1 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd_sha_len2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- Check that the invalid secrets were re-hashed. A re-hashed secret
-- should not contain the original salt.
SELECT rolname, rolpassword not like '%A6xHKoH/494E941doaPOYg==%' as is_rolpassword_rehashed
FROM pg_authid
WHERE rolname LIKE 'regress_passwd_sha_len%'
@@ -122,7 +110,7 @@ SELECT rolname, rolpassword not like '%A6xHKoH/494E941doaPOYg==%' as is_rolpassw
ORDER BY rolname;
- rolname | is_rolpassword_rehashed
--------------------------+-------------------------
rolname | is_rolpassword_rehashed
-------------------------+-------------------------
- regress_passwd_sha_len0 | f
- regress_passwd_sha_len1 | t
- regress_passwd_sha_len2 | t
-(3 rows)
+ rolname | is_rolpassword_rehashed
+---------+-------------------------
+(0 rows)
DROP ROLE regress_passwd1;
DROP ROLE regress_passwd2;
DROP ROLE regress_passwd3;
DROP ROLE regress_passwd4;
DROP ROLE regress_passwd5;
+ERROR: role "regress_passwd5" does not exist
DROP ROLE regress_passwd6;
+ERROR: role "regress_passwd6" does not exist
DROP ROLE regress_passwd7;
+ERROR: role "regress_passwd7" does not exist
+ regress_passwd_sha_len0 | t
regress_passwd_sha_len1 | t
regress_passwd_sha_len2 | t
(3 rows)
@@ -137,6 +125,7 @@ DROP ROLE regress_passwd7;
DROP ROLE regress_passwd8;
+ERROR: role "regress_passwd8" does not exist
DROP ROLE regress_passwd9;
DROP ROLE regress_passwd_empty;
+ERROR: role "regress_passwd_empty" does not exist
DROP ROLE regress_passwd_sha_len0;
+ERROR: role "regress_passwd_sha_len0" does not exist
DROP ROLE regress_passwd_sha_len1;
+ERROR: role "regress_passwd_sha_len1" does not exist
DROP ROLE regress_passwd_sha_len2;
+ERROR: role "regress_passwd_sha_len2" does not exist
-- all entries should have been removed
SELECT rolname, rolpassword
FROM pg_authid
diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out
index 1296da0d57..f43fffa44c 100644
--- a/src/test/regress/expected/privileges.out
@@ -3249,10 +3226,10 @@ index 1a6c61f49d..1c31ac6a53 100644
-- Test generic object addressing/identification functions
CREATE SCHEMA addr_nsp;
diff --git a/src/test/regress/sql/password.sql b/src/test/regress/sql/password.sql
index bb82aa4aa2..7424c91b10 100644
index bb82aa4aa2..dd8a05e24d 100644
--- a/src/test/regress/sql/password.sql
+++ b/src/test/regress/sql/password.sql
@@ -10,13 +10,13 @@ SET password_encryption = 'scram-sha-256'; -- ok
@@ -10,13 +10,11 @@ SET password_encryption = 'scram-sha-256'; -- ok
-- consistency of password entries
SET password_encryption = 'md5';
@@ -3261,9 +3238,7 @@ index bb82aa4aa2..7424c91b10 100644
-CREATE ROLE regress_passwd2;
-ALTER ROLE regress_passwd2 PASSWORD 'role_pwd2';
+CREATE ROLE regress_passwd1 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+ALTER ROLE regress_passwd1 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+ALTER ROLE regress_passwd2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
SET password_encryption = 'scram-sha-256';
-CREATE ROLE regress_passwd3 PASSWORD 'role_pwd3';
-CREATE ROLE regress_passwd4 PASSWORD NULL;
@@ -3272,23 +3247,59 @@ index bb82aa4aa2..7424c91b10 100644
-- check list of created entries
--
@@ -44,14 +44,14 @@ ALTER ROLE regress_passwd2_new RENAME TO regress_passwd2;
@@ -44,26 +42,19 @@ ALTER ROLE regress_passwd2_new RENAME TO regress_passwd2;
SET password_encryption = 'md5';
-- encrypt with MD5
-ALTER ROLE regress_passwd2 PASSWORD 'foo';
--- already encrypted, use as they are
-ALTER ROLE regress_passwd1 PASSWORD 'md5cd3578025fe2c3d7ed1b9a9b26238b70';
-ALTER ROLE regress_passwd3 PASSWORD 'SCRAM-SHA-256$4096:VLK4RMaQLCvNtQ==$6YtlR4t69SguDiwFvbVgVZtuz6gpJQQqUMZ7IQJK5yI=:ps75jrHeYU4lXCcXI4O8oIdJ3eO8o2jirjruw9phBTo=';
+ALTER ROLE regress_passwd2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- already encrypted, use as they are
ALTER ROLE regress_passwd1 PASSWORD 'md5cd3578025fe2c3d7ed1b9a9b26238b70';
ALTER ROLE regress_passwd3 PASSWORD 'SCRAM-SHA-256$4096:VLK4RMaQLCvNtQ==$6YtlR4t69SguDiwFvbVgVZtuz6gpJQQqUMZ7IQJK5yI=:ps75jrHeYU4lXCcXI4O8oIdJ3eO8o2jirjruw9phBTo=';
SET password_encryption = 'scram-sha-256';
-- create SCRAM secret
-ALTER ROLE regress_passwd4 PASSWORD 'foo';
+ALTER ROLE regress_passwd4 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- already encrypted with MD5, use as it is
CREATE ROLE regress_passwd5 PASSWORD 'md5e73a4b11df52a6068f8b39f90be36023';
-CREATE ROLE regress_passwd5 PASSWORD 'md5e73a4b11df52a6068f8b39f90be36023';
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd5 PASSWORD NEON_PASSWORD_PLACEHOLDER;
--- This looks like a valid SCRAM-SHA-256 secret, but it is not
--- so it should be hashed with SCRAM-SHA-256.
-CREATE ROLE regress_passwd6 PASSWORD 'SCRAM-SHA-256$1234';
--- These may look like valid MD5 secrets, but they are not, so they
--- should be hashed with SCRAM-SHA-256.
--- trailing garbage at the end
-CREATE ROLE regress_passwd7 PASSWORD 'md5012345678901234567890123456789zz';
--- invalid length
-CREATE ROLE regress_passwd8 PASSWORD 'md501234567890123456789012345678901zz';
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd6 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd7 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd8 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- Changing the SCRAM iteration count
SET scram_iterations = 1024;
@@ -80,13 +71,10 @@ ALTER ROLE regress_passwd_empty PASSWORD 'md585939a5ce845f1a1b620742e3c659e0a';
ALTER ROLE regress_passwd_empty PASSWORD 'SCRAM-SHA-256$4096:hpFyHTUsSWcR7O9P$LgZFIt6Oqdo27ZFKbZ2nV+vtnYM995pDh9ca6WSi120=:qVV5NeluNfUPkwm7Vqat25RjSPLkGeoZBQs6wVv+um4=';
SELECT rolpassword FROM pg_authid WHERE rolname='regress_passwd_empty';
--- Test with invalid stored and server keys.
---
--- The first is valid, to act as a control. The others have too long
--- stored/server keys. They will be re-hashed.
-CREATE ROLE regress_passwd_sha_len0 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
-CREATE ROLE regress_passwd_sha_len1 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96RqwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
-CREATE ROLE regress_passwd_sha_len2 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=';
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd_sha_len0 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd_sha_len1 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd_sha_len2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- Check that the invalid secrets were re-hashed. A re-hashed secret
-- should not contain the original salt.
diff --git a/src/test/regress/sql/privileges.sql b/src/test/regress/sql/privileges.sql
index 5880bc018d..27aa952b18 100644
--- a/src/test/regress/sql/privileges.sql

View File

@@ -274,6 +274,7 @@ fn fill_remote_storage_secrets_vars(mut cmd: &mut Command) -> &mut Command {
for env_key in [
"AWS_ACCESS_KEY_ID",
"AWS_SECRET_ACCESS_KEY",
"AWS_SESSION_TOKEN",
"AWS_PROFILE",
// HOME is needed in combination with `AWS_PROFILE` to pick up the SSO sessions.
"HOME",

View File

@@ -75,7 +75,7 @@ pub struct TenantPolicyRequest {
pub scheduling: Option<ShardSchedulingPolicy>,
}
#[derive(Clone, Serialize, Deserialize, PartialEq, Eq, Hash, Debug)]
#[derive(Clone, Serialize, Deserialize, PartialEq, Eq, Hash, Debug, PartialOrd, Ord)]
pub struct AvailabilityZone(pub String);
impl Display for AvailabilityZone {

View File

@@ -5,6 +5,9 @@ edition.workspace = true
license.workspace = true
[dependencies]
serde.workspace = true
const_format.workspace = true
serde.workspace = true
postgres_ffi.workspace = true
pq_proto.workspace = true
tokio.workspace = true
utils.workspace = true

View File

@@ -1,10 +1,27 @@
#![deny(unsafe_code)]
#![deny(clippy::undocumented_unsafe_blocks)]
use const_format::formatcp;
use pq_proto::SystemId;
use serde::{Deserialize, Serialize};
/// Public API types
pub mod models;
/// Consensus logical timestamp. Note: it is a part of sk control file.
pub type Term = u64;
pub const INVALID_TERM: Term = 0;
/// Information about Postgres. Safekeeper gets it once and then verifies all
/// further connections from computes match. Note: it is a part of sk control
/// file.
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct ServerInfo {
/// Postgres server version
pub pg_version: u32,
pub system_id: SystemId,
pub wal_seg_size: u32,
}
pub const DEFAULT_PG_LISTEN_PORT: u16 = 5454;
pub const DEFAULT_PG_LISTEN_ADDR: &str = formatcp!("127.0.0.1:{DEFAULT_PG_LISTEN_PORT}");

View File

@@ -1,10 +1,23 @@
//! Types used in safekeeper http API. Many of them are also reused internally.
use postgres_ffi::TimestampTz;
use serde::{Deserialize, Serialize};
use std::net::SocketAddr;
use tokio::time::Instant;
use utils::{
id::{NodeId, TenantId, TimelineId},
id::{NodeId, TenantId, TenantTimelineId, TimelineId},
lsn::Lsn,
pageserver_feedback::PageserverFeedback,
};
use crate::{ServerInfo, Term};
#[derive(Debug, Serialize)]
pub struct SafekeeperStatus {
pub id: NodeId,
}
#[derive(Serialize, Deserialize)]
pub struct TimelineCreateRequest {
pub tenant_id: TenantId,
@@ -18,6 +31,161 @@ pub struct TimelineCreateRequest {
pub local_start_lsn: Option<Lsn>,
}
/// Same as TermLsn, but serializes LSN using display serializer
/// in Postgres format, i.e. 0/FFFFFFFF. Used only for the API response.
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct TermSwitchApiEntry {
pub term: Term,
pub lsn: Lsn,
}
/// Augment AcceptorState with last_log_term for convenience
#[derive(Debug, Serialize, Deserialize)]
pub struct AcceptorStateStatus {
pub term: Term,
pub epoch: Term, // aka last_log_term, old `epoch` name is left for compatibility
pub term_history: Vec<TermSwitchApiEntry>,
}
/// Things safekeeper should know about timeline state on peers.
/// Used as both model and internally.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PeerInfo {
pub sk_id: NodeId,
pub term: Term,
/// Term of the last entry.
pub last_log_term: Term,
/// LSN of the last record.
pub flush_lsn: Lsn,
pub commit_lsn: Lsn,
/// Since which LSN safekeeper has WAL.
pub local_start_lsn: Lsn,
/// When info was received. Serde annotations are not very useful but make
/// the code compile -- we don't rely on this field externally.
#[serde(skip)]
#[serde(default = "Instant::now")]
pub ts: Instant,
pub pg_connstr: String,
pub http_connstr: String,
}
pub type FullTransactionId = u64;
/// Hot standby feedback received from replica
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct HotStandbyFeedback {
pub ts: TimestampTz,
pub xmin: FullTransactionId,
pub catalog_xmin: FullTransactionId,
}
pub const INVALID_FULL_TRANSACTION_ID: FullTransactionId = 0;
impl HotStandbyFeedback {
pub fn empty() -> HotStandbyFeedback {
HotStandbyFeedback {
ts: 0,
xmin: 0,
catalog_xmin: 0,
}
}
}
/// Standby status update
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct StandbyReply {
pub write_lsn: Lsn, // The location of the last WAL byte + 1 received and written to disk in the standby.
pub flush_lsn: Lsn, // The location of the last WAL byte + 1 flushed to disk in the standby.
pub apply_lsn: Lsn, // The location of the last WAL byte + 1 applied in the standby.
pub reply_ts: TimestampTz, // The client's system clock at the time of transmission, as microseconds since midnight on 2000-01-01.
pub reply_requested: bool,
}
impl StandbyReply {
pub fn empty() -> Self {
StandbyReply {
write_lsn: Lsn::INVALID,
flush_lsn: Lsn::INVALID,
apply_lsn: Lsn::INVALID,
reply_ts: 0,
reply_requested: false,
}
}
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct StandbyFeedback {
pub reply: StandbyReply,
pub hs_feedback: HotStandbyFeedback,
}
impl StandbyFeedback {
pub fn empty() -> Self {
StandbyFeedback {
reply: StandbyReply::empty(),
hs_feedback: HotStandbyFeedback::empty(),
}
}
}
/// Receiver is either pageserver or regular standby, which have different
/// feedbacks.
/// Used as both model and internally.
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub enum ReplicationFeedback {
Pageserver(PageserverFeedback),
Standby(StandbyFeedback),
}
/// Uniquely identifies a WAL service connection. Logged in spans for
/// observability.
pub type ConnectionId = u32;
/// Serialize is used only for json'ing in API response. Also used internally.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct WalSenderState {
pub ttid: TenantTimelineId,
pub addr: SocketAddr,
pub conn_id: ConnectionId,
// postgres application_name
pub appname: Option<String>,
pub feedback: ReplicationFeedback,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct WalReceiverState {
/// None means it is recovery initiated by us (this safekeeper).
pub conn_id: Option<ConnectionId>,
pub status: WalReceiverStatus,
}
/// Walreceiver status. Currently only whether it passed voting stage and
/// started receiving the stream, but it is easy to add more if needed.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum WalReceiverStatus {
Voting,
Streaming,
}
/// Info about timeline on safekeeper ready for reporting.
#[derive(Debug, Serialize, Deserialize)]
pub struct TimelineStatus {
pub tenant_id: TenantId,
pub timeline_id: TimelineId,
pub acceptor_state: AcceptorStateStatus,
pub pg_info: ServerInfo,
pub flush_lsn: Lsn,
pub timeline_start_lsn: Lsn,
pub local_start_lsn: Lsn,
pub commit_lsn: Lsn,
pub backup_lsn: Lsn,
pub peer_horizon_lsn: Lsn,
pub remote_consistent_lsn: Lsn,
pub peers: Vec<PeerInfo>,
pub walsenders: Vec<WalSenderState>,
pub walreceivers: Vec<WalReceiverState>,
}
fn lsn_invalid() -> Lsn {
Lsn::INVALID
}

View File

@@ -2081,13 +2081,20 @@ async fn timeline_compact_handler(
.as_ref()
.map(|r| r.sub_compaction)
.unwrap_or(false);
let sub_compaction_max_job_size_mb = compact_request
.as_ref()
.and_then(|r| r.sub_compaction_max_job_size_mb);
let options = CompactOptions {
compact_range: compact_request
compact_key_range: compact_request
.as_ref()
.and_then(|r| r.compact_range.clone()),
compact_below_lsn: compact_request.as_ref().and_then(|r| r.compact_below_lsn),
.and_then(|r| r.compact_key_range.clone()),
compact_lsn_range: compact_request
.as_ref()
.and_then(|r| r.compact_lsn_range.clone()),
flags,
sub_compaction,
sub_compaction_max_job_size_mb,
};
let scheduled = compact_request

View File

@@ -44,6 +44,7 @@ use std::sync::atomic::AtomicBool;
use std::sync::Weak;
use std::time::SystemTime;
use storage_broker::BrokerClientChannel;
use timeline::compaction::GcCompactJob;
use timeline::compaction::ScheduledCompactionTask;
use timeline::import_pgdata;
use timeline::offload::offload_timeline;
@@ -3017,8 +3018,15 @@ impl Tenant {
warn!("ignoring scheduled compaction task: scheduled task must be gc compaction: {:?}", next_scheduled_compaction_task.options);
} else if next_scheduled_compaction_task.options.sub_compaction {
info!("running scheduled enhanced gc bottom-most compaction with sub-compaction, splitting compaction jobs");
let jobs = timeline
.gc_compaction_split_jobs(next_scheduled_compaction_task.options)
let jobs: Vec<GcCompactJob> = timeline
.gc_compaction_split_jobs(
GcCompactJob::from_compact_options(
next_scheduled_compaction_task.options.clone(),
),
next_scheduled_compaction_task
.options
.sub_compaction_max_job_size_mb,
)
.await
.map_err(CompactionError::Other)?;
if jobs.is_empty() {
@@ -3029,9 +3037,23 @@ impl Tenant {
let mut guard = self.scheduled_compaction_tasks.lock().unwrap();
let tline_pending_tasks = guard.entry(*timeline_id).or_default();
for (idx, job) in jobs.into_iter().enumerate() {
// Unfortunately we need to convert the `GcCompactJob` back to `CompactionOptions`
// until we do further refactors to allow directly call `compact_with_gc`.
let mut flags: EnumSet<CompactFlags> = EnumSet::default();
flags |= CompactFlags::EnhancedGcBottomMostCompaction;
if job.dry_run {
flags |= CompactFlags::DryRun;
}
let options = CompactOptions {
flags,
sub_compaction: false,
compact_key_range: Some(job.compact_key_range.into()),
compact_lsn_range: Some(job.compact_lsn_range.into()),
sub_compaction_max_job_size_mb: None,
};
tline_pending_tasks.push_back(if idx == jobs_len - 1 {
ScheduledCompactionTask {
options: job,
options,
// The last job in the queue sends the signal and releases the gc guard
result_tx: next_scheduled_compaction_task
.result_tx
@@ -3042,7 +3064,7 @@ impl Tenant {
}
} else {
ScheduledCompactionTask {
options: job,
options,
result_tx: None,
gc_block: None,
}
@@ -5742,6 +5764,8 @@ mod tests {
#[cfg(feature = "testing")]
use timeline::compaction::{KeyHistoryRetention, KeyLogAtLsn};
#[cfg(feature = "testing")]
use timeline::CompactLsnRange;
#[cfg(feature = "testing")]
use timeline::GcInfo;
static TEST_KEY: Lazy<Key> =
@@ -9333,7 +9357,6 @@ mod tests {
&cancel,
CompactOptions {
flags: dryrun_flags,
compact_range: None,
..Default::default()
},
&ctx,
@@ -9582,7 +9605,6 @@ mod tests {
&cancel,
CompactOptions {
flags: dryrun_flags,
compact_range: None,
..Default::default()
},
&ctx,
@@ -9612,6 +9634,8 @@ mod tests {
#[cfg(feature = "testing")]
#[tokio::test]
async fn test_simple_bottom_most_compaction_on_branch() -> anyhow::Result<()> {
use timeline::CompactLsnRange;
let harness = TenantHarness::create("test_simple_bottom_most_compaction_on_branch").await?;
let (tenant, ctx) = harness.load().await;
@@ -9804,6 +9828,22 @@ mod tests {
verify_result().await;
// Piggyback a compaction with above_lsn. Ensure it works correctly when the specified LSN intersects with the layer files.
// Now we already have a single large delta layer, so the compaction min_layer_lsn should be the same as ancestor LSN (0x18).
branch_tline
.compact_with_gc(
&cancel,
CompactOptions {
compact_lsn_range: Some(CompactLsnRange::above(Lsn(0x40))),
..Default::default()
},
&ctx,
)
.await
.unwrap();
verify_result().await;
Ok(())
}
@@ -10092,7 +10132,7 @@ mod tests {
&cancel,
CompactOptions {
flags: EnumSet::new(),
compact_range: Some((get_key(0)..get_key(2)).into()),
compact_key_range: Some((get_key(0)..get_key(2)).into()),
..Default::default()
},
&ctx,
@@ -10139,7 +10179,7 @@ mod tests {
&cancel,
CompactOptions {
flags: EnumSet::new(),
compact_range: Some((get_key(2)..get_key(4)).into()),
compact_key_range: Some((get_key(2)..get_key(4)).into()),
..Default::default()
},
&ctx,
@@ -10191,7 +10231,7 @@ mod tests {
&cancel,
CompactOptions {
flags: EnumSet::new(),
compact_range: Some((get_key(4)..get_key(9)).into()),
compact_key_range: Some((get_key(4)..get_key(9)).into()),
..Default::default()
},
&ctx,
@@ -10242,7 +10282,7 @@ mod tests {
&cancel,
CompactOptions {
flags: EnumSet::new(),
compact_range: Some((get_key(9)..get_key(10)).into()),
compact_key_range: Some((get_key(9)..get_key(10)).into()),
..Default::default()
},
&ctx,
@@ -10298,7 +10338,7 @@ mod tests {
&cancel,
CompactOptions {
flags: EnumSet::new(),
compact_range: Some((get_key(0)..get_key(10)).into()),
compact_key_range: Some((get_key(0)..get_key(10)).into()),
..Default::default()
},
&ctx,
@@ -10327,7 +10367,6 @@ mod tests {
},
],
);
Ok(())
}
@@ -10380,4 +10419,602 @@ mod tests {
Ok(())
}
#[cfg(feature = "testing")]
#[tokio::test]
async fn test_simple_bottom_most_compaction_above_lsn() -> anyhow::Result<()> {
let harness = TenantHarness::create("test_simple_bottom_most_compaction_above_lsn").await?;
let (tenant, ctx) = harness.load().await;
fn get_key(id: u32) -> Key {
// using aux key here b/c they are guaranteed to be inside `collect_keyspace`.
let mut key = Key::from_hex("620000000033333333444444445500000000").unwrap();
key.field6 = id;
key
}
let img_layer = (0..10)
.map(|id| (get_key(id), Bytes::from(format!("value {id}@0x10"))))
.collect_vec();
let delta1 = vec![(
get_key(1),
Lsn(0x20),
Value::WalRecord(NeonWalRecord::wal_append("@0x20")),
)];
let delta4 = vec![(
get_key(1),
Lsn(0x28),
Value::WalRecord(NeonWalRecord::wal_append("@0x28")),
)];
let delta2 = vec![
(
get_key(1),
Lsn(0x30),
Value::WalRecord(NeonWalRecord::wal_append("@0x30")),
),
(
get_key(1),
Lsn(0x38),
Value::WalRecord(NeonWalRecord::wal_append("@0x38")),
),
];
let delta3 = vec![
(
get_key(8),
Lsn(0x48),
Value::WalRecord(NeonWalRecord::wal_append("@0x48")),
),
(
get_key(9),
Lsn(0x48),
Value::WalRecord(NeonWalRecord::wal_append("@0x48")),
),
];
let tline = tenant
.create_test_timeline_with_layers(
TIMELINE_ID,
Lsn(0x10),
DEFAULT_PG_VERSION,
&ctx,
vec![
// delta1/2/4 only contain a single key but multiple updates
DeltaLayerTestDesc::new_with_inferred_key_range(Lsn(0x20)..Lsn(0x28), delta1),
DeltaLayerTestDesc::new_with_inferred_key_range(Lsn(0x30)..Lsn(0x50), delta2),
DeltaLayerTestDesc::new_with_inferred_key_range(Lsn(0x28)..Lsn(0x30), delta4),
DeltaLayerTestDesc::new_with_inferred_key_range(Lsn(0x30)..Lsn(0x50), delta3),
], // delta layers
vec![(Lsn(0x10), img_layer)], // image layers
Lsn(0x50),
)
.await?;
{
tline
.latest_gc_cutoff_lsn
.lock_for_write()
.store_and_unlock(Lsn(0x30))
.wait()
.await;
// Update GC info
let mut guard = tline.gc_info.write().unwrap();
*guard = GcInfo {
retain_lsns: vec![
(Lsn(0x10), tline.timeline_id, MaybeOffloaded::No),
(Lsn(0x20), tline.timeline_id, MaybeOffloaded::No),
],
cutoffs: GcCutoffs {
time: Lsn(0x30),
space: Lsn(0x30),
},
leases: Default::default(),
within_ancestor_pitr: false,
};
}
let expected_result = [
Bytes::from_static(b"value 0@0x10"),
Bytes::from_static(b"value 1@0x10@0x20@0x28@0x30@0x38"),
Bytes::from_static(b"value 2@0x10"),
Bytes::from_static(b"value 3@0x10"),
Bytes::from_static(b"value 4@0x10"),
Bytes::from_static(b"value 5@0x10"),
Bytes::from_static(b"value 6@0x10"),
Bytes::from_static(b"value 7@0x10"),
Bytes::from_static(b"value 8@0x10@0x48"),
Bytes::from_static(b"value 9@0x10@0x48"),
];
let expected_result_at_gc_horizon = [
Bytes::from_static(b"value 0@0x10"),
Bytes::from_static(b"value 1@0x10@0x20@0x28@0x30"),
Bytes::from_static(b"value 2@0x10"),
Bytes::from_static(b"value 3@0x10"),
Bytes::from_static(b"value 4@0x10"),
Bytes::from_static(b"value 5@0x10"),
Bytes::from_static(b"value 6@0x10"),
Bytes::from_static(b"value 7@0x10"),
Bytes::from_static(b"value 8@0x10"),
Bytes::from_static(b"value 9@0x10"),
];
let expected_result_at_lsn_20 = [
Bytes::from_static(b"value 0@0x10"),
Bytes::from_static(b"value 1@0x10@0x20"),
Bytes::from_static(b"value 2@0x10"),
Bytes::from_static(b"value 3@0x10"),
Bytes::from_static(b"value 4@0x10"),
Bytes::from_static(b"value 5@0x10"),
Bytes::from_static(b"value 6@0x10"),
Bytes::from_static(b"value 7@0x10"),
Bytes::from_static(b"value 8@0x10"),
Bytes::from_static(b"value 9@0x10"),
];
let expected_result_at_lsn_10 = [
Bytes::from_static(b"value 0@0x10"),
Bytes::from_static(b"value 1@0x10"),
Bytes::from_static(b"value 2@0x10"),
Bytes::from_static(b"value 3@0x10"),
Bytes::from_static(b"value 4@0x10"),
Bytes::from_static(b"value 5@0x10"),
Bytes::from_static(b"value 6@0x10"),
Bytes::from_static(b"value 7@0x10"),
Bytes::from_static(b"value 8@0x10"),
Bytes::from_static(b"value 9@0x10"),
];
let verify_result = || async {
let gc_horizon = {
let gc_info = tline.gc_info.read().unwrap();
gc_info.cutoffs.time
};
for idx in 0..10 {
assert_eq!(
tline
.get(get_key(idx as u32), Lsn(0x50), &ctx)
.await
.unwrap(),
&expected_result[idx]
);
assert_eq!(
tline
.get(get_key(idx as u32), gc_horizon, &ctx)
.await
.unwrap(),
&expected_result_at_gc_horizon[idx]
);
assert_eq!(
tline
.get(get_key(idx as u32), Lsn(0x20), &ctx)
.await
.unwrap(),
&expected_result_at_lsn_20[idx]
);
assert_eq!(
tline
.get(get_key(idx as u32), Lsn(0x10), &ctx)
.await
.unwrap(),
&expected_result_at_lsn_10[idx]
);
}
};
verify_result().await;
let cancel = CancellationToken::new();
tline
.compact_with_gc(
&cancel,
CompactOptions {
compact_lsn_range: Some(CompactLsnRange::above(Lsn(0x28))),
..Default::default()
},
&ctx,
)
.await
.unwrap();
verify_result().await;
let all_layers = inspect_and_sort(&tline, Some(get_key(0)..get_key(10))).await;
check_layer_map_key_eq(
all_layers,
vec![
// The original image layer, not compacted
PersistentLayerKey {
key_range: get_key(0)..get_key(10),
lsn_range: Lsn(0x10)..Lsn(0x11),
is_delta: false,
},
// Delta layer below the specified above_lsn not compacted
PersistentLayerKey {
key_range: get_key(1)..get_key(2),
lsn_range: Lsn(0x20)..Lsn(0x28),
is_delta: true,
},
// Delta layer compacted above the LSN
PersistentLayerKey {
key_range: get_key(1)..get_key(10),
lsn_range: Lsn(0x28)..Lsn(0x50),
is_delta: true,
},
],
);
// compact again
tline
.compact_with_gc(&cancel, CompactOptions::default(), &ctx)
.await
.unwrap();
verify_result().await;
let all_layers = inspect_and_sort(&tline, Some(get_key(0)..get_key(10))).await;
check_layer_map_key_eq(
all_layers,
vec![
// The compacted image layer (full key range)
PersistentLayerKey {
key_range: Key::MIN..Key::MAX,
lsn_range: Lsn(0x10)..Lsn(0x11),
is_delta: false,
},
// All other data in the delta layer
PersistentLayerKey {
key_range: get_key(1)..get_key(10),
lsn_range: Lsn(0x10)..Lsn(0x50),
is_delta: true,
},
],
);
Ok(())
}
#[cfg(feature = "testing")]
#[tokio::test]
async fn test_simple_bottom_most_compaction_rectangle() -> anyhow::Result<()> {
let harness = TenantHarness::create("test_simple_bottom_most_compaction_rectangle").await?;
let (tenant, ctx) = harness.load().await;
fn get_key(id: u32) -> Key {
// using aux key here b/c they are guaranteed to be inside `collect_keyspace`.
let mut key = Key::from_hex("620000000033333333444444445500000000").unwrap();
key.field6 = id;
key
}
let img_layer = (0..10)
.map(|id| (get_key(id), Bytes::from(format!("value {id}@0x10"))))
.collect_vec();
let delta1 = vec![(
get_key(1),
Lsn(0x20),
Value::WalRecord(NeonWalRecord::wal_append("@0x20")),
)];
let delta4 = vec![(
get_key(1),
Lsn(0x28),
Value::WalRecord(NeonWalRecord::wal_append("@0x28")),
)];
let delta2 = vec![
(
get_key(1),
Lsn(0x30),
Value::WalRecord(NeonWalRecord::wal_append("@0x30")),
),
(
get_key(1),
Lsn(0x38),
Value::WalRecord(NeonWalRecord::wal_append("@0x38")),
),
];
let delta3 = vec![
(
get_key(8),
Lsn(0x48),
Value::WalRecord(NeonWalRecord::wal_append("@0x48")),
),
(
get_key(9),
Lsn(0x48),
Value::WalRecord(NeonWalRecord::wal_append("@0x48")),
),
];
let tline = tenant
.create_test_timeline_with_layers(
TIMELINE_ID,
Lsn(0x10),
DEFAULT_PG_VERSION,
&ctx,
vec![
// delta1/2/4 only contain a single key but multiple updates
DeltaLayerTestDesc::new_with_inferred_key_range(Lsn(0x20)..Lsn(0x28), delta1),
DeltaLayerTestDesc::new_with_inferred_key_range(Lsn(0x30)..Lsn(0x50), delta2),
DeltaLayerTestDesc::new_with_inferred_key_range(Lsn(0x28)..Lsn(0x30), delta4),
DeltaLayerTestDesc::new_with_inferred_key_range(Lsn(0x30)..Lsn(0x50), delta3),
], // delta layers
vec![(Lsn(0x10), img_layer)], // image layers
Lsn(0x50),
)
.await?;
{
tline
.latest_gc_cutoff_lsn
.lock_for_write()
.store_and_unlock(Lsn(0x30))
.wait()
.await;
// Update GC info
let mut guard = tline.gc_info.write().unwrap();
*guard = GcInfo {
retain_lsns: vec![
(Lsn(0x10), tline.timeline_id, MaybeOffloaded::No),
(Lsn(0x20), tline.timeline_id, MaybeOffloaded::No),
],
cutoffs: GcCutoffs {
time: Lsn(0x30),
space: Lsn(0x30),
},
leases: Default::default(),
within_ancestor_pitr: false,
};
}
let expected_result = [
Bytes::from_static(b"value 0@0x10"),
Bytes::from_static(b"value 1@0x10@0x20@0x28@0x30@0x38"),
Bytes::from_static(b"value 2@0x10"),
Bytes::from_static(b"value 3@0x10"),
Bytes::from_static(b"value 4@0x10"),
Bytes::from_static(b"value 5@0x10"),
Bytes::from_static(b"value 6@0x10"),
Bytes::from_static(b"value 7@0x10"),
Bytes::from_static(b"value 8@0x10@0x48"),
Bytes::from_static(b"value 9@0x10@0x48"),
];
let expected_result_at_gc_horizon = [
Bytes::from_static(b"value 0@0x10"),
Bytes::from_static(b"value 1@0x10@0x20@0x28@0x30"),
Bytes::from_static(b"value 2@0x10"),
Bytes::from_static(b"value 3@0x10"),
Bytes::from_static(b"value 4@0x10"),
Bytes::from_static(b"value 5@0x10"),
Bytes::from_static(b"value 6@0x10"),
Bytes::from_static(b"value 7@0x10"),
Bytes::from_static(b"value 8@0x10"),
Bytes::from_static(b"value 9@0x10"),
];
let expected_result_at_lsn_20 = [
Bytes::from_static(b"value 0@0x10"),
Bytes::from_static(b"value 1@0x10@0x20"),
Bytes::from_static(b"value 2@0x10"),
Bytes::from_static(b"value 3@0x10"),
Bytes::from_static(b"value 4@0x10"),
Bytes::from_static(b"value 5@0x10"),
Bytes::from_static(b"value 6@0x10"),
Bytes::from_static(b"value 7@0x10"),
Bytes::from_static(b"value 8@0x10"),
Bytes::from_static(b"value 9@0x10"),
];
let expected_result_at_lsn_10 = [
Bytes::from_static(b"value 0@0x10"),
Bytes::from_static(b"value 1@0x10"),
Bytes::from_static(b"value 2@0x10"),
Bytes::from_static(b"value 3@0x10"),
Bytes::from_static(b"value 4@0x10"),
Bytes::from_static(b"value 5@0x10"),
Bytes::from_static(b"value 6@0x10"),
Bytes::from_static(b"value 7@0x10"),
Bytes::from_static(b"value 8@0x10"),
Bytes::from_static(b"value 9@0x10"),
];
let verify_result = || async {
let gc_horizon = {
let gc_info = tline.gc_info.read().unwrap();
gc_info.cutoffs.time
};
for idx in 0..10 {
assert_eq!(
tline
.get(get_key(idx as u32), Lsn(0x50), &ctx)
.await
.unwrap(),
&expected_result[idx]
);
assert_eq!(
tline
.get(get_key(idx as u32), gc_horizon, &ctx)
.await
.unwrap(),
&expected_result_at_gc_horizon[idx]
);
assert_eq!(
tline
.get(get_key(idx as u32), Lsn(0x20), &ctx)
.await
.unwrap(),
&expected_result_at_lsn_20[idx]
);
assert_eq!(
tline
.get(get_key(idx as u32), Lsn(0x10), &ctx)
.await
.unwrap(),
&expected_result_at_lsn_10[idx]
);
}
};
verify_result().await;
let cancel = CancellationToken::new();
tline
.compact_with_gc(
&cancel,
CompactOptions {
compact_key_range: Some((get_key(0)..get_key(2)).into()),
compact_lsn_range: Some((Lsn(0x20)..Lsn(0x28)).into()),
..Default::default()
},
&ctx,
)
.await
.unwrap();
verify_result().await;
let all_layers = inspect_and_sort(&tline, Some(get_key(0)..get_key(10))).await;
check_layer_map_key_eq(
all_layers,
vec![
// The original image layer, not compacted
PersistentLayerKey {
key_range: get_key(0)..get_key(10),
lsn_range: Lsn(0x10)..Lsn(0x11),
is_delta: false,
},
// According the selection logic, we select all layers with start key <= 0x28, so we would merge the layer 0x20-0x28 and
// the layer 0x28-0x30 into one.
PersistentLayerKey {
key_range: get_key(1)..get_key(2),
lsn_range: Lsn(0x20)..Lsn(0x30),
is_delta: true,
},
// Above the upper bound and untouched
PersistentLayerKey {
key_range: get_key(1)..get_key(2),
lsn_range: Lsn(0x30)..Lsn(0x50),
is_delta: true,
},
// This layer is untouched
PersistentLayerKey {
key_range: get_key(8)..get_key(10),
lsn_range: Lsn(0x30)..Lsn(0x50),
is_delta: true,
},
],
);
tline
.compact_with_gc(
&cancel,
CompactOptions {
compact_key_range: Some((get_key(3)..get_key(8)).into()),
compact_lsn_range: Some((Lsn(0x28)..Lsn(0x40)).into()),
..Default::default()
},
&ctx,
)
.await
.unwrap();
verify_result().await;
let all_layers = inspect_and_sort(&tline, Some(get_key(0)..get_key(10))).await;
check_layer_map_key_eq(
all_layers,
vec![
// The original image layer, not compacted
PersistentLayerKey {
key_range: get_key(0)..get_key(10),
lsn_range: Lsn(0x10)..Lsn(0x11),
is_delta: false,
},
// Not in the compaction key range, uncompacted
PersistentLayerKey {
key_range: get_key(1)..get_key(2),
lsn_range: Lsn(0x20)..Lsn(0x30),
is_delta: true,
},
// Not in the compaction key range, uncompacted but need rewrite because the delta layer overlaps with the range
PersistentLayerKey {
key_range: get_key(1)..get_key(2),
lsn_range: Lsn(0x30)..Lsn(0x50),
is_delta: true,
},
// Note that when we specify the LSN upper bound to be 0x40, the compaction algorithm will not try to cut the layer
// horizontally in half. Instead, it will include all LSNs that overlap with 0x40. So the real max_lsn of the compaction
// becomes 0x50.
PersistentLayerKey {
key_range: get_key(8)..get_key(10),
lsn_range: Lsn(0x30)..Lsn(0x50),
is_delta: true,
},
],
);
// compact again
tline
.compact_with_gc(
&cancel,
CompactOptions {
compact_key_range: Some((get_key(0)..get_key(5)).into()),
compact_lsn_range: Some((Lsn(0x20)..Lsn(0x50)).into()),
..Default::default()
},
&ctx,
)
.await
.unwrap();
verify_result().await;
let all_layers = inspect_and_sort(&tline, Some(get_key(0)..get_key(10))).await;
check_layer_map_key_eq(
all_layers,
vec![
// The original image layer, not compacted
PersistentLayerKey {
key_range: get_key(0)..get_key(10),
lsn_range: Lsn(0x10)..Lsn(0x11),
is_delta: false,
},
// The range gets compacted
PersistentLayerKey {
key_range: get_key(1)..get_key(2),
lsn_range: Lsn(0x20)..Lsn(0x50),
is_delta: true,
},
// Not touched during this iteration of compaction
PersistentLayerKey {
key_range: get_key(8)..get_key(10),
lsn_range: Lsn(0x30)..Lsn(0x50),
is_delta: true,
},
],
);
// final full compaction
tline
.compact_with_gc(&cancel, CompactOptions::default(), &ctx)
.await
.unwrap();
verify_result().await;
let all_layers = inspect_and_sort(&tline, Some(get_key(0)..get_key(10))).await;
check_layer_map_key_eq(
all_layers,
vec![
// The compacted image layer (full key range)
PersistentLayerKey {
key_range: Key::MIN..Key::MAX,
lsn_range: Lsn(0x10)..Lsn(0x11),
is_delta: false,
},
// All other data in the delta layer
PersistentLayerKey {
key_range: get_key(1)..get_key(10),
lsn_range: Lsn(0x10)..Lsn(0x50),
is_delta: true,
},
],
);
Ok(())
}
}

View File

@@ -780,46 +780,90 @@ pub(crate) enum CompactFlags {
#[serde_with::serde_as]
#[derive(Debug, Clone, serde::Deserialize)]
pub(crate) struct CompactRequest {
pub compact_range: Option<CompactRange>,
pub compact_below_lsn: Option<Lsn>,
pub compact_key_range: Option<CompactKeyRange>,
pub compact_lsn_range: Option<CompactLsnRange>,
/// Whether the compaction job should be scheduled.
#[serde(default)]
pub scheduled: bool,
/// Whether the compaction job should be split across key ranges.
#[serde(default)]
pub sub_compaction: bool,
/// Max job size for each subcompaction job.
pub sub_compaction_max_job_size_mb: Option<u64>,
}
#[serde_with::serde_as]
#[derive(Debug, Clone, serde::Deserialize)]
pub(crate) struct CompactRange {
pub(crate) struct CompactLsnRange {
pub start: Lsn,
pub end: Lsn,
}
#[serde_with::serde_as]
#[derive(Debug, Clone, serde::Deserialize)]
pub(crate) struct CompactKeyRange {
#[serde_as(as = "serde_with::DisplayFromStr")]
pub start: Key,
#[serde_as(as = "serde_with::DisplayFromStr")]
pub end: Key,
}
impl From<Range<Key>> for CompactRange {
fn from(range: Range<Key>) -> Self {
CompactRange {
impl From<Range<Lsn>> for CompactLsnRange {
fn from(range: Range<Lsn>) -> Self {
Self {
start: range.start,
end: range.end,
}
}
}
impl From<Range<Key>> for CompactKeyRange {
fn from(range: Range<Key>) -> Self {
Self {
start: range.start,
end: range.end,
}
}
}
impl From<CompactLsnRange> for Range<Lsn> {
fn from(range: CompactLsnRange) -> Self {
range.start..range.end
}
}
impl From<CompactKeyRange> for Range<Key> {
fn from(range: CompactKeyRange) -> Self {
range.start..range.end
}
}
impl CompactLsnRange {
#[cfg(test)]
#[cfg(feature = "testing")]
pub fn above(lsn: Lsn) -> Self {
Self {
start: lsn,
end: Lsn::MAX,
}
}
}
#[derive(Debug, Clone, Default)]
pub(crate) struct CompactOptions {
pub flags: EnumSet<CompactFlags>,
/// If set, the compaction will only compact the key range specified by this option.
/// This option is only used by GC compaction.
pub compact_range: Option<CompactRange>,
/// If set, the compaction will only compact the LSN below this value.
/// This option is only used by GC compaction.
pub compact_below_lsn: Option<Lsn>,
/// This option is only used by GC compaction. For the full explanation, see [`compaction::GcCompactJob`].
pub compact_key_range: Option<CompactKeyRange>,
/// If set, the compaction will only compact the LSN within this value.
/// This option is only used by GC compaction. For the full explanation, see [`compaction::GcCompactJob`].
pub compact_lsn_range: Option<CompactLsnRange>,
/// Enable sub-compaction (split compaction job across key ranges).
/// This option is only used by GC compaction.
pub sub_compaction: bool,
/// Set job size for the GC compaction.
/// This option is only used by GC compaction.
pub sub_compaction_max_job_size_mb: Option<u64>,
}
impl std::fmt::Debug for Timeline {
@@ -1641,9 +1685,10 @@ impl Timeline {
cancel,
CompactOptions {
flags,
compact_range: None,
compact_below_lsn: None,
compact_key_range: None,
compact_lsn_range: None,
sub_compaction: false,
sub_compaction_max_job_size_mb: None,
},
ctx,
)
@@ -4019,8 +4064,11 @@ impl Timeline {
// NB: there are two callers, one is the compaction task, of which there is only one per struct Tenant and hence Timeline.
// The other is the initdb optimization in flush_frozen_layer, used by `boostrap_timeline`, which runs before `.activate()`
// and hence before the compaction task starts.
// Note that there are a third "caller" that will take the `partitioning` lock. It is `gc_compaction_split_jobs` for
// gc-compaction where it uses the repartition data to determine the split jobs. In the future, it might use its own
// heuristics, but for now, we should allow concurrent access to it and let the caller retry compaction.
return Err(CompactionError::Other(anyhow!(
"repartition() called concurrently, this should not happen"
"repartition() called concurrently, this is rare and a retry should be fine"
)));
};
let ((dense_partition, sparse_partition), partition_lsn) = &*partitioning_guard;

View File

@@ -10,8 +10,8 @@ use std::sync::Arc;
use super::layer_manager::LayerManager;
use super::{
CompactFlags, CompactOptions, CompactRange, CreateImageLayersError, DurationRecorder,
ImageLayerCreationMode, RecordedDuration, Timeline,
CompactFlags, CompactOptions, CreateImageLayersError, DurationRecorder, ImageLayerCreationMode,
RecordedDuration, Timeline,
};
use anyhow::{anyhow, bail, Context};
@@ -64,6 +64,9 @@ const COMPACTION_DELTA_THRESHOLD: usize = 5;
/// A scheduled compaction task.
pub(crate) struct ScheduledCompactionTask {
/// It's unfortunate that we need to store a compact options struct here because the only outer
/// API we can call here is `compact_with_options` which does a few setup calls before starting the
/// actual compaction job... We should refactor this to store `GcCompactionJob` in the future.
pub options: CompactOptions,
/// The channel to send the compaction result. If this is a subcompaction, the last compaction job holds the sender.
pub result_tx: Option<tokio::sync::oneshot::Sender<()>>,
@@ -71,16 +74,57 @@ pub(crate) struct ScheduledCompactionTask {
pub gc_block: Option<gc_block::Guard>,
}
/// A job description for the gc-compaction job. This structure describes the rectangle range that the job will
/// process. The exact layers that need to be compacted/rewritten will be generated when `compact_with_gc` gets
/// called.
#[derive(Debug, Clone)]
pub(crate) struct GcCompactJob {
pub dry_run: bool,
/// The key range to be compacted. The compaction algorithm will only regenerate key-value pairs within this range
/// [left inclusive, right exclusive), and other pairs will be rewritten into new files if necessary.
pub compact_key_range: Range<Key>,
/// The LSN range to be compacted. The compaction algorithm will use this range to determine the layers to be
/// selected for the compaction, and it does not guarantee the generated layers will have exactly the same LSN range
/// as specified here. The true range being compacted is `min_lsn/max_lsn` in [`GcCompactionJobDescription`].
/// min_lsn will always <= the lower bound specified here, and max_lsn will always >= the upper bound specified here.
pub compact_lsn_range: Range<Lsn>,
}
impl GcCompactJob {
pub fn from_compact_options(options: CompactOptions) -> Self {
GcCompactJob {
dry_run: options.flags.contains(CompactFlags::DryRun),
compact_key_range: options
.compact_key_range
.map(|x| x.into())
.unwrap_or(Key::MIN..Key::MAX),
compact_lsn_range: options
.compact_lsn_range
.map(|x| x.into())
.unwrap_or(Lsn::INVALID..Lsn::MAX),
}
}
}
/// A job description for the gc-compaction job. This structure is generated when `compact_with_gc` is called
/// and contains the exact layers we want to compact.
pub struct GcCompactionJobDescription {
/// All layers to read in the compaction job
selected_layers: Vec<Layer>,
/// GC cutoff of the job
/// GC cutoff of the job. This is the lowest LSN that will be accessed by the read/GC path and we need to
/// keep all deltas <= this LSN or generate an image == this LSN.
gc_cutoff: Lsn,
/// LSNs to retain for the job
/// LSNs to retain for the job. Read path will use this LSN so we need to keep deltas <= this LSN or
/// generate an image == this LSN.
retain_lsns_below_horizon: Vec<Lsn>,
/// Maximum layer LSN processed in this compaction
/// Maximum layer LSN processed in this compaction, that is max(end_lsn of layers). Exclusive. All data
/// \>= this LSN will be kept and will not be rewritten.
max_layer_lsn: Lsn,
/// Only compact layers overlapping with this range
/// Minimum layer LSN processed in this compaction, that is min(start_lsn of layers). Inclusive.
/// All access below (strict lower than `<`) this LSN will be routed through the normal read path instead of
/// k-merge within gc-compaction.
min_layer_lsn: Lsn,
/// Only compact layers overlapping with this range.
compaction_key_range: Range<Key>,
/// When partial compaction is enabled, these layers need to be rewritten to ensure no overlap.
/// This field is here solely for debugging. The field will not be read once the compaction
@@ -299,7 +343,7 @@ impl Timeline {
)));
}
if options.compact_range.is_some() {
if options.compact_key_range.is_some() || options.compact_lsn_range.is_some() {
// maybe useful in the future? could implement this at some point
return Err(CompactionError::Other(anyhow!(
"compaction range is not supported for legacy compaction for now"
@@ -1754,32 +1798,35 @@ impl Timeline {
Ok(())
}
/// Split a gc-compaction job into multiple compaction jobs. Optimally, this function should return a vector of
/// `GcCompactionJobDesc`. But we want to keep it simple on the tenant scheduling side without exposing too much
/// ad-hoc information about gc compaction itself.
/// Split a gc-compaction job into multiple compaction jobs. The split is based on the key range and the estimated size of the compaction job.
/// The function returns a list of compaction jobs that can be executed separately. If the upper bound of the compact LSN
/// range is not specified, we will use the latest gc_cutoff as the upper bound, so that all jobs in the jobset acts
/// like a full compaction of the specified keyspace.
pub(crate) async fn gc_compaction_split_jobs(
self: &Arc<Self>,
options: CompactOptions,
) -> anyhow::Result<Vec<CompactOptions>> {
if !options.sub_compaction {
return Ok(vec![options]);
}
let compact_range = options.compact_range.clone().unwrap_or(CompactRange {
start: Key::MIN,
end: Key::MAX,
});
let compact_below_lsn = if let Some(compact_below_lsn) = options.compact_below_lsn {
compact_below_lsn
job: GcCompactJob,
sub_compaction_max_job_size_mb: Option<u64>,
) -> anyhow::Result<Vec<GcCompactJob>> {
let compact_below_lsn = if job.compact_lsn_range.end != Lsn::MAX {
job.compact_lsn_range.end
} else {
*self.get_latest_gc_cutoff_lsn() // use the real gc cutoff
};
// Split compaction job to about 4GB each
const GC_COMPACT_MAX_SIZE_MB: u64 = 4 * 1024;
let sub_compaction_max_job_size_mb =
sub_compaction_max_job_size_mb.unwrap_or(GC_COMPACT_MAX_SIZE_MB);
let mut compact_jobs = Vec::new();
// For now, we simply use the key partitioning information; we should do a more fine-grained partitioning
// by estimating the amount of files read for a compaction job. We should also partition on LSN.
let Ok(partition) = self.partitioning.try_lock() else {
bail!("failed to acquire partition lock");
let ((dense_ks, sparse_ks), _) = {
let Ok(partition) = self.partitioning.try_lock() else {
bail!("failed to acquire partition lock");
};
partition.clone()
};
let ((dense_ks, sparse_ks), _) = &*partition;
// Truncate the key range to be within user specified compaction range.
fn truncate_to(
source_start: &Key,
@@ -1808,8 +1855,8 @@ impl Timeline {
let Some((start, end)) = truncate_to(
&range.start,
&range.end,
&compact_range.start,
&compact_range.end,
&job.compact_key_range.start,
&job.compact_key_range.end,
) else {
continue;
};
@@ -1819,8 +1866,6 @@ impl Timeline {
let guard = self.layers.read().await;
let layer_map = guard.layer_map()?;
let mut current_start = None;
// Split compaction job to about 2GB each
const GC_COMPACT_MAX_SIZE_MB: u64 = 4 * 1024; // 4GB, TODO: should be configuration in the future
let ranges_num = split_key_ranges.len();
for (idx, (start, end)) in split_key_ranges.into_iter().enumerate() {
if current_start.is_none() {
@@ -1833,8 +1878,7 @@ impl Timeline {
}
let res = layer_map.range_search(start..end, compact_below_lsn);
let total_size = res.found.keys().map(|x| x.layer.file_size()).sum::<u64>();
if total_size > GC_COMPACT_MAX_SIZE_MB * 1024 * 1024 || ranges_num == idx + 1 {
let mut compact_options = options.clone();
if total_size > sub_compaction_max_job_size_mb * 1024 * 1024 || ranges_num == idx + 1 {
// Try to extend the compaction range so that we include at least one full layer file.
let extended_end = res
.found
@@ -1852,10 +1896,11 @@ impl Timeline {
"splitting compaction job: {}..{}, estimated_size={}",
start, end, total_size
);
compact_options.compact_range = Some(CompactRange { start, end });
compact_options.compact_below_lsn = Some(compact_below_lsn);
compact_options.sub_compaction = false;
compact_jobs.push(compact_options);
compact_jobs.push(GcCompactJob {
dry_run: job.dry_run,
compact_key_range: start..end,
compact_lsn_range: job.compact_lsn_range.start..compact_below_lsn,
});
current_start = Some(end);
}
}
@@ -1877,7 +1922,7 @@ impl Timeline {
/// Key::MIN..Key..MAX to the function indicates a full compaction, though technically, `Key::MAX` is not
/// part of the range.
///
/// If `options.compact_below_lsn` is provided, the compaction will only compact layers below or intersect with
/// If `options.compact_lsn_range.end` is provided, the compaction will only compact layers below or intersect with
/// the LSN. Otherwise, it will use the gc cutoff by default.
pub(crate) async fn compact_with_gc(
self: &Arc<Self>,
@@ -1885,9 +1930,13 @@ impl Timeline {
options: CompactOptions,
ctx: &RequestContext,
) -> anyhow::Result<()> {
if options.sub_compaction {
let sub_compaction = options.sub_compaction;
let job = GcCompactJob::from_compact_options(options.clone());
if sub_compaction {
info!("running enhanced gc bottom-most compaction with sub-compaction, splitting compaction jobs");
let jobs = self.gc_compaction_split_jobs(options).await?;
let jobs = self
.gc_compaction_split_jobs(job, options.sub_compaction_max_job_size_mb)
.await?;
let jobs_len = jobs.len();
for (idx, job) in jobs.into_iter().enumerate() {
info!(
@@ -1902,19 +1951,15 @@ impl Timeline {
}
return Ok(());
}
self.compact_with_gc_inner(cancel, options, ctx).await
self.compact_with_gc_inner(cancel, job, ctx).await
}
async fn compact_with_gc_inner(
self: &Arc<Self>,
cancel: &CancellationToken,
options: CompactOptions,
job: GcCompactJob,
ctx: &RequestContext,
) -> anyhow::Result<()> {
assert!(
!options.sub_compaction,
"sub-compaction should be handled by the outer function"
);
// Block other compaction/GC tasks from running for now. GC-compaction could run along
// with legacy compaction tasks in the future. Always ensure the lock order is compaction -> gc.
// Note that we already acquired the compaction lock when the outer `compact` function gets called.
@@ -1934,19 +1979,11 @@ impl Timeline {
)
.await?;
let flags = options.flags;
let compaction_key_range = options
.compact_range
.map(|range| range.start..range.end)
.unwrap_or_else(|| Key::MIN..Key::MAX);
let dry_run = job.dry_run;
let compact_key_range = job.compact_key_range;
let compact_lsn_range = job.compact_lsn_range;
let dry_run = flags.contains(CompactFlags::DryRun);
if compaction_key_range == (Key::MIN..Key::MAX) {
info!("running enhanced gc bottom-most compaction, dry_run={dry_run}, compaction_key_range={}..{}", compaction_key_range.start, compaction_key_range.end);
} else {
info!("running enhanced gc bottom-most compaction, dry_run={dry_run}");
}
info!("running enhanced gc bottom-most compaction, dry_run={dry_run}, compact_key_range={}..{}, compact_lsn_range={}..{}", compact_key_range.start, compact_key_range.end, compact_lsn_range.start, compact_lsn_range.end);
scopeguard::defer! {
info!("done enhanced gc bottom-most compaction");
@@ -1970,11 +2007,15 @@ impl Timeline {
// to get the truth data.
let real_gc_cutoff = *self.get_latest_gc_cutoff_lsn();
// The compaction algorithm will keep all keys above the gc_cutoff while keeping only necessary keys below the gc_cutoff for
// each of the retain_lsn. Therefore, if the user-provided `compact_below_lsn` is larger than the real gc cutoff, we will use
// each of the retain_lsn. Therefore, if the user-provided `compact_lsn_range.end` is larger than the real gc cutoff, we will use
// the real cutoff.
let mut gc_cutoff = options.compact_below_lsn.unwrap_or(real_gc_cutoff);
let mut gc_cutoff = if compact_lsn_range.end == Lsn::MAX {
real_gc_cutoff
} else {
compact_lsn_range.end
};
if gc_cutoff > real_gc_cutoff {
warn!("provided compact_below_lsn={} is larger than the real_gc_cutoff={}, using the real gc cutoff", gc_cutoff, real_gc_cutoff);
warn!("provided compact_lsn_range.end={} is larger than the real_gc_cutoff={}, using the real gc cutoff", gc_cutoff, real_gc_cutoff);
gc_cutoff = real_gc_cutoff;
}
gc_cutoff
@@ -1991,7 +2032,7 @@ impl Timeline {
}
let mut selected_layers: Vec<Layer> = Vec::new();
drop(gc_info);
// Pick all the layers intersect or below the gc_cutoff, get the largest LSN in the selected layers.
// Firstly, pick all the layers intersect or below the gc_cutoff, get the largest LSN in the selected layers.
let Some(max_layer_lsn) = layers
.iter_historic_layers()
.filter(|desc| desc.get_lsn_range().start <= gc_cutoff)
@@ -2001,27 +2042,45 @@ impl Timeline {
info!("no layers to compact with gc: no historic layers below gc_cutoff, gc_cutoff={}", gc_cutoff);
return Ok(());
};
// Next, if the user specifies compact_lsn_range.start, we need to filter some layers out. All the layers (strictly) below
// the min_layer_lsn computed as below will be filtered out and the data will be accessed using the normal read path, as if
// it is a branch.
let Some(min_layer_lsn) = layers
.iter_historic_layers()
.filter(|desc| {
if compact_lsn_range.start == Lsn::INVALID {
true // select all layers below if start == Lsn(0)
} else {
desc.get_lsn_range().end > compact_lsn_range.start // strictly larger than compact_above_lsn
}
})
.map(|desc| desc.get_lsn_range().start)
.min()
else {
info!("no layers to compact with gc: no historic layers above compact_above_lsn, compact_above_lsn={}", compact_lsn_range.end);
return Ok(());
};
// Then, pick all the layers that are below the max_layer_lsn. This is to ensure we can pick all single-key
// layers to compact.
let mut rewrite_layers = Vec::new();
for desc in layers.iter_historic_layers() {
if desc.get_lsn_range().end <= max_layer_lsn
&& overlaps_with(&desc.get_key_range(), &compaction_key_range)
&& desc.get_lsn_range().start >= min_layer_lsn
&& overlaps_with(&desc.get_key_range(), &compact_key_range)
{
// If the layer overlaps with the compaction key range, we need to read it to obtain all keys within the range,
// even if it might contain extra keys
selected_layers.push(guard.get_from_desc(&desc));
// If the layer is not fully contained within the key range, we need to rewrite it if it's a delta layer (it's fine
// to overlap image layers)
if desc.is_delta()
&& !fully_contains(&compaction_key_range, &desc.get_key_range())
if desc.is_delta() && !fully_contains(&compact_key_range, &desc.get_key_range())
{
rewrite_layers.push(desc);
}
}
}
if selected_layers.is_empty() {
info!("no layers to compact with gc: no layers within the key range, gc_cutoff={}, key_range={}..{}", gc_cutoff, compaction_key_range.start, compaction_key_range.end);
info!("no layers to compact with gc: no layers within the key range, gc_cutoff={}, key_range={}..{}", gc_cutoff, compact_key_range.start, compact_key_range.end);
return Ok(());
}
retain_lsns_below_horizon.sort();
@@ -2029,13 +2088,20 @@ impl Timeline {
selected_layers,
gc_cutoff,
retain_lsns_below_horizon,
min_layer_lsn,
max_layer_lsn,
compaction_key_range,
compaction_key_range: compact_key_range,
rewrite_layers,
}
};
let lowest_retain_lsn = if self.ancestor_timeline.is_some() {
Lsn(self.ancestor_lsn.0 + 1)
let (has_data_below, lowest_retain_lsn) = if compact_lsn_range.start != Lsn::INVALID {
// If we only compact above some LSN, we should get the history from the current branch below the specified LSN.
// We use job_desc.min_layer_lsn as if it's the lowest branch point.
(true, job_desc.min_layer_lsn)
} else if self.ancestor_timeline.is_some() {
// In theory, we can also use min_layer_lsn here, but using ancestor LSN makes sure the delta layers cover the
// LSN ranges all the way to the ancestor timeline.
(true, self.ancestor_lsn)
} else {
let res = job_desc
.retain_lsns_below_horizon
@@ -2053,17 +2119,19 @@ impl Timeline {
.unwrap_or(job_desc.gc_cutoff)
);
}
res
(false, res)
};
info!(
"picked {} layers for compaction ({} layers need rewriting) with max_layer_lsn={} gc_cutoff={} lowest_retain_lsn={}, key_range={}..{}",
"picked {} layers for compaction ({} layers need rewriting) with max_layer_lsn={} min_layer_lsn={} gc_cutoff={} lowest_retain_lsn={}, key_range={}..{}, has_data_below={}",
job_desc.selected_layers.len(),
job_desc.rewrite_layers.len(),
job_desc.max_layer_lsn,
job_desc.min_layer_lsn,
job_desc.gc_cutoff,
lowest_retain_lsn,
job_desc.compaction_key_range.start,
job_desc.compaction_key_range.end
job_desc.compaction_key_range.end,
has_data_below,
);
for layer in &job_desc.selected_layers {
@@ -2107,10 +2175,22 @@ impl Timeline {
let mut delta_layers = Vec::new();
let mut image_layers = Vec::new();
let mut downloaded_layers = Vec::new();
let mut total_downloaded_size = 0;
let mut total_layer_size = 0;
for layer in &job_desc.selected_layers {
if layer.needs_download().await?.is_some() {
total_downloaded_size += layer.layer_desc().file_size;
}
total_layer_size += layer.layer_desc().file_size;
let resident_layer = layer.download_and_keep_resident().await?;
downloaded_layers.push(resident_layer);
}
info!(
"finish downloading layers, downloaded={}, total={}, ratio={:.2}",
total_downloaded_size,
total_layer_size,
total_downloaded_size as f64 / total_layer_size as f64
);
for resident_layer in &downloaded_layers {
if resident_layer.layer_desc().is_delta() {
let layer = resident_layer.get_as_delta(ctx).await?;
@@ -2133,7 +2213,7 @@ impl Timeline {
// Only create image layers when there is no ancestor branches. TODO: create covering image layer
// when some condition meet.
let mut image_layer_writer = if self.ancestor_timeline.is_none() {
let mut image_layer_writer = if !has_data_below {
Some(
SplitImageLayerWriter::new(
self.conf,
@@ -2166,7 +2246,11 @@ impl Timeline {
}
let mut delta_layer_rewriters = HashMap::<Arc<PersistentLayerKey>, RewritingLayers>::new();
/// Returns None if there is no ancestor branch. Throw an error when the key is not found.
/// When compacting not at a bottom range (=`[0,X)`) of the root branch, we "have data below" (`has_data_below=true`).
/// The two cases are compaction in ancestor branches and when `compact_lsn_range.start` is set.
/// In those cases, we need to pull up data from below the LSN range we're compaction.
///
/// This function unifies the cases so that later code doesn't have to think about it.
///
/// Currently, we always get the ancestor image for each key in the child branch no matter whether the image
/// is needed for reconstruction. This should be fixed in the future.
@@ -2174,17 +2258,19 @@ impl Timeline {
/// Furthermore, we should do vectored get instead of a single get, or better, use k-merge for ancestor
/// images.
async fn get_ancestor_image(
tline: &Arc<Timeline>,
this_tline: &Arc<Timeline>,
key: Key,
ctx: &RequestContext,
has_data_below: bool,
history_lsn_point: Lsn,
) -> anyhow::Result<Option<(Key, Lsn, Bytes)>> {
if tline.ancestor_timeline.is_none() {
if !has_data_below {
return Ok(None);
};
// This function is implemented as a get of the current timeline at ancestor LSN, therefore reusing
// as much existing code as possible.
let img = tline.get(key, tline.ancestor_lsn, ctx).await?;
Ok(Some((key, tline.ancestor_lsn, img)))
let img = this_tline.get(key, history_lsn_point, ctx).await?;
Ok(Some((key, history_lsn_point, img)))
}
// Actually, we can decide not to write to the image layer at all at this point because
@@ -2268,7 +2354,8 @@ impl Timeline {
job_desc.gc_cutoff,
&job_desc.retain_lsns_below_horizon,
COMPACTION_DELTA_THRESHOLD,
get_ancestor_image(self, *last_key, ctx).await?,
get_ancestor_image(self, *last_key, ctx, has_data_below, lowest_retain_lsn)
.await?,
)
.await?;
retention
@@ -2297,7 +2384,7 @@ impl Timeline {
job_desc.gc_cutoff,
&job_desc.retain_lsns_below_horizon,
COMPACTION_DELTA_THRESHOLD,
get_ancestor_image(self, last_key, ctx).await?,
get_ancestor_image(self, last_key, ctx, has_data_below, lowest_retain_lsn).await?,
)
.await?;
retention

View File

@@ -34,6 +34,8 @@ DATA = \
neon--1.2--1.3.sql \
neon--1.3--1.4.sql \
neon--1.4--1.5.sql \
neon--1.5--1.6.sql \
neon--1.6--1.5.sql \
neon--1.5--1.4.sql \
neon--1.4--1.3.sql \
neon--1.3--1.2.sql \

View File

@@ -428,6 +428,8 @@ MergeTable()
hash_seq_init(&status, old_table->role_table);
while ((entry = hash_seq_search(&status)) != NULL)
{
RoleEntry * old;
bool found_old = false;
RoleEntry *to_write = hash_search(
CurrentDdlTable->role_table,
entry->name,
@@ -435,30 +437,23 @@ MergeTable()
NULL);
to_write->type = entry->type;
if (entry->password)
to_write->password = entry->password;
to_write->password = entry->password;
strlcpy(to_write->old_name, entry->old_name, NAMEDATALEN);
if (entry->old_name[0] != '\0')
{
bool found_old = false;
RoleEntry *old = hash_search(
CurrentDdlTable->role_table,
entry->old_name,
HASH_FIND,
&found_old);
if (entry->old_name[0] == '\0')
continue;
if (found_old)
{
if (old->old_name[0] != '\0')
strlcpy(to_write->old_name, old->old_name, NAMEDATALEN);
else
strlcpy(to_write->old_name, entry->old_name, NAMEDATALEN);
hash_search(CurrentDdlTable->role_table,
entry->old_name,
HASH_REMOVE,
NULL);
}
}
old = hash_search(
CurrentDdlTable->role_table,
entry->old_name,
HASH_FIND,
&found_old);
if (!found_old)
continue;
strlcpy(to_write->old_name, old->old_name, NAMEDATALEN);
hash_search(CurrentDdlTable->role_table,
entry->old_name,
HASH_REMOVE,
NULL);
}
hash_destroy(old_table->role_table);
}

View File

@@ -22,6 +22,7 @@
#include "neon_pgversioncompat.h"
#include "access/parallel.h"
#include "access/xlog.h"
#include "funcapi.h"
#include "miscadmin.h"
#include "pagestore_client.h"
@@ -40,12 +41,16 @@
#include "utils/dynahash.h"
#include "utils/guc.h"
#if PG_VERSION_NUM >= 150000
#include "access/xlogrecovery.h"
#endif
#include "hll.h"
#include "bitmap.h"
#include "neon.h"
#include "neon_perf_counters.h"
#define CriticalAssert(cond) do if (!(cond)) elog(PANIC, "Assertion %s failed at %s:%d: ", #cond, __FILE__, __LINE__); while (0)
#define CriticalAssert(cond) do if (!(cond)) elog(PANIC, "LFC: assertion %s failed at %s:%d: ", #cond, __FILE__, __LINE__); while (0)
/*
* Local file cache is used to temporary store relations pages in local file system.
@@ -100,7 +105,9 @@ typedef struct FileCacheEntry
BufferTag key;
uint32 hash;
uint32 offset;
uint32 access_count;
uint32 access_count : 30;
uint32 prewarm_requested : 1; /* entry should be filled by prewarm */
uint32 prewarm_started : 1; /* chunk is written by lfc_prewarm */
uint32 bitmap[CHUNK_BITMAP_SIZE];
dlist_node list_node; /* LRU/holes list node */
} FileCacheEntry;
@@ -118,17 +125,29 @@ typedef struct FileCacheControl
uint64 writes; /* number of writes issued */
uint64 time_read; /* time spent reading (us) */
uint64 time_write; /* time spent writing (us) */
uint32 prewarm_total_chunks;
uint32 prewarm_curr_chunk;
uint32 prewarmed_pages;
uint32 skipped_pages;
dlist_head lru; /* double linked list for LRU replacement
* algorithm */
dlist_head holes; /* double linked list of punched holes */
HyperLogLogState wss_estimation; /* estimation of working set size */
} FileCacheControl;
typedef struct FileCacheStateEntry
{
BufferTag key;
uint32 bitmap[CHUNK_BITMAP_SIZE];
} FileCacheStateEntry;
static HTAB *lfc_hash;
static int lfc_desc = 0;
static LWLockId lfc_lock;
static int lfc_max_size;
static int lfc_size_limit;
static int lfc_prewarm_limit;
static int lfc_prewarm_batch;
static char *lfc_path;
static FileCacheControl *lfc_ctl;
static shmem_startup_hook_type prev_shmem_startup_hook;
@@ -149,7 +168,7 @@ lfc_disable(char const *op)
{
int fd;
elog(WARNING, "Failed to %s local file cache at %s: %m, disabling local file cache", op, lfc_path);
elog(WARNING, "LFC: failed to %s local file cache at %s: %m, disabling local file cache", op, lfc_path);
/* Invalidate hash */
LWLockAcquire(lfc_lock, LW_EXCLUSIVE);
@@ -184,7 +203,7 @@ lfc_disable(char const *op)
pgstat_report_wait_end();
if (rc < 0)
elog(WARNING, "Failed to truncate local file cache %s: %m", lfc_path);
elog(WARNING, "LFC: failed to truncate local file cache %s: %m", lfc_path);
}
}
@@ -196,7 +215,7 @@ lfc_disable(char const *op)
fd = BasicOpenFile(lfc_path, O_RDWR | O_CREAT | O_TRUNC);
if (fd < 0)
elog(WARNING, "Failed to recreate local file cache %s: %m", lfc_path);
elog(WARNING, "LFC: failed to recreate local file cache %s: %m", lfc_path);
else
close(fd);
@@ -267,14 +286,7 @@ lfc_shmem_startup(void)
n_chunks + 1, n_chunks + 1,
&info,
HASH_ELEM | HASH_BLOBS);
lfc_ctl->generation = 0;
lfc_ctl->size = 0;
lfc_ctl->used = 0;
lfc_ctl->hits = 0;
lfc_ctl->misses = 0;
lfc_ctl->writes = 0;
lfc_ctl->time_read = 0;
lfc_ctl->time_write = 0;
memset(lfc_ctl, 0, sizeof *lfc_ctl);
dlist_init(&lfc_ctl->lru);
dlist_init(&lfc_ctl->holes);
@@ -285,7 +297,7 @@ lfc_shmem_startup(void)
fd = BasicOpenFile(lfc_path, O_RDWR | O_CREAT | O_TRUNC);
if (fd < 0)
{
elog(WARNING, "Failed to create local file cache %s: %m", lfc_path);
elog(WARNING, "LFC: failed to create local file cache %s: %m", lfc_path);
lfc_ctl->limit = 0;
}
else
@@ -327,7 +339,7 @@ lfc_check_limit_hook(int *newval, void **extra, GucSource source)
{
if (*newval > lfc_max_size)
{
elog(ERROR, "neon.file_cache_size_limit can not be larger than neon.max_file_cache_size");
elog(ERROR, "LFC: neon.file_cache_size_limit can not be larger than neon.max_file_cache_size");
return false;
}
return true;
@@ -365,6 +377,10 @@ lfc_change_limit_hook(int newval, void *extra)
neon_log(LOG, "Failed to punch hole in file: %m");
#endif
/* We remove the old entry, and re-enter a hole to the hash table */
for (int i = 0; i < BLOCKS_PER_CHUNK; i++)
{
lfc_ctl->used_pages -= (victim->bitmap[i >> 5] >> (i & 31)) & 1;
}
hash_search_with_hash_value(lfc_hash, &victim->key, victim->hash, HASH_REMOVE, NULL);
memset(&holetag, 0, sizeof(holetag));
@@ -436,6 +452,32 @@ lfc_init(void)
NULL,
NULL);
DefineCustomIntVariable("neon.file_cache_prewarm_limit",
"Maximal number of prewarmed pages",
NULL,
&lfc_prewarm_limit,
0, /* disabled by default */
0,
INT_MAX,
PGC_SIGHUP,
0,
NULL,
NULL,
NULL);
DefineCustomIntVariable("neon.file_cache_prewarm_batch",
"Number of pages retrivied by prewarm from page server",
NULL,
&lfc_prewarm_batch,
64,
1,
INT_MAX,
PGC_SIGHUP,
0,
NULL,
NULL,
NULL);
if (lfc_max_size == 0)
return;
@@ -449,6 +491,264 @@ lfc_init(void)
#endif
}
static FileCacheStateEntry*
lfc_get_state(size_t* n_entries)
{
size_t max_entries = *n_entries;
size_t i = 0;
FileCacheStateEntry* fs;
if (lfc_maybe_disabled() || max_entries == 0) /* fast exit if file cache is disabled */
return NULL;
fs = (FileCacheStateEntry*)palloc(sizeof(FileCacheStateEntry) * max_entries);
LWLockAcquire(lfc_lock, LW_SHARED);
if (LFC_ENABLED())
{
dlist_iter iter;
dlist_reverse_foreach(iter, &lfc_ctl->lru)
{
FileCacheEntry *entry = dlist_container(FileCacheEntry, list_node, iter.cur);
memcpy(&fs[i].key, &entry->key, sizeof entry->key);
memcpy(fs[i].bitmap, entry->bitmap, sizeof entry->bitmap);
if (++i == max_entries)
break;
}
elog(LOG, "LFC: save state of %ld chunks", (long)i);
}
LWLockRelease(lfc_lock);
*n_entries = i;
return fs;
}
/*
* Prewarm LFC cache to the specified state.
*
* Prewarming can interfere with accesses to the pages by other backends. Usually access to LFC is protected by shared buffers: when Postgres
* is reading page, it pins shared buffer and enforces that only one backend is reading it, while other are waiting for read completion.
*
* But it is not true for prewarming: backend can fetch page itself, modify and then write it to LFC. At the
* same time `lfc_prewarm` tries to write deteriorated image of this page in LFC. To increase concurrency, access to LFC files (both read and write)
* is performed without holding locks. So it can happen that two or more processes write different content to the same location in the LFC file.
* Certainly we can not rely on disk content in this case.
*
* To solve this problem we use two flags in LFC entry: `prewarm_requested` and `prewarm_started`. First is set before prewarm is actually started.
* `lfc_prewarm` writes to LFC file only if this flag is set. This flag is cleared if any other backend performs write to this LFC chunk.
* In this case data loaded by `lfc_prewarm` is considered to be deteriorated and should be just ignored.
*
* But as far as write to LFC is performed without holding lock, there is no guarantee that no such write is in progress.
* This is why second flag is used: `prewarm_started`. It is set by `lfc_prewarm` when is starts writing page and cleared when write is completed.
* Any other backend writing to LFC should abandon it's write to LFC file (just not mark page as loaded in bitmap) if this flag is set.
* So neither `lfc_prewarm`, neither backend are saving page in LFC in this case - it is just skipped.
*/
static void
lfc_prewarm(FileCacheStateEntry* fs, size_t n_entries)
{
ssize_t rc;
size_t snd_idx = 0, rcv_idx = 0;
size_t n_sent = 0, n_received = 0;
FileCacheEntry *entry;
uint64 generation;
uint32 entry_offset;
uint32 hash;
size_t i;
bool found;
int shard_no;
if (!lfc_ensure_opened())
return;
if (n_entries == 0 || fs == NULL)
{
elog(LOG, "LFC: prewarm is disabled");
return;
}
LWLockAcquire(lfc_lock, LW_EXCLUSIVE);
/* Do not prewarm more entries than LFC limit */
if (lfc_ctl->limit <= lfc_ctl->size)
{
LWLockRelease(lfc_lock);
return;
}
if (n_entries > lfc_ctl->limit - lfc_ctl->size)
{
n_entries = lfc_ctl->limit - lfc_ctl->size;
}
/* Initialize fields used to track prewarming progress */
lfc_ctl->prewarm_total_chunks = n_entries;
lfc_ctl->prewarm_curr_chunk = 0;
/*
* Load LFC state and add entries in hash table.
* It is needed to track modification of prewarmed pages.
* All such entries have `prewarm_requested` flag set. When entry is updated (some backed reads or writes
* some pages from this chunk), then `prewarm_requested` flag is cleared, prohibiting prewarm of this chunk.
* It prevents overwritting page updated or loaded by backend with older one, loaded by prewarm.
*/
for (i = 0; i < n_entries; i++)
{
hash = get_hash_value(lfc_hash, &fs[i].key);
entry = hash_search_with_hash_value(lfc_hash, &fs[i].key, hash, HASH_ENTER, &found);
/* Do not prewarm chunks which are already present in LFC */
if (!found)
{
entry->offset = lfc_ctl->size++;
entry->hash = hash;
entry->access_count = 0;
entry->prewarm_requested = true;
entry->prewarm_started = false;
memset(entry->bitmap, 0, sizeof entry->bitmap);
/* Most recently visted pages are stored first */
dlist_push_head(&lfc_ctl->lru, &entry->list_node);
lfc_ctl->used += 1;
}
}
LWLockRelease(lfc_lock);
elog(LOG, "LFC: start loading %ld chunks", (long)n_entries);
while (true)
{
size_t chunk_no = snd_idx / BLOCKS_PER_CHUNK;
size_t offs_in_chunk = snd_idx % BLOCKS_PER_CHUNK;
if (chunk_no < n_entries)
{
if (fs[chunk_no].bitmap[offs_in_chunk >> 5] & (1 << (offs_in_chunk & 31)))
{
/*
* In case of prewarming replica we should be careful not to load too new version
* of the page - with LSN larger than current replay LSN.
* At primary we are always loading latest version.
*/
XLogRecPtr req_lsn = RecoveryInProgress() ? GetXLogReplayRecPtr(NULL) : UINT64_MAX;
NeonGetPageRequest request = {
.req.tag = T_NeonGetPageRequest,
/* lsn and not_modified_since are filled in below */
.rinfo = BufTagGetNRelFileInfo(fs[chunk_no].key),
.forknum = fs[chunk_no].key.forkNum,
.blkno = fs[chunk_no].key.blockNum + offs_in_chunk,
.req.lsn = req_lsn,
.req.not_modified_since = 0
};
shard_no = get_shard_number(&fs[chunk_no].key);
while (!page_server->send(shard_no, (NeonRequest *) &request)
|| !page_server->flush(shard_no))
{
/* page server disconnected: all previusly sent prefetch requests are lost */
n_sent = 0;
}
n_sent += 1;
}
snd_idx += 1;
}
if (n_sent >= n_received + lfc_prewarm_batch || chunk_no == n_entries)
{
NeonResponse * resp;
do
{
chunk_no = rcv_idx / BLOCKS_PER_CHUNK;
offs_in_chunk = rcv_idx % BLOCKS_PER_CHUNK;
rcv_idx += 1;
} while (!(fs[chunk_no].bitmap[offs_in_chunk >> 5] & (1 << (offs_in_chunk & 31))));
shard_no = get_shard_number(&fs[chunk_no].key);
resp = page_server->receive(shard_no);
lfc_ctl->prewarm_curr_chunk = chunk_no;
switch (resp->tag)
{
case T_NeonGetPageResponse:
break;
case T_NeonErrorResponse:
{
/* Prefech can request page which is already dropped so PS can respond with error: just ignore it */
NeonErrorResponse *err_resp = (NeonErrorResponse *) resp;
elog(LOG, "LFC: page server failed to load page %u of relation %u/%u/%u.%u: %s",
fs[chunk_no].key.blockNum + (BlockNumber)offs_in_chunk, RelFileInfoFmt(BufTagGetNRelFileInfo(fs[chunk_no].key)), fs[chunk_no].key.forkNum, err_resp->message);
continue;
}
default:
elog(LOG, "LFC: unexpected response type: %d", resp->tag);
return;
}
hash = get_hash_value(lfc_hash, &fs[chunk_no].key);
LWLockAcquire(lfc_lock, LW_EXCLUSIVE);
entry = hash_search_with_hash_value(lfc_hash, &fs[chunk_no].key, hash, HASH_FIND, NULL);
if (entry != NULL && entry->prewarm_requested)
{
/* Unlink entry from LRU list to pin it for the duration of IO operation */
if (entry->access_count++ == 0)
dlist_delete(&entry->list_node);
generation = lfc_ctl->generation;
entry_offset = entry->offset;
Assert(!entry->prewarm_started);
entry->prewarm_started = true;
LWLockRelease(lfc_lock);
rc = pwrite(lfc_desc, ((NeonGetPageResponse*)resp)->page, BLCKSZ, ((off_t) entry_offset * BLOCKS_PER_CHUNK + offs_in_chunk) * BLCKSZ);
if (rc != BLCKSZ)
{
lfc_disable("write");
break;
}
else
{
LWLockAcquire(lfc_lock, LW_EXCLUSIVE);
if (lfc_ctl->generation == generation)
{
CriticalAssert(LFC_ENABLED());
if (--entry->access_count == 0)
dlist_push_tail(&lfc_ctl->lru, &entry->list_node);
if (entry->prewarm_requested)
{
lfc_ctl->used_pages += 1 - ((entry->bitmap[offs_in_chunk >> 5] >> (offs_in_chunk & 31)) & 1);
entry->bitmap[offs_in_chunk >> 5] |= 1 << (offs_in_chunk & 31);
lfc_ctl->prewarmed_pages += 1;
}
else
{
lfc_ctl->skipped_pages += 1;
}
Assert(entry->prewarm_started);
entry->prewarm_started = false;
}
LWLockRelease(lfc_lock);
}
}
else
{
Assert(!entry || !entry->prewarm_started);
lfc_ctl->skipped_pages += 1;
LWLockRelease(lfc_lock);
}
if (++n_received == n_sent && snd_idx >= n_entries * BLOCKS_PER_CHUNK)
{
break;
}
}
}
Assert(n_sent == n_received);
lfc_ctl->prewarm_curr_chunk = n_entries;
elog(LOG, "LFC: complete prewarming: loaded %ld pages", (long)n_received);
}
/*
* Check if page is present in the cache.
* Returns true if page is found in local cache.
@@ -616,6 +916,7 @@ lfc_evict(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber blkno)
/* remove the page from the cache */
entry->bitmap[chunk_offs >> 5] &= ~(1 << (chunk_offs & (32 - 1)));
entry->prewarm_requested = false; /* prohibit prewarm of this LFC entry */
if (entry->access_count == 0)
{
@@ -861,7 +1162,15 @@ lfc_writev(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber blkno,
CriticalAssert(BufTagGetRelNumber(&tag) != InvalidRelFileNumber);
/*
LWLockAcquire(lfc_lock, LW_EXCLUSIVE);
if (!LFC_ENABLED())
{
LWLockRelease(lfc_lock);
return;
}
/*
* For every chunk that has blocks we're interested in, we
* 1. get the chunk header
* 2. Check if the chunk actually has the blocks we're interested in
@@ -887,18 +1196,26 @@ lfc_writev(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber blkno,
tag.blockNum = blkno & ~(BLOCKS_PER_CHUNK - 1);
hash = get_hash_value(lfc_hash, &tag);
LWLockAcquire(lfc_lock, LW_EXCLUSIVE);
if (!LFC_ENABLED())
{
LWLockRelease(lfc_lock);
return;
}
entry = hash_search_with_hash_value(lfc_hash, &tag, hash, HASH_ENTER, &found);
if (found)
{
if (entry->prewarm_started)
{
/*
* Some page of this chunk is currently written by `lfc_prewarm`.
* We should give-up not to interfere with it.
* But clearing `prewarm_requested` flag also will not allow `lfc_prewarm` to fix it result.
*/
entry->prewarm_requested = false;
/* cleanup all affected pages of the chunk: we do not know which one of them is conflicting with prewarm */
for (int i = 0; i < blocks_in_chunk; i++)
{
lfc_ctl->used_pages -= ((entry->bitmap[(chunk_offs + i) >> 5] >> ((chunk_offs + i) & 31)) & 1);
entry->bitmap[(chunk_offs + i) >> 5] &= ~(1 << ((chunk_offs + i) & 31));
}
goto next_chunk;
}
/*
* Unlink entry from LRU list to pin it for the duration of IO
* operation
@@ -928,7 +1245,7 @@ lfc_writev(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber blkno,
{
/* Cache overflow: evict least recently used chunk */
FileCacheEntry *victim = dlist_container(FileCacheEntry, list_node, dlist_pop_head_node(&lfc_ctl->lru));
for (int i = 0; i < BLOCKS_PER_CHUNK; i++)
{
lfc_ctl->used_pages -= (victim->bitmap[i >> 5] >> (i & 31)) & 1;
@@ -944,10 +1261,10 @@ lfc_writev(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber blkno,
FileCacheEntry *hole = dlist_container(FileCacheEntry, list_node, dlist_pop_head_node(&lfc_ctl->holes));
uint32 offset = hole->offset;
bool hole_found;
hash_search_with_hash_value(lfc_hash, &hole->key, hole->hash, HASH_REMOVE, &hole_found);
CriticalAssert(hole_found);
lfc_ctl->used += 1;
entry->offset = offset; /* reuse the hole */
}
@@ -959,9 +1276,11 @@ lfc_writev(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber blkno,
}
entry->access_count = 1;
entry->hash = hash;
entry->prewarm_started = false;
memset(entry->bitmap, 0, sizeof entry->bitmap);
}
entry->prewarm_requested = false; /* prohibit prewarm if LFC entry is updated by some backend */
generation = lfc_ctl->generation;
entry_offset = entry->offset;
LWLockRelease(lfc_lock);
@@ -976,6 +1295,7 @@ lfc_writev(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber blkno,
if (rc != BLCKSZ * blocks_in_chunk)
{
lfc_disable("write");
return;
}
else
{
@@ -1005,12 +1325,13 @@ lfc_writev(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber blkno,
}
}
LWLockRelease(lfc_lock);
}
next_chunk:
blkno += blocks_in_chunk;
buf_offset += blocks_in_chunk;
nblocks -= blocks_in_chunk;
}
LWLockRelease(lfc_lock);
}
typedef struct
@@ -1334,3 +1655,69 @@ approximate_working_set_size(PG_FUNCTION_ARGS)
}
PG_RETURN_NULL();
}
PG_FUNCTION_INFO_V1(get_local_cache_state);
Datum
get_local_cache_state(PG_FUNCTION_ARGS)
{
size_t n_entries = PG_ARGISNULL(0) ? lfc_prewarm_limit : PG_GETARG_INT32(0);
FileCacheStateEntry* fs = lfc_get_state(&n_entries);
if (fs != NULL)
{
size_t size_in_bytes = sizeof(FileCacheStateEntry) * n_entries;
bytea* res = (bytea*)palloc(VARHDRSZ + size_in_bytes);
SET_VARSIZE(res, VARHDRSZ + size_in_bytes);
memcpy(VARDATA(res), fs, size_in_bytes);
pfree(fs);
PG_RETURN_BYTEA_P(res);
}
PG_RETURN_NULL();
}
PG_FUNCTION_INFO_V1(prewarm_local_cache);
Datum
prewarm_local_cache(PG_FUNCTION_ARGS)
{
bytea* state = PG_GETARG_BYTEA_PP(0);
uint32 n_entries = VARSIZE_ANY_EXHDR(state)/sizeof(FileCacheStateEntry);
FileCacheStateEntry* fs = (FileCacheStateEntry*)VARDATA_ANY(state);
lfc_prewarm(fs, n_entries);
PG_RETURN_NULL();
}
PG_FUNCTION_INFO_V1(get_prewarm_info);
Datum
get_prewarm_info(PG_FUNCTION_ARGS)
{
Datum values[4];
bool nulls[4];
TupleDesc tupdesc;
if (lfc_size_limit == 0)
PG_RETURN_NULL();
tupdesc = CreateTemplateTupleDesc(4);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "total_chunks", INT4OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 2, "curr_chunk", INT4OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 3, "prewarmed_pages", INT4OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 4, "skipped_pages", INT4OID, -1, 0);
tupdesc = BlessTupleDesc(tupdesc);
MemSet(nulls, 0, sizeof(nulls));
LWLockAcquire(lfc_lock, LW_SHARED);
values[0] = Int32GetDatum(lfc_ctl->prewarm_total_chunks);
values[1] = Int32GetDatum(lfc_ctl->prewarm_curr_chunk);
values[2] = Int32GetDatum(lfc_ctl->prewarmed_pages);
values[3] = Int32GetDatum(lfc_ctl->skipped_pages);
LWLockRelease(lfc_lock);
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}

View File

@@ -131,8 +131,8 @@ get_snapshots_cutoff_lsn(void)
{
cutoff = snapshot_descriptors[logical_replication_max_snap_files - 1].lsn;
elog(LOG,
"ls_monitor: dropping logical slots with restart_lsn lower %X/%X, found %zu snapshot files, limit is %d",
LSN_FORMAT_ARGS(cutoff), snapshot_index, logical_replication_max_snap_files);
"ls_monitor: number of snapshot files, %zu, is larger than limit of %d",
snapshot_index, logical_replication_max_snap_files);
}
/* Is the size of the logical snapshots directory larger than specified?
@@ -162,8 +162,8 @@ get_snapshots_cutoff_lsn(void)
}
if (cutoff != original)
elog(LOG, "ls_monitor: dropping logical slots with restart_lsn lower than %X/%X, " SNAPDIR " is larger than %d KB",
LSN_FORMAT_ARGS(cutoff), logical_replication_max_logicalsnapdir_size);
elog(LOG, "ls_monitor: " SNAPDIR " is larger than %d KB",
logical_replication_max_logicalsnapdir_size);
}
pfree(snapshot_descriptors);
@@ -214,9 +214,13 @@ InitLogicalReplicationMonitor(void)
}
/*
* Unused logical replication slots pins WAL and prevents deletion of snapshots.
* Unused logical replication slots pins WAL and prevent deletion of snapshots.
* WAL bloat is guarded by max_slot_wal_keep_size; this bgw removes slots which
* need too many .snap files.
* need too many .snap files. These files are stored as AUX files, which are a
* pageserver mechanism for storing non-relation data. AUX files are shipped in
* in the basebackup which is requested by compute_ctl before Postgres starts.
* The larger the time to retrieve the basebackup, the more likely it is the
* compute will be killed by the control plane due to a timeout.
*/
void
LogicalSlotsMonitorMain(Datum main_arg)
@@ -239,10 +243,7 @@ LogicalSlotsMonitorMain(Datum main_arg)
ProcessConfigFile(PGC_SIGHUP);
}
/*
* If there are too many .snap files, just drop all logical slots to
* prevent aux files bloat.
*/
/* Get the cutoff LSN */
cutoff_lsn = get_snapshots_cutoff_lsn();
if (cutoff_lsn > 0)
{
@@ -252,31 +253,37 @@ LogicalSlotsMonitorMain(Datum main_arg)
ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
XLogRecPtr restart_lsn;
/* find the name */
LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
/* Consider only logical repliction slots */
/* Consider only active logical repliction slots */
if (!s->in_use || !SlotIsLogical(s))
{
LWLockRelease(ReplicationSlotControlLock);
continue;
}
/* do we need to drop it? */
/*
* Retrieve the restart LSN to determine if we need to drop the
* slot
*/
SpinLockAcquire(&s->mutex);
restart_lsn = s->data.restart_lsn;
SpinLockRelease(&s->mutex);
strlcpy(slot_name, s->data.name.data, sizeof(slot_name));
LWLockRelease(ReplicationSlotControlLock);
if (restart_lsn >= cutoff_lsn)
{
LWLockRelease(ReplicationSlotControlLock);
elog(LOG, "ls_monitor: not dropping replication slot %s because restart LSN %X/%X is greater than cutoff LSN %X/%X",
slot_name, LSN_FORMAT_ARGS(restart_lsn), LSN_FORMAT_ARGS(cutoff_lsn));
continue;
}
strlcpy(slot_name, s->data.name.data, NAMEDATALEN);
elog(LOG, "ls_monitor: dropping slot %s with restart_lsn %X/%X below horizon %X/%X",
elog(LOG, "ls_monitor: dropping replication slot %s because restart LSN %X/%X lower than cutoff LSN %X/%X",
slot_name, LSN_FORMAT_ARGS(restart_lsn), LSN_FORMAT_ARGS(cutoff_lsn));
LWLockRelease(ReplicationSlotControlLock);
/* now try to drop it, killing owner before if any */
/* now try to drop it, killing owner before, if any */
for (;;)
{
pid_t active_pid;
@@ -288,9 +295,9 @@ LogicalSlotsMonitorMain(Datum main_arg)
if (active_pid == 0)
{
/*
* Slot is releasted, try to drop it. Though of course
* Slot is released, try to drop it. Though of course,
* it could have been reacquired, so drop can ERROR
* out. Similarly it could have been dropped in the
* out. Similarly, it could have been dropped in the
* meanwhile.
*
* In principle we could remove pg_try/pg_catch, that
@@ -300,14 +307,14 @@ LogicalSlotsMonitorMain(Datum main_arg)
PG_TRY();
{
ReplicationSlotDrop(slot_name, true);
elog(LOG, "ls_monitor: slot %s dropped", slot_name);
elog(LOG, "ls_monitor: replication slot %s dropped", slot_name);
}
PG_CATCH();
{
/* log ERROR and reset elog stack */
EmitErrorReport();
FlushErrorState();
elog(LOG, "ls_monitor: failed to drop slot %s", slot_name);
elog(LOG, "ls_monitor: failed to drop replication slot %s", slot_name);
}
PG_END_TRY();
break;
@@ -315,7 +322,7 @@ LogicalSlotsMonitorMain(Datum main_arg)
else
{
/* kill the owner and wait for release */
elog(LOG, "ls_monitor: killing slot %s owner %d", slot_name, active_pid);
elog(LOG, "ls_monitor: killing replication slot %s owner %d", slot_name, active_pid);
(void) kill(active_pid, SIGTERM);
/* We shouldn't get stuck, but to be safe add timeout. */
ConditionVariableTimedSleep(&s->active_cv, 1000, WAIT_EVENT_REPLICATION_SLOT_DROP);

View File

@@ -0,0 +1,22 @@
\echo Use "ALTER EXTENSION neon UPDATE TO '1.6'" to load this file. \quit
CREATE FUNCTION get_prewarm_info(out total_chunks integer, out curr_chunk integer, out prewarmed_pages integer, out skipped_pages integer)
RETURNS record
AS 'MODULE_PATHNAME', 'get_prewarm_info'
LANGUAGE C STRICT
PARALLEL SAFE;
CREATE FUNCTION get_local_cache_state(max_chunks integer default null)
RETURNS bytea
AS 'MODULE_PATHNAME', 'get_local_cache_state'
LANGUAGE C
PARALLEL UNSAFE;
CREATE FUNCTION prewarm_local_cache(state bytea)
RETURNS void
AS 'MODULE_PATHNAME', 'prewarm_local_cache'
LANGUAGE C STRICT
PARALLEL UNSAFE;

View File

@@ -0,0 +1,7 @@
DROP FUNCTION IF EXISTS get_prewarm_info(out total_chunks integer, out curr_chunk integer, out prewarmed_pages integer, out skipped_pages integer);
DROP FUNCTION IF EXISTS get_local_cache_state(max_chunks integer);
DROP FUNCTION IF EXISTS prewarm_local_cache(state bytea);

View File

@@ -74,10 +74,6 @@ impl std::fmt::Display for Backend<'_, ()> {
.debug_tuple("ControlPlane::ProxyV1")
.field(&endpoint.url())
.finish(),
ControlPlaneClient::Neon(endpoint) => fmt
.debug_tuple("ControlPlane::Neon")
.field(&endpoint.url())
.finish(),
#[cfg(any(test, feature = "testing"))]
ControlPlaneClient::PostgresMock(endpoint) => fmt
.debug_tuple("ControlPlane::PostgresMock")

View File

@@ -43,9 +43,6 @@ static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
#[derive(Clone, Debug, ValueEnum)]
enum AuthBackendType {
#[value(name("console"), alias("cplane"))]
ControlPlane,
#[value(name("cplane-v1"), alias("control-plane"))]
ControlPlaneV1,
@@ -488,40 +485,7 @@ async fn main() -> anyhow::Result<()> {
}
if let Either::Left(auth::Backend::ControlPlane(api, _)) = &auth_backend {
if let proxy::control_plane::client::ControlPlaneClient::Neon(api) = &**api {
match (redis_notifications_client, regional_redis_client.clone()) {
(None, None) => {}
(client1, client2) => {
let cache = api.caches.project_info.clone();
if let Some(client) = client1 {
maintenance_tasks.spawn(notifications::task_main(
client,
cache.clone(),
cancel_map.clone(),
args.region.clone(),
));
}
if let Some(client) = client2 {
maintenance_tasks.spawn(notifications::task_main(
client,
cache.clone(),
cancel_map.clone(),
args.region.clone(),
));
}
maintenance_tasks.spawn(async move { cache.clone().gc_worker().await });
}
}
if let Some(regional_redis_client) = regional_redis_client {
let cache = api.caches.endpoints_cache.clone();
let con = regional_redis_client;
let span = tracing::info_span!("endpoints_cache");
maintenance_tasks.spawn(
async move { cache.do_read(con, cancellation_token.clone()).await }
.instrument(span),
);
}
} else if let proxy::control_plane::client::ControlPlaneClient::ProxyV1(api) = &**api {
if let proxy::control_plane::client::ControlPlaneClient::ProxyV1(api) = &**api {
match (redis_notifications_client, regional_redis_client.clone()) {
(None, None) => {}
(client1, client2) => {
@@ -757,65 +721,6 @@ fn build_auth_backend(
Ok(Either::Left(config))
}
AuthBackendType::ControlPlane => {
let wake_compute_cache_config: CacheOptions = args.wake_compute_cache.parse()?;
let project_info_cache_config: ProjectInfoCacheOptions =
args.project_info_cache.parse()?;
let endpoint_cache_config: config::EndpointCacheConfig =
args.endpoint_cache_config.parse()?;
info!("Using NodeInfoCache (wake_compute) with options={wake_compute_cache_config:?}");
info!(
"Using AllowedIpsCache (wake_compute) with options={project_info_cache_config:?}"
);
info!("Using EndpointCacheConfig with options={endpoint_cache_config:?}");
let caches = Box::leak(Box::new(control_plane::caches::ApiCaches::new(
wake_compute_cache_config,
project_info_cache_config,
endpoint_cache_config,
)));
let config::ConcurrencyLockOptions {
shards,
limiter,
epoch,
timeout,
} = args.wake_compute_lock.parse()?;
info!(?limiter, shards, ?epoch, "Using NodeLocks (wake_compute)");
let locks = Box::leak(Box::new(control_plane::locks::ApiLocks::new(
"wake_compute_lock",
limiter,
shards,
timeout,
epoch,
&Metrics::get().wake_compute_lock,
)?));
tokio::spawn(locks.garbage_collect_worker());
let url: proxy::url::ApiUrl = args.auth_endpoint.parse()?;
let endpoint = http::Endpoint::new(url, http::new_client());
let mut wake_compute_rps_limit = args.wake_compute_limit.clone();
RateBucketInfo::validate(&mut wake_compute_rps_limit)?;
let wake_compute_endpoint_rate_limiter =
Arc::new(WakeComputeRateLimiter::new(wake_compute_rps_limit));
let api = control_plane::client::neon::NeonControlPlaneClient::new(
endpoint,
args.control_plane_token.clone(),
caches,
locks,
wake_compute_endpoint_rate_limiter,
);
let api = control_plane::client::ControlPlaneClient::Neon(api);
let auth_backend = auth::Backend::ControlPlane(MaybeOwned::Owned(api), ());
let config = Box::leak(Box::new(auth_backend));
Ok(Either::Left(config))
}
#[cfg(feature = "testing")]
AuthBackendType::Postgres => {
let url = args.auth_endpoint.parse()?;

View File

@@ -1,7 +1,6 @@
pub mod cplane_proxy_v1;
#[cfg(any(test, feature = "testing"))]
pub mod mock;
pub mod neon;
use std::hash::Hash;
use std::sync::Arc;
@@ -28,10 +27,8 @@ use crate::types::EndpointId;
#[non_exhaustive]
#[derive(Clone)]
pub enum ControlPlaneClient {
/// New Proxy V1 control plane API
/// Proxy V1 control plane API
ProxyV1(cplane_proxy_v1::NeonControlPlaneClient),
/// Current Management API (V2).
Neon(neon::NeonControlPlaneClient),
/// Local mock control plane.
#[cfg(any(test, feature = "testing"))]
PostgresMock(mock::MockControlPlane),
@@ -49,7 +46,6 @@ impl ControlPlaneApi for ControlPlaneClient {
) -> Result<CachedRoleSecret, errors::GetAuthInfoError> {
match self {
Self::ProxyV1(api) => api.get_role_secret(ctx, user_info).await,
Self::Neon(api) => api.get_role_secret(ctx, user_info).await,
#[cfg(any(test, feature = "testing"))]
Self::PostgresMock(api) => api.get_role_secret(ctx, user_info).await,
#[cfg(test)]
@@ -66,7 +62,6 @@ impl ControlPlaneApi for ControlPlaneClient {
) -> Result<(CachedAllowedIps, Option<CachedRoleSecret>), errors::GetAuthInfoError> {
match self {
Self::ProxyV1(api) => api.get_allowed_ips_and_secret(ctx, user_info).await,
Self::Neon(api) => api.get_allowed_ips_and_secret(ctx, user_info).await,
#[cfg(any(test, feature = "testing"))]
Self::PostgresMock(api) => api.get_allowed_ips_and_secret(ctx, user_info).await,
#[cfg(test)]
@@ -81,7 +76,6 @@ impl ControlPlaneApi for ControlPlaneClient {
) -> Result<Vec<AuthRule>, errors::GetEndpointJwksError> {
match self {
Self::ProxyV1(api) => api.get_endpoint_jwks(ctx, endpoint).await,
Self::Neon(api) => api.get_endpoint_jwks(ctx, endpoint).await,
#[cfg(any(test, feature = "testing"))]
Self::PostgresMock(api) => api.get_endpoint_jwks(ctx, endpoint).await,
#[cfg(test)]
@@ -96,7 +90,6 @@ impl ControlPlaneApi for ControlPlaneClient {
) -> Result<CachedNodeInfo, errors::WakeComputeError> {
match self {
Self::ProxyV1(api) => api.wake_compute(ctx, user_info).await,
Self::Neon(api) => api.wake_compute(ctx, user_info).await,
#[cfg(any(test, feature = "testing"))]
Self::PostgresMock(api) => api.wake_compute(ctx, user_info).await,
#[cfg(test)]

View File

@@ -1,511 +0,0 @@
//! Stale console backend, remove after migrating to Proxy V1 API (#15245).
use std::sync::Arc;
use std::time::Duration;
use ::http::header::AUTHORIZATION;
use ::http::HeaderName;
use futures::TryFutureExt;
use postgres_client::config::SslMode;
use tokio::time::Instant;
use tracing::{debug, info, info_span, warn, Instrument};
use super::super::messages::{ControlPlaneErrorMessage, GetRoleSecret, WakeCompute};
use crate::auth::backend::jwt::AuthRule;
use crate::auth::backend::ComputeUserInfo;
use crate::cache::Cached;
use crate::context::RequestContext;
use crate::control_plane::caches::ApiCaches;
use crate::control_plane::errors::{
ControlPlaneError, GetAuthInfoError, GetEndpointJwksError, WakeComputeError,
};
use crate::control_plane::locks::ApiLocks;
use crate::control_plane::messages::{ColdStartInfo, EndpointJwksResponse, Reason};
use crate::control_plane::{
AuthInfo, AuthSecret, CachedAllowedIps, CachedNodeInfo, CachedRoleSecret, NodeInfo,
};
use crate::metrics::{CacheOutcome, Metrics};
use crate::rate_limiter::WakeComputeRateLimiter;
use crate::types::{EndpointCacheKey, EndpointId};
use crate::{compute, http, scram};
const X_REQUEST_ID: HeaderName = HeaderName::from_static("x-request-id");
#[derive(Clone)]
pub struct NeonControlPlaneClient {
endpoint: http::Endpoint,
pub caches: &'static ApiCaches,
pub(crate) locks: &'static ApiLocks<EndpointCacheKey>,
pub(crate) wake_compute_endpoint_rate_limiter: Arc<WakeComputeRateLimiter>,
// put in a shared ref so we don't copy secrets all over in memory
jwt: Arc<str>,
}
impl NeonControlPlaneClient {
/// Construct an API object containing the auth parameters.
pub fn new(
endpoint: http::Endpoint,
jwt: Arc<str>,
caches: &'static ApiCaches,
locks: &'static ApiLocks<EndpointCacheKey>,
wake_compute_endpoint_rate_limiter: Arc<WakeComputeRateLimiter>,
) -> Self {
Self {
endpoint,
caches,
locks,
wake_compute_endpoint_rate_limiter,
jwt,
}
}
pub(crate) fn url(&self) -> &str {
self.endpoint.url().as_str()
}
async fn do_get_auth_info(
&self,
ctx: &RequestContext,
user_info: &ComputeUserInfo,
) -> Result<AuthInfo, GetAuthInfoError> {
if !self
.caches
.endpoints_cache
.is_valid(ctx, &user_info.endpoint.normalize())
{
// TODO: refactor this because it's weird
// this is a failure to authenticate but we return Ok.
info!("endpoint is not valid, skipping the request");
return Ok(AuthInfo::default());
}
let request_id = ctx.session_id().to_string();
let application_name = ctx.console_application_name();
async {
let request = self
.endpoint
.get_path("proxy_get_role_secret")
.header(X_REQUEST_ID, &request_id)
.header(AUTHORIZATION, format!("Bearer {}", &self.jwt))
.query(&[("session_id", ctx.session_id())])
.query(&[
("application_name", application_name.as_str()),
("project", user_info.endpoint.as_str()),
("role", user_info.user.as_str()),
])
.build()?;
debug!(url = request.url().as_str(), "sending http request");
let start = Instant::now();
let pause = ctx.latency_timer_pause(crate::metrics::Waiting::Cplane);
let response = self.endpoint.execute(request).await?;
drop(pause);
info!(duration = ?start.elapsed(), "received http response");
let body = match parse_body::<GetRoleSecret>(response).await {
Ok(body) => body,
// Error 404 is special: it's ok not to have a secret.
// TODO(anna): retry
Err(e) => {
return if e.get_reason().is_not_found() {
// TODO: refactor this because it's weird
// this is a failure to authenticate but we return Ok.
Ok(AuthInfo::default())
} else {
Err(e.into())
};
}
};
let secret = if body.role_secret.is_empty() {
None
} else {
let secret = scram::ServerSecret::parse(&body.role_secret)
.map(AuthSecret::Scram)
.ok_or(GetAuthInfoError::BadSecret)?;
Some(secret)
};
let allowed_ips = body.allowed_ips.unwrap_or_default();
Metrics::get()
.proxy
.allowed_ips_number
.observe(allowed_ips.len() as f64);
Ok(AuthInfo {
secret,
allowed_ips,
project_id: body.project_id,
})
}
.inspect_err(|e| tracing::debug!(error = ?e))
.instrument(info_span!("do_get_auth_info"))
.await
}
async fn do_get_endpoint_jwks(
&self,
ctx: &RequestContext,
endpoint: EndpointId,
) -> Result<Vec<AuthRule>, GetEndpointJwksError> {
if !self
.caches
.endpoints_cache
.is_valid(ctx, &endpoint.normalize())
{
return Err(GetEndpointJwksError::EndpointNotFound);
}
let request_id = ctx.session_id().to_string();
async {
let request = self
.endpoint
.get_with_url(|url| {
url.path_segments_mut()
.push("endpoints")
.push(endpoint.as_str())
.push("jwks");
})
.header(X_REQUEST_ID, &request_id)
.header(AUTHORIZATION, format!("Bearer {}", &self.jwt))
.query(&[("session_id", ctx.session_id())])
.build()
.map_err(GetEndpointJwksError::RequestBuild)?;
debug!(url = request.url().as_str(), "sending http request");
let start = Instant::now();
let pause = ctx.latency_timer_pause(crate::metrics::Waiting::Cplane);
let response = self
.endpoint
.execute(request)
.await
.map_err(GetEndpointJwksError::RequestExecute)?;
drop(pause);
info!(duration = ?start.elapsed(), "received http response");
let body = parse_body::<EndpointJwksResponse>(response).await?;
let rules = body
.jwks
.into_iter()
.map(|jwks| AuthRule {
id: jwks.id,
jwks_url: jwks.jwks_url,
audience: jwks.jwt_audience,
role_names: jwks.role_names,
})
.collect();
Ok(rules)
}
.inspect_err(|e| tracing::debug!(error = ?e))
.instrument(info_span!("do_get_endpoint_jwks"))
.await
}
async fn do_wake_compute(
&self,
ctx: &RequestContext,
user_info: &ComputeUserInfo,
) -> Result<NodeInfo, WakeComputeError> {
let request_id = ctx.session_id().to_string();
let application_name = ctx.console_application_name();
async {
let mut request_builder = self
.endpoint
.get_path("proxy_wake_compute")
.header("X-Request-ID", &request_id)
.header("Authorization", format!("Bearer {}", &self.jwt))
.query(&[("session_id", ctx.session_id())])
.query(&[
("application_name", application_name.as_str()),
("project", user_info.endpoint.as_str()),
]);
let options = user_info.options.to_deep_object();
if !options.is_empty() {
request_builder = request_builder.query(&options);
}
let request = request_builder.build()?;
debug!(url = request.url().as_str(), "sending http request");
let start = Instant::now();
let pause = ctx.latency_timer_pause(crate::metrics::Waiting::Cplane);
let response = self.endpoint.execute(request).await?;
drop(pause);
info!(duration = ?start.elapsed(), "received http response");
let body = parse_body::<WakeCompute>(response).await?;
// Unfortunately, ownership won't let us use `Option::ok_or` here.
let (host, port) = match parse_host_port(&body.address) {
None => return Err(WakeComputeError::BadComputeAddress(body.address)),
Some(x) => x,
};
// Don't set anything but host and port! This config will be cached.
// We'll set username and such later using the startup message.
// TODO: add more type safety (in progress).
let mut config = compute::ConnCfg::new(host.to_owned(), port);
config.ssl_mode(SslMode::Disable); // TLS is not configured on compute nodes.
let node = NodeInfo {
config,
aux: body.aux,
allow_self_signed_compute: false,
};
Ok(node)
}
.inspect_err(|e| tracing::debug!(error = ?e))
.instrument(info_span!("do_wake_compute"))
.await
}
}
impl super::ControlPlaneApi for NeonControlPlaneClient {
#[tracing::instrument(skip_all)]
async fn get_role_secret(
&self,
ctx: &RequestContext,
user_info: &ComputeUserInfo,
) -> Result<CachedRoleSecret, GetAuthInfoError> {
let normalized_ep = &user_info.endpoint.normalize();
let user = &user_info.user;
if let Some(role_secret) = self
.caches
.project_info
.get_role_secret(normalized_ep, user)
{
return Ok(role_secret);
}
let auth_info = self.do_get_auth_info(ctx, user_info).await?;
if let Some(project_id) = auth_info.project_id {
let normalized_ep_int = normalized_ep.into();
self.caches.project_info.insert_role_secret(
project_id,
normalized_ep_int,
user.into(),
auth_info.secret.clone(),
);
self.caches.project_info.insert_allowed_ips(
project_id,
normalized_ep_int,
Arc::new(auth_info.allowed_ips),
);
ctx.set_project_id(project_id);
}
// When we just got a secret, we don't need to invalidate it.
Ok(Cached::new_uncached(auth_info.secret))
}
async fn get_allowed_ips_and_secret(
&self,
ctx: &RequestContext,
user_info: &ComputeUserInfo,
) -> Result<(CachedAllowedIps, Option<CachedRoleSecret>), GetAuthInfoError> {
let normalized_ep = &user_info.endpoint.normalize();
if let Some(allowed_ips) = self.caches.project_info.get_allowed_ips(normalized_ep) {
Metrics::get()
.proxy
.allowed_ips_cache_misses
.inc(CacheOutcome::Hit);
return Ok((allowed_ips, None));
}
Metrics::get()
.proxy
.allowed_ips_cache_misses
.inc(CacheOutcome::Miss);
let auth_info = self.do_get_auth_info(ctx, user_info).await?;
let allowed_ips = Arc::new(auth_info.allowed_ips);
let user = &user_info.user;
if let Some(project_id) = auth_info.project_id {
let normalized_ep_int = normalized_ep.into();
self.caches.project_info.insert_role_secret(
project_id,
normalized_ep_int,
user.into(),
auth_info.secret.clone(),
);
self.caches.project_info.insert_allowed_ips(
project_id,
normalized_ep_int,
allowed_ips.clone(),
);
ctx.set_project_id(project_id);
}
Ok((
Cached::new_uncached(allowed_ips),
Some(Cached::new_uncached(auth_info.secret)),
))
}
#[tracing::instrument(skip_all)]
async fn get_endpoint_jwks(
&self,
ctx: &RequestContext,
endpoint: EndpointId,
) -> Result<Vec<AuthRule>, GetEndpointJwksError> {
self.do_get_endpoint_jwks(ctx, endpoint).await
}
#[tracing::instrument(skip_all)]
async fn wake_compute(
&self,
ctx: &RequestContext,
user_info: &ComputeUserInfo,
) -> Result<CachedNodeInfo, WakeComputeError> {
let key = user_info.endpoint_cache_key();
macro_rules! check_cache {
() => {
if let Some(cached) = self.caches.node_info.get(&key) {
let (cached, info) = cached.take_value();
let info = info.map_err(|c| {
info!(key = &*key, "found cached wake_compute error");
WakeComputeError::ControlPlane(ControlPlaneError::Message(Box::new(*c)))
})?;
debug!(key = &*key, "found cached compute node info");
ctx.set_project(info.aux.clone());
return Ok(cached.map(|()| info));
}
};
}
// Every time we do a wakeup http request, the compute node will stay up
// for some time (highly depends on the console's scale-to-zero policy);
// The connection info remains the same during that period of time,
// which means that we might cache it to reduce the load and latency.
check_cache!();
let permit = self.locks.get_permit(&key).await?;
// after getting back a permit - it's possible the cache was filled
// double check
if permit.should_check_cache() {
// TODO: if there is something in the cache, mark the permit as success.
check_cache!();
}
// check rate limit
if !self
.wake_compute_endpoint_rate_limiter
.check(user_info.endpoint.normalize_intern(), 1)
{
return Err(WakeComputeError::TooManyConnections);
}
let node = permit.release_result(self.do_wake_compute(ctx, user_info).await);
match node {
Ok(node) => {
ctx.set_project(node.aux.clone());
debug!(key = &*key, "created a cache entry for woken compute node");
let mut stored_node = node.clone();
// store the cached node as 'warm_cached'
stored_node.aux.cold_start_info = ColdStartInfo::WarmCached;
let (_, cached) = self.caches.node_info.insert_unit(key, Ok(stored_node));
Ok(cached.map(|()| node))
}
Err(err) => match err {
WakeComputeError::ControlPlane(ControlPlaneError::Message(err)) => {
let Some(status) = &err.status else {
return Err(WakeComputeError::ControlPlane(ControlPlaneError::Message(
err,
)));
};
let reason = status
.details
.error_info
.map_or(Reason::Unknown, |x| x.reason);
// if we can retry this error, do not cache it.
if reason.can_retry() {
return Err(WakeComputeError::ControlPlane(ControlPlaneError::Message(
err,
)));
}
// at this point, we should only have quota errors.
debug!(
key = &*key,
"created a cache entry for the wake compute error"
);
self.caches.node_info.insert_ttl(
key,
Err(err.clone()),
Duration::from_secs(30),
);
Err(WakeComputeError::ControlPlane(ControlPlaneError::Message(
err,
)))
}
err => return Err(err),
},
}
}
}
/// Parse http response body, taking status code into account.
async fn parse_body<T: for<'a> serde::Deserialize<'a>>(
response: http::Response,
) -> Result<T, ControlPlaneError> {
let status = response.status();
if status.is_success() {
// We shouldn't log raw body because it may contain secrets.
info!("request succeeded, processing the body");
return Ok(response.json().await?);
}
let s = response.bytes().await?;
// Log plaintext to be able to detect, whether there are some cases not covered by the error struct.
info!("response_error plaintext: {:?}", s);
// Don't throw an error here because it's not as important
// as the fact that the request itself has failed.
let mut body = serde_json::from_slice(&s).unwrap_or_else(|e| {
warn!("failed to parse error body: {e}");
ControlPlaneErrorMessage {
error: "reason unclear (malformed error message)".into(),
http_status_code: status,
status: None,
}
});
body.http_status_code = status;
warn!("console responded with an error ({status}): {body:?}");
Err(ControlPlaneError::Message(Box::new(body)))
}
fn parse_host_port(input: &str) -> Option<(&str, u16)> {
let (host, port) = input.rsplit_once(':')?;
let ipv6_brackets: &[_] = &['[', ']'];
Some((host.trim_matches(ipv6_brackets), port.parse().ok()?))
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_parse_host_port_v4() {
let (host, port) = parse_host_port("127.0.0.1:5432").expect("failed to parse");
assert_eq!(host, "127.0.0.1");
assert_eq!(port, 5432);
}
#[test]
fn test_parse_host_port_v6() {
let (host, port) = parse_host_port("[2001:db8::1]:5432").expect("failed to parse");
assert_eq!(host, "2001:db8::1");
assert_eq!(port, 5432);
}
#[test]
fn test_parse_host_port_url() {
let (host, port) = parse_host_port("compute-foo-bar-1234.default.svc.cluster.local:5432")
.expect("failed to parse");
assert_eq!(host, "compute-foo-bar-1234.default.svc.cluster.local");
assert_eq!(port, 5432);
}
}

View File

@@ -221,15 +221,6 @@ pub(crate) struct UserFacingMessage {
pub(crate) message: Box<str>,
}
/// Response which holds client's auth secret, e.g. [`crate::scram::ServerSecret`].
/// Returned by the `/proxy_get_role_secret` API method.
#[derive(Deserialize)]
pub(crate) struct GetRoleSecret {
pub(crate) role_secret: Box<str>,
pub(crate) allowed_ips: Option<Vec<IpPattern>>,
pub(crate) project_id: Option<ProjectIdInt>,
}
/// Response which holds client's auth secret, e.g. [`crate::scram::ServerSecret`].
/// Returned by the `/get_endpoint_access_control` API method.
#[derive(Deserialize)]
@@ -240,13 +231,6 @@ pub(crate) struct GetEndpointAccessControl {
pub(crate) allowed_vpc_endpoint_ids: Option<Vec<EndpointIdInt>>,
}
// Manually implement debug to omit sensitive info.
impl fmt::Debug for GetRoleSecret {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("GetRoleSecret").finish_non_exhaustive()
}
}
/// Response which holds compute node's `host:port` pair.
/// Returned by the `/proxy_wake_compute` API method.
#[derive(Debug, Deserialize)]
@@ -477,18 +461,18 @@ mod tests {
let json = json!({
"role_secret": "secret",
});
serde_json::from_str::<GetRoleSecret>(&json.to_string())?;
serde_json::from_str::<GetEndpointAccessControl>(&json.to_string())?;
let json = json!({
"role_secret": "secret",
"allowed_ips": ["8.8.8.8"],
});
serde_json::from_str::<GetRoleSecret>(&json.to_string())?;
serde_json::from_str::<GetEndpointAccessControl>(&json.to_string())?;
let json = json!({
"role_secret": "secret",
"allowed_ips": ["8.8.8.8"],
"project_id": "project",
});
serde_json::from_str::<GetRoleSecret>(&json.to_string())?;
serde_json::from_str::<GetEndpointAccessControl>(&json.to_string())?;
Ok(())
}

View File

@@ -1,11 +1,12 @@
//! Code to deal with safekeeper control file upgrades
use crate::{
safekeeper::{AcceptorState, PgUuid, ServerInfo, Term, TermHistory, TermLsn},
safekeeper::{AcceptorState, PgUuid, TermHistory, TermLsn},
state::{EvictionState, PersistedPeers, TimelinePersistentState},
wal_backup_partial,
};
use anyhow::{bail, Result};
use pq_proto::SystemId;
use safekeeper_api::{ServerInfo, Term};
use serde::{Deserialize, Serialize};
use tracing::*;
use utils::{

View File

@@ -14,6 +14,7 @@ use camino::Utf8PathBuf;
use chrono::{DateTime, Utc};
use postgres_ffi::XLogSegNo;
use postgres_ffi::MAX_SEND_SIZE;
use safekeeper_api::models::WalSenderState;
use serde::Deserialize;
use serde::Serialize;
@@ -25,7 +26,6 @@ use utils::id::{TenantId, TimelineId};
use utils::lsn::Lsn;
use crate::safekeeper::TermHistory;
use crate::send_wal::WalSenderState;
use crate::state::TimelineMemState;
use crate::state::TimelinePersistentState;
use crate::timeline::get_timeline_dir;

View File

@@ -4,6 +4,8 @@
use anyhow::Context;
use pageserver_api::models::ShardParameters;
use pageserver_api::shard::{ShardIdentity, ShardStripeSize};
use safekeeper_api::models::ConnectionId;
use safekeeper_api::Term;
use std::future::Future;
use std::str::{self, FromStr};
use std::sync::Arc;
@@ -16,9 +18,7 @@ use crate::auth::check_permission;
use crate::json_ctrl::{handle_json_ctrl, AppendLogicalMessage};
use crate::metrics::{TrafficMetrics, PG_QUERIES_GAUGE};
use crate::safekeeper::Term;
use crate::timeline::TimelineError;
use crate::wal_service::ConnectionId;
use crate::{GlobalTimelines, SafeKeeperConf};
use postgres_backend::PostgresBackend;
use postgres_backend::QueryError;

View File

@@ -8,6 +8,7 @@
//! etc.
use reqwest::{IntoUrl, Method, StatusCode};
use safekeeper_api::models::TimelineStatus;
use std::error::Error as _;
use utils::{
http::error::HttpErrorBody,
@@ -15,8 +16,6 @@ use utils::{
logging::SecretString,
};
use super::routes::TimelineStatus;
#[derive(Debug, Clone)]
pub struct Client {
mgmt_api_endpoint: String,

View File

@@ -1,5 +1,9 @@
use hyper::{Body, Request, Response, StatusCode};
use serde::{Deserialize, Serialize};
use safekeeper_api::models::AcceptorStateStatus;
use safekeeper_api::models::SafekeeperStatus;
use safekeeper_api::models::TermSwitchApiEntry;
use safekeeper_api::models::TimelineStatus;
use safekeeper_api::ServerInfo;
use std::collections::HashMap;
use std::fmt;
use std::io::Write as _;
@@ -31,26 +35,17 @@ use utils::{
request::{ensure_no_body, parse_request_param},
RequestExt, RouterBuilder,
},
id::{NodeId, TenantId, TenantTimelineId, TimelineId},
id::{TenantId, TenantTimelineId, TimelineId},
lsn::Lsn,
};
use crate::debug_dump::TimelineDigestRequest;
use crate::receive_wal::WalReceiverState;
use crate::safekeeper::Term;
use crate::safekeeper::{ServerInfo, TermLsn};
use crate::send_wal::WalSenderState;
use crate::timeline::PeerInfo;
use crate::safekeeper::TermLsn;
use crate::timelines_global_map::TimelineDeleteForceResult;
use crate::GlobalTimelines;
use crate::SafeKeeperConf;
use crate::{copy_timeline, debug_dump, patch_control_file, pull_timeline};
#[derive(Debug, Serialize)]
struct SafekeeperStatus {
id: NodeId,
}
/// Healthcheck handler.
async fn status_handler(request: Request<Body>) -> Result<Response<Body>, ApiError> {
check_permission(&request, None)?;
@@ -73,50 +68,6 @@ fn get_global_timelines(request: &Request<Body>) -> Arc<GlobalTimelines> {
.clone()
}
/// Same as TermLsn, but serializes LSN using display serializer
/// in Postgres format, i.e. 0/FFFFFFFF. Used only for the API response.
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct TermSwitchApiEntry {
pub term: Term,
pub lsn: Lsn,
}
impl From<TermSwitchApiEntry> for TermLsn {
fn from(api_val: TermSwitchApiEntry) -> Self {
TermLsn {
term: api_val.term,
lsn: api_val.lsn,
}
}
}
/// Augment AcceptorState with last_log_term for convenience
#[derive(Debug, Serialize, Deserialize)]
pub struct AcceptorStateStatus {
pub term: Term,
pub epoch: Term, // aka last_log_term
pub term_history: Vec<TermSwitchApiEntry>,
}
/// Info about timeline on safekeeper ready for reporting.
#[derive(Debug, Serialize, Deserialize)]
pub struct TimelineStatus {
pub tenant_id: TenantId,
pub timeline_id: TimelineId,
pub acceptor_state: AcceptorStateStatus,
pub pg_info: ServerInfo,
pub flush_lsn: Lsn,
pub timeline_start_lsn: Lsn,
pub local_start_lsn: Lsn,
pub commit_lsn: Lsn,
pub backup_lsn: Lsn,
pub peer_horizon_lsn: Lsn,
pub remote_consistent_lsn: Lsn,
pub peers: Vec<PeerInfo>,
pub walsenders: Vec<WalSenderState>,
pub walreceivers: Vec<WalReceiverState>,
}
fn check_permission(request: &Request<Body>, tenant_id: Option<TenantId>) -> Result<(), ApiError> {
check_permission_with(request, |claims| {
crate::auth::check_permission(claims, tenant_id)
@@ -187,6 +138,15 @@ async fn timeline_list_handler(request: Request<Body>) -> Result<Response<Body>,
json_response(StatusCode::OK, res)
}
impl From<TermSwitchApiEntry> for TermLsn {
fn from(api_val: TermSwitchApiEntry) -> Self {
TermLsn {
term: api_val.term,
lsn: api_val.lsn,
}
}
}
/// Report info about timeline.
async fn timeline_status_handler(request: Request<Body>) -> Result<Response<Body>, ApiError> {
let ttid = TenantTimelineId::new(

View File

@@ -8,16 +8,17 @@
use anyhow::Context;
use postgres_backend::QueryError;
use safekeeper_api::{ServerInfo, Term};
use serde::{Deserialize, Serialize};
use tokio::io::{AsyncRead, AsyncWrite};
use tracing::*;
use crate::handler::SafekeeperPostgresHandler;
use crate::safekeeper::{AcceptorProposerMessage, AppendResponse, ServerInfo};
use crate::safekeeper::{AcceptorProposerMessage, AppendResponse};
use crate::safekeeper::{
AppendRequest, AppendRequestHeader, ProposerAcceptorMessage, ProposerElected,
};
use crate::safekeeper::{Term, TermHistory, TermLsn};
use crate::safekeeper::{TermHistory, TermLsn};
use crate::state::TimelinePersistentState;
use crate::timeline::WalResidentTimeline;
use postgres_backend::PostgresBackend;

View File

@@ -4,6 +4,7 @@ use camino::Utf8PathBuf;
use chrono::{DateTime, Utc};
use futures::{SinkExt, StreamExt, TryStreamExt};
use postgres_ffi::{XLogFileName, XLogSegNo, PG_TLI};
use safekeeper_api::{models::TimelineStatus, Term};
use serde::{Deserialize, Serialize};
use std::{
cmp::min,
@@ -21,11 +22,7 @@ use tracing::{error, info, instrument};
use crate::{
control_file::CONTROL_FILE_NAME,
debug_dump,
http::{
client::{self, Client},
routes::TimelineStatus,
},
safekeeper::Term,
http::client::{self, Client},
state::{EvictionState, TimelinePersistentState},
timeline::{Timeline, WalResidentTimeline},
timelines_global_map::{create_temp_timeline_dir, validate_temp_timeline},

View File

@@ -9,9 +9,7 @@ use crate::metrics::{
};
use crate::safekeeper::AcceptorProposerMessage;
use crate::safekeeper::ProposerAcceptorMessage;
use crate::safekeeper::ServerInfo;
use crate::timeline::WalResidentTimeline;
use crate::wal_service::ConnectionId;
use crate::GlobalTimelines;
use anyhow::{anyhow, Context};
use bytes::BytesMut;
@@ -23,8 +21,8 @@ use postgres_backend::PostgresBackend;
use postgres_backend::PostgresBackendReader;
use postgres_backend::QueryError;
use pq_proto::BeMessage;
use serde::Deserialize;
use serde::Serialize;
use safekeeper_api::models::{ConnectionId, WalReceiverState, WalReceiverStatus};
use safekeeper_api::ServerInfo;
use std::future;
use std::net::SocketAddr;
use std::sync::Arc;
@@ -171,21 +169,6 @@ impl WalReceiversShared {
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct WalReceiverState {
/// None means it is recovery initiated by us (this safekeeper).
pub conn_id: Option<ConnectionId>,
pub status: WalReceiverStatus,
}
/// Walreceiver status. Currently only whether it passed voting stage and
/// started receiving the stream, but it is easy to add more if needed.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum WalReceiverStatus {
Voting,
Streaming,
}
/// Scope guard to access slot in WalReceivers registry and unregister from
/// it in Drop.
pub struct WalReceiverGuard {

View File

@@ -7,6 +7,8 @@ use std::{fmt, pin::pin};
use anyhow::{bail, Context};
use futures::StreamExt;
use postgres_protocol::message::backend::ReplicationMessage;
use safekeeper_api::models::{PeerInfo, TimelineStatus};
use safekeeper_api::Term;
use tokio::sync::mpsc::{channel, Receiver, Sender};
use tokio::time::timeout;
use tokio::{
@@ -24,13 +26,11 @@ use crate::receive_wal::{WalAcceptor, REPLY_QUEUE_SIZE};
use crate::safekeeper::{AppendRequest, AppendRequestHeader};
use crate::timeline::WalResidentTimeline;
use crate::{
http::routes::TimelineStatus,
receive_wal::MSG_QUEUE_SIZE,
safekeeper::{
AcceptorProposerMessage, ProposerAcceptorMessage, ProposerElected, Term, TermHistory,
TermLsn, VoteRequest,
AcceptorProposerMessage, ProposerAcceptorMessage, ProposerElected, TermHistory, TermLsn,
VoteRequest,
},
timeline::PeerInfo,
SafeKeeperConf,
};

View File

@@ -5,6 +5,9 @@ use byteorder::{LittleEndian, ReadBytesExt};
use bytes::{Buf, BufMut, Bytes, BytesMut};
use postgres_ffi::{TimeLineID, MAX_SEND_SIZE};
use safekeeper_api::models::HotStandbyFeedback;
use safekeeper_api::Term;
use safekeeper_api::INVALID_TERM;
use serde::{Deserialize, Serialize};
use std::cmp::max;
use std::cmp::min;
@@ -16,7 +19,6 @@ use tracing::*;
use crate::control_file;
use crate::metrics::MISC_OPERATION_SECONDS;
use crate::send_wal::HotStandbyFeedback;
use crate::state::TimelineState;
use crate::wal_storage;
@@ -31,10 +33,6 @@ use utils::{
const SK_PROTOCOL_VERSION: u32 = 2;
pub const UNKNOWN_SERVER_VERSION: u32 = 0;
/// Consensus logical timestamp.
pub type Term = u64;
pub const INVALID_TERM: Term = 0;
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq, PartialOrd, Ord)]
pub struct TermLsn {
pub term: Term,
@@ -198,16 +196,6 @@ impl AcceptorState {
}
}
/// Information about Postgres. Safekeeper gets it once and then verifies
/// all further connections from computes match.
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct ServerInfo {
/// Postgres server version
pub pg_version: u32,
pub system_id: SystemId,
pub wal_seg_size: u32,
}
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct PersistedPeerInfo {
/// LSN up to which safekeeper offloaded WAL to s3.
@@ -1041,6 +1029,7 @@ where
mod tests {
use futures::future::BoxFuture;
use postgres_ffi::{XLogSegNo, WAL_SEGMENT_SIZE};
use safekeeper_api::ServerInfo;
use super::*;
use crate::state::{EvictionState, PersistedPeers, TimelinePersistentState};

View File

@@ -4,11 +4,10 @@
use crate::handler::SafekeeperPostgresHandler;
use crate::metrics::RECEIVED_PS_FEEDBACKS;
use crate::receive_wal::WalReceivers;
use crate::safekeeper::{Term, TermLsn};
use crate::safekeeper::TermLsn;
use crate::send_interpreted_wal::InterpretedWalSender;
use crate::timeline::WalResidentTimeline;
use crate::wal_reader_stream::WalReaderStreamBuilder;
use crate::wal_service::ConnectionId;
use crate::wal_storage::WalReader;
use anyhow::{bail, Context as AnyhowContext};
use bytes::Bytes;
@@ -19,7 +18,11 @@ use postgres_backend::{CopyStreamHandlerEnd, PostgresBackendReader, QueryError};
use postgres_ffi::get_current_timestamp;
use postgres_ffi::{TimestampTz, MAX_SEND_SIZE};
use pq_proto::{BeMessage, WalSndKeepAlive, XLogDataBody};
use serde::{Deserialize, Serialize};
use safekeeper_api::models::{
ConnectionId, HotStandbyFeedback, ReplicationFeedback, StandbyFeedback, StandbyReply,
WalSenderState, INVALID_FULL_TRANSACTION_ID,
};
use safekeeper_api::Term;
use tokio::io::{AsyncRead, AsyncWrite};
use utils::failpoint_support;
use utils::id::TenantTimelineId;
@@ -28,7 +31,6 @@ use utils::postgres_client::PostgresClientProtocol;
use std::cmp::{max, min};
use std::net::SocketAddr;
use std::str;
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::watch::Receiver;
@@ -42,65 +44,6 @@ const STANDBY_STATUS_UPDATE_TAG_BYTE: u8 = b'r';
// neon extension of replication protocol
const NEON_STATUS_UPDATE_TAG_BYTE: u8 = b'z';
type FullTransactionId = u64;
/// Hot standby feedback received from replica
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct HotStandbyFeedback {
pub ts: TimestampTz,
pub xmin: FullTransactionId,
pub catalog_xmin: FullTransactionId,
}
const INVALID_FULL_TRANSACTION_ID: FullTransactionId = 0;
impl HotStandbyFeedback {
pub fn empty() -> HotStandbyFeedback {
HotStandbyFeedback {
ts: 0,
xmin: 0,
catalog_xmin: 0,
}
}
}
/// Standby status update
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct StandbyReply {
pub write_lsn: Lsn, // The location of the last WAL byte + 1 received and written to disk in the standby.
pub flush_lsn: Lsn, // The location of the last WAL byte + 1 flushed to disk in the standby.
pub apply_lsn: Lsn, // The location of the last WAL byte + 1 applied in the standby.
pub reply_ts: TimestampTz, // The client's system clock at the time of transmission, as microseconds since midnight on 2000-01-01.
pub reply_requested: bool,
}
impl StandbyReply {
fn empty() -> Self {
StandbyReply {
write_lsn: Lsn::INVALID,
flush_lsn: Lsn::INVALID,
apply_lsn: Lsn::INVALID,
reply_ts: 0,
reply_requested: false,
}
}
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct StandbyFeedback {
pub reply: StandbyReply,
pub hs_feedback: HotStandbyFeedback,
}
impl StandbyFeedback {
pub fn empty() -> Self {
StandbyFeedback {
reply: StandbyReply::empty(),
hs_feedback: HotStandbyFeedback::empty(),
}
}
}
/// WalSenders registry. Timeline holds it (wrapped in Arc).
pub struct WalSenders {
mutex: Mutex<WalSendersShared>,
@@ -341,25 +284,6 @@ impl WalSendersShared {
}
}
// Serialized is used only for pretty printing in json.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct WalSenderState {
ttid: TenantTimelineId,
addr: SocketAddr,
conn_id: ConnectionId,
// postgres application_name
appname: Option<String>,
feedback: ReplicationFeedback,
}
// Receiver is either pageserver or regular standby, which have different
// feedbacks.
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
enum ReplicationFeedback {
Pageserver(PageserverFeedback),
Standby(StandbyFeedback),
}
// id of the occupied slot in WalSenders to access it (and save in the
// WalSenderGuard). We could give Arc directly to the slot, but there is not
// much sense in that as values aggregation which is performed on each feedback
@@ -888,6 +812,7 @@ impl<IO: AsyncRead + AsyncWrite + Unpin> ReplyReader<IO> {
#[cfg(test)]
mod tests {
use safekeeper_api::models::FullTransactionId;
use utils::id::{TenantId, TimelineId};
use super::*;

View File

@@ -5,7 +5,7 @@ use std::{cmp::max, ops::Deref};
use anyhow::{bail, Result};
use postgres_ffi::WAL_SEGMENT_SIZE;
use safekeeper_api::models::TimelineTermBumpResponse;
use safekeeper_api::{models::TimelineTermBumpResponse, ServerInfo, Term};
use serde::{Deserialize, Serialize};
use utils::{
id::{NodeId, TenantId, TenantTimelineId, TimelineId},
@@ -14,10 +14,7 @@ use utils::{
use crate::{
control_file,
safekeeper::{
AcceptorState, PersistedPeerInfo, PgUuid, ServerInfo, Term, TermHistory,
UNKNOWN_SERVER_VERSION,
},
safekeeper::{AcceptorState, PersistedPeerInfo, PgUuid, TermHistory, UNKNOWN_SERVER_VERSION},
timeline::TimelineError,
wal_backup_partial::{self},
};

View File

@@ -4,8 +4,8 @@
use anyhow::{anyhow, bail, Result};
use camino::{Utf8Path, Utf8PathBuf};
use remote_storage::RemotePath;
use safekeeper_api::models::TimelineTermBumpResponse;
use serde::{Deserialize, Serialize};
use safekeeper_api::models::{PeerInfo, TimelineTermBumpResponse};
use safekeeper_api::Term;
use tokio::fs::{self};
use tokio_util::sync::CancellationToken;
use utils::id::TenantId;
@@ -31,9 +31,7 @@ use storage_broker::proto::TenantTimelineId as ProtoTenantTimelineId;
use crate::control_file;
use crate::rate_limit::RateLimiter;
use crate::receive_wal::WalReceivers;
use crate::safekeeper::{
AcceptorProposerMessage, ProposerAcceptorMessage, SafeKeeper, Term, TermLsn,
};
use crate::safekeeper::{AcceptorProposerMessage, ProposerAcceptorMessage, SafeKeeper, TermLsn};
use crate::send_wal::WalSenders;
use crate::state::{EvictionState, TimelineMemState, TimelinePersistentState, TimelineState};
use crate::timeline_guard::ResidenceGuard;
@@ -47,40 +45,17 @@ use crate::wal_storage::{Storage as wal_storage_iface, WalReader};
use crate::SafeKeeperConf;
use crate::{debug_dump, timeline_manager, wal_storage};
/// Things safekeeper should know about timeline state on peers.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PeerInfo {
pub sk_id: NodeId,
pub term: Term,
/// Term of the last entry.
pub last_log_term: Term,
/// LSN of the last record.
pub flush_lsn: Lsn,
pub commit_lsn: Lsn,
/// Since which LSN safekeeper has WAL.
pub local_start_lsn: Lsn,
/// When info was received. Serde annotations are not very useful but make
/// the code compile -- we don't rely on this field externally.
#[serde(skip)]
#[serde(default = "Instant::now")]
ts: Instant,
pub pg_connstr: String,
pub http_connstr: String,
}
impl PeerInfo {
fn from_sk_info(sk_info: &SafekeeperTimelineInfo, ts: Instant) -> PeerInfo {
PeerInfo {
sk_id: NodeId(sk_info.safekeeper_id),
term: sk_info.term,
last_log_term: sk_info.last_log_term,
flush_lsn: Lsn(sk_info.flush_lsn),
commit_lsn: Lsn(sk_info.commit_lsn),
local_start_lsn: Lsn(sk_info.local_start_lsn),
pg_connstr: sk_info.safekeeper_connstr.clone(),
http_connstr: sk_info.http_connstr.clone(),
ts,
}
fn peer_info_from_sk_info(sk_info: &SafekeeperTimelineInfo, ts: Instant) -> PeerInfo {
PeerInfo {
sk_id: NodeId(sk_info.safekeeper_id),
term: sk_info.term,
last_log_term: sk_info.last_log_term,
flush_lsn: Lsn(sk_info.flush_lsn),
commit_lsn: Lsn(sk_info.commit_lsn),
local_start_lsn: Lsn(sk_info.local_start_lsn),
pg_connstr: sk_info.safekeeper_connstr.clone(),
http_connstr: sk_info.http_connstr.clone(),
ts,
}
}
@@ -697,7 +672,7 @@ impl Timeline {
{
let mut shared_state = self.write_shared_state().await;
shared_state.sk.record_safekeeper_info(&sk_info).await?;
let peer_info = PeerInfo::from_sk_info(&sk_info, Instant::now());
let peer_info = peer_info_from_sk_info(&sk_info, Instant::now());
shared_state.peers_info.upsert(&peer_info);
}
Ok(())

View File

@@ -14,6 +14,7 @@ use std::{
use futures::channel::oneshot;
use postgres_ffi::XLogSegNo;
use safekeeper_api::{models::PeerInfo, Term};
use serde::{Deserialize, Serialize};
use tokio::{
task::{JoinError, JoinHandle},
@@ -32,10 +33,9 @@ use crate::{
rate_limit::{rand_duration, RateLimiter},
recovery::recovery_main,
remove_wal::calc_horizon_lsn,
safekeeper::Term,
send_wal::WalSenders,
state::TimelineState,
timeline::{ManagerTimeline, PeerInfo, ReadGuardSharedState, StateSK, WalResidentTimeline},
timeline::{ManagerTimeline, ReadGuardSharedState, StateSK, WalResidentTimeline},
timeline_guard::{AccessService, GuardId, ResidenceGuard},
timelines_set::{TimelineSetGuard, TimelinesSet},
wal_backup::{self, WalBackupTaskHandle},

View File

@@ -4,7 +4,6 @@
use crate::defaults::DEFAULT_EVICTION_CONCURRENCY;
use crate::rate_limit::RateLimiter;
use crate::safekeeper::ServerInfo;
use crate::state::TimelinePersistentState;
use crate::timeline::{get_tenant_dir, get_timeline_dir, Timeline, TimelineError};
use crate::timelines_set::TimelinesSet;
@@ -13,6 +12,7 @@ use crate::{control_file, wal_storage, SafeKeeperConf};
use anyhow::{bail, Context, Result};
use camino::Utf8PathBuf;
use camino_tempfile::Utf8TempDir;
use safekeeper_api::ServerInfo;
use serde::Serialize;
use std::collections::HashMap;
use std::str::FromStr;

View File

@@ -3,6 +3,7 @@ use anyhow::{Context, Result};
use camino::{Utf8Path, Utf8PathBuf};
use futures::stream::FuturesOrdered;
use futures::StreamExt;
use safekeeper_api::models::PeerInfo;
use tokio::task::JoinHandle;
use tokio_util::sync::CancellationToken;
use utils::backoff;
@@ -30,7 +31,7 @@ use tracing::*;
use utils::{id::TenantTimelineId, lsn::Lsn};
use crate::metrics::{BACKED_UP_SEGMENTS, BACKUP_ERRORS, WAL_BACKUP_TASKS};
use crate::timeline::{PeerInfo, WalResidentTimeline};
use crate::timeline::WalResidentTimeline;
use crate::timeline_manager::{Manager, StateSnapshot};
use crate::{SafeKeeperConf, WAL_BACKUP_RUNTIME};

View File

@@ -22,6 +22,7 @@
use camino::Utf8PathBuf;
use postgres_ffi::{XLogFileName, XLogSegNo, PG_TLI};
use remote_storage::RemotePath;
use safekeeper_api::Term;
use serde::{Deserialize, Serialize};
use tokio_util::sync::CancellationToken;
@@ -31,7 +32,6 @@ use utils::{id::NodeId, lsn::Lsn};
use crate::{
metrics::{MISC_OPERATION_SECONDS, PARTIAL_BACKUP_UPLOADED_BYTES, PARTIAL_BACKUP_UPLOADS},
rate_limit::{rand_duration, RateLimiter},
safekeeper::Term,
timeline::WalResidentTimeline,
timeline_manager::StateSnapshot,
wal_backup::{self},

View File

@@ -4,12 +4,12 @@ use async_stream::try_stream;
use bytes::Bytes;
use futures::Stream;
use postgres_backend::CopyStreamHandlerEnd;
use safekeeper_api::Term;
use std::time::Duration;
use tokio::time::timeout;
use utils::lsn::Lsn;
use crate::{
safekeeper::Term,
send_wal::{EndWatch, WalSenderGuard},
timeline::WalResidentTimeline,
};

View File

@@ -4,6 +4,7 @@
//!
use anyhow::{Context, Result};
use postgres_backend::QueryError;
use safekeeper_api::models::ConnectionId;
use std::sync::Arc;
use std::time::Duration;
use tokio::net::TcpStream;
@@ -114,8 +115,6 @@ async fn handle_socket(
.await
}
/// Unique WAL service connection ids are logged in spans for observability.
pub type ConnectionId = u32;
pub type ConnectionCount = u32;
pub fn issue_connection_id(count: &mut ConnectionCount) -> ConnectionId {

View File

@@ -15,12 +15,13 @@ use desim::{
};
use http::Uri;
use safekeeper::{
safekeeper::{ProposerAcceptorMessage, SafeKeeper, ServerInfo, UNKNOWN_SERVER_VERSION},
safekeeper::{ProposerAcceptorMessage, SafeKeeper, UNKNOWN_SERVER_VERSION},
state::{TimelinePersistentState, TimelineState},
timeline::TimelineError,
wal_storage::Storage,
SafeKeeperConf,
};
use safekeeper_api::ServerInfo;
use tracing::{debug, info_span, warn};
use utils::{
id::{NodeId, TenantId, TenantTimelineId, TimelineId},

View File

@@ -742,6 +742,50 @@ impl Scheduler {
self.schedule_shard::<AttachedShardTag>(&[], &None, &ScheduleContext::default())
}
/// For choosing which AZ to schedule a new shard into, use this. It will return the
/// AZ with the lowest median utilization.
///
/// We use an AZ-wide measure rather than simply selecting the AZ of the least-loaded
/// node, because while tenants start out single sharded, when they grow and undergo
/// shard-split, they will occupy space on many nodes within an AZ.
///
/// We use median rather than total free space or mean utilization, because
/// we wish to avoid preferring AZs that have low-load nodes resulting from
/// recent replacements.
///
/// The practical result is that we will pick an AZ based on its median node, and
/// then actually _schedule_ the new shard onto the lowest-loaded node in that AZ.
pub(crate) fn get_az_for_new_tenant(&self) -> Option<AvailabilityZone> {
if self.nodes.is_empty() {
return None;
}
let mut scores_by_az = HashMap::new();
for (node_id, node) in &self.nodes {
let az_scores = scores_by_az.entry(&node.az).or_insert_with(Vec::new);
let score = match &node.may_schedule {
MaySchedule::Yes(utilization) => utilization.score(),
MaySchedule::No => PageserverUtilization::full().score(),
};
az_scores.push((node_id, node, score));
}
// Sort by utilization. Also include the node ID to break ties.
for scores in scores_by_az.values_mut() {
scores.sort_by_key(|i| (i.2, i.0));
}
let mut median_by_az = scores_by_az
.iter()
.map(|(az, nodes)| (*az, nodes.get(nodes.len() / 2).unwrap().2))
.collect::<Vec<_>>();
// Sort by utilization. Also include the AZ to break ties.
median_by_az.sort_by_key(|i| (i.1, i.0));
// Return the AZ with the lowest median utilization
Some(median_by_az.first().unwrap().0.clone())
}
/// Unit test access to internal state
#[cfg(test)]
pub(crate) fn get_node_shard_count(&self, node_id: NodeId) -> usize {
@@ -1087,4 +1131,53 @@ mod tests {
intent.clear(&mut scheduler);
}
}
#[test]
fn az_scheduling_for_new_tenant() {
let az_a_tag = AvailabilityZone("az-a".to_string());
let az_b_tag = AvailabilityZone("az-b".to_string());
let nodes = test_utils::make_test_nodes(
6,
&[
az_a_tag.clone(),
az_a_tag.clone(),
az_a_tag.clone(),
az_b_tag.clone(),
az_b_tag.clone(),
az_b_tag.clone(),
],
);
let mut scheduler = Scheduler::new(nodes.values());
/// Force the utilization of a node in Scheduler's state to a particular
/// number of bytes used.
fn set_utilization(scheduler: &mut Scheduler, node_id: NodeId, shard_count: u32) {
let mut node = Node::new(
node_id,
"".to_string(),
0,
"".to_string(),
0,
scheduler.nodes.get(&node_id).unwrap().az.clone(),
);
node.set_availability(NodeAvailability::Active(test_utilization::simple(
shard_count,
0,
)));
scheduler.node_upsert(&node);
}
// Initial empty state. Scores are tied, scheduler prefers lower AZ ID.
assert_eq!(scheduler.get_az_for_new_tenant(), Some(az_a_tag.clone()));
// Put some utilization on one node in AZ A: this should change nothing, as the median hasn't changed
set_utilization(&mut scheduler, NodeId(1), 1000000);
assert_eq!(scheduler.get_az_for_new_tenant(), Some(az_a_tag.clone()));
// Put some utilization on a second node in AZ A: now the median has changed, so the scheduler
// should prefer the other AZ.
set_utilization(&mut scheduler, NodeId(2), 1000000);
assert_eq!(scheduler.get_az_for_new_tenant(), Some(az_b_tag.clone()));
}
}

View File

@@ -1582,6 +1582,7 @@ impl Service {
attach_req.tenant_shard_id,
ShardIdentity::unsharded(),
PlacementPolicy::Attached(0),
None,
),
);
tracing::info!("Inserted shard {} in memory", attach_req.tenant_shard_id);
@@ -2109,6 +2110,16 @@ impl Service {
)
};
let preferred_az_id = {
let locked = self.inner.read().unwrap();
// Idempotency: take the existing value if the tenant already exists
if let Some(shard) = locked.tenants.get(create_ids.first().unwrap()) {
shard.preferred_az().cloned()
} else {
locked.scheduler.get_az_for_new_tenant()
}
};
// Ordering: we persist tenant shards before creating them on the pageserver. This enables a caller
// to clean up after themselves by issuing a tenant deletion if something goes wrong and we restart
// during the creation, rather than risking leaving orphan objects in S3.
@@ -2128,7 +2139,7 @@ impl Service {
splitting: SplitState::default(),
scheduling_policy: serde_json::to_string(&ShardSchedulingPolicy::default())
.unwrap(),
preferred_az_id: None,
preferred_az_id: preferred_az_id.as_ref().map(|az| az.to_string()),
})
.collect();
@@ -2164,6 +2175,7 @@ impl Service {
&create_req.shard_parameters,
create_req.config.clone(),
placement_policy.clone(),
preferred_az_id.as_ref(),
&mut schedule_context,
)
.await;
@@ -2177,44 +2189,6 @@ impl Service {
}
}
let preferred_azs = {
let locked = self.inner.read().unwrap();
response_shards
.iter()
.filter_map(|resp| {
let az_id = locked
.nodes
.get(&resp.node_id)
.map(|n| n.get_availability_zone_id().clone())?;
Some((resp.shard_id, az_id))
})
.collect::<Vec<_>>()
};
// Note that we persist the preferred AZ for the new shards separately.
// In theory, we could "peek" the scheduler to determine where the shard will
// land, but the subsequent "real" call into the scheduler might select a different
// node. Hence, we do this awkward update to keep things consistent.
let updated = self
.persistence
.set_tenant_shard_preferred_azs(preferred_azs)
.await
.map_err(|err| {
ApiError::InternalServerError(anyhow::anyhow!(
"Failed to persist preferred az ids: {err}"
))
})?;
{
let mut locked = self.inner.write().unwrap();
for (tid, az_id) in updated {
if let Some(shard) = locked.tenants.get_mut(&tid) {
shard.set_preferred_az(az_id);
}
}
}
// If we failed to schedule shards, then they are still created in the controller,
// but we return an error to the requester to avoid a silent failure when someone
// tries to e.g. create a tenant whose placement policy requires more nodes than
@@ -2245,6 +2219,7 @@ impl Service {
/// Helper for tenant creation that does the scheduling for an individual shard. Covers both the
/// case of a new tenant and a pre-existing one.
#[allow(clippy::too_many_arguments)]
async fn do_initial_shard_scheduling(
&self,
tenant_shard_id: TenantShardId,
@@ -2252,6 +2227,7 @@ impl Service {
shard_params: &ShardParameters,
config: TenantConfig,
placement_policy: PlacementPolicy,
preferred_az_id: Option<&AvailabilityZone>,
schedule_context: &mut ScheduleContext,
) -> InitialShardScheduleOutcome {
let mut locked = self.inner.write().unwrap();
@@ -2262,10 +2238,6 @@ impl Service {
Entry::Occupied(mut entry) => {
tracing::info!("Tenant shard {tenant_shard_id} already exists while creating");
// TODO: schedule() should take an anti-affinity expression that pushes
// attached and secondary locations (independently) away frorm those
// pageservers also holding a shard for this tenant.
if let Err(err) = entry.get_mut().schedule(scheduler, schedule_context) {
return InitialShardScheduleOutcome::ShardScheduleError(err);
}
@@ -2289,6 +2261,7 @@ impl Service {
tenant_shard_id,
ShardIdentity::from_params(tenant_shard_id.shard_number, shard_params),
placement_policy,
preferred_az_id.cloned(),
));
state.generation = initial_generation;
@@ -4256,7 +4229,8 @@ impl Service {
},
);
let mut child_state = TenantShard::new(child, child_shard, policy.clone());
let mut child_state =
TenantShard::new(child, child_shard, policy.clone(), preferred_az.clone());
child_state.intent = IntentState::single(scheduler, Some(pageserver));
child_state.observed = ObservedState {
locations: child_observed,

View File

@@ -472,6 +472,7 @@ impl TenantShard {
tenant_shard_id: TenantShardId,
shard: ShardIdentity,
policy: PlacementPolicy,
preferred_az_id: Option<AvailabilityZone>,
) -> Self {
metrics::METRICS_REGISTRY
.metrics_group
@@ -495,7 +496,7 @@ impl TenantShard {
last_error: Arc::default(),
pending_compute_notification: false,
scheduling_policy: ShardSchedulingPolicy::default(),
preferred_az_id: None,
preferred_az_id,
}
}
@@ -1571,6 +1572,7 @@ pub(crate) mod tests {
)
.unwrap(),
policy,
None,
)
}
@@ -1597,7 +1599,7 @@ pub(crate) mod tests {
shard_number,
shard_count,
};
let mut ts = TenantShard::new(
TenantShard::new(
tenant_shard_id,
ShardIdentity::new(
shard_number,
@@ -1606,13 +1608,8 @@ pub(crate) mod tests {
)
.unwrap(),
policy.clone(),
);
if let Some(az) = &preferred_az {
ts.set_preferred_az(az.clone());
}
ts
preferred_az.clone(),
)
})
.collect()
}

View File

@@ -435,7 +435,10 @@ class NeonEnvBuilder:
self.pageserver_virtual_file_io_mode = pageserver_virtual_file_io_mode
self.pageserver_wal_receiver_protocol = pageserver_wal_receiver_protocol
if pageserver_wal_receiver_protocol is not None:
self.pageserver_wal_receiver_protocol = pageserver_wal_receiver_protocol
else:
self.pageserver_wal_receiver_protocol = PageserverWalReceiverProtocol.INTERPRETED
assert test_name.startswith(
"test_"
@@ -3219,7 +3222,7 @@ class NeonProxy(PgProtocol):
*["--allow-self-signed-compute", "true"],
]
class Console(AuthBackend):
class ProxyV1(AuthBackend):
def __init__(self, endpoint: str, fixed_rate_limit: int | None = None):
self.endpoint = endpoint
self.fixed_rate_limit = fixed_rate_limit
@@ -3227,7 +3230,7 @@ class NeonProxy(PgProtocol):
def extra_args(self) -> list[str]:
args = [
# Console auth backend params
*["--auth-backend", "console"],
*["--auth-backend", "cplane-v1"],
*["--auth-endpoint", self.endpoint],
*["--sql-over-http-pool-opt-in", "false"],
]
@@ -3475,13 +3478,13 @@ class NeonProxy(PgProtocol):
class NeonAuthBroker:
class ControlPlane:
class ProxyV1:
def __init__(self, endpoint: str):
self.endpoint = endpoint
def extra_args(self) -> list[str]:
args = [
*["--auth-backend", "console"],
*["--auth-backend", "cplane-v1"],
*["--auth-endpoint", self.endpoint],
]
return args
@@ -3493,7 +3496,7 @@ class NeonAuthBroker:
http_port: int,
mgmt_port: int,
external_http_port: int,
auth_backend: NeonAuthBroker.ControlPlane,
auth_backend: NeonAuthBroker.ProxyV1,
):
self.domain = "apiauth.localtest.me" # resolves to 127.0.0.1
self.host = "127.0.0.1"
@@ -3679,7 +3682,7 @@ def static_auth_broker(
local_proxy_addr = f"{http2_echoserver.host}:{http2_echoserver.port}"
# return local_proxy addr on ProxyWakeCompute.
httpserver.expect_request("/cplane/proxy_wake_compute").respond_with_json(
httpserver.expect_request("/cplane/wake_compute").respond_with_json(
{
"address": local_proxy_addr,
"aux": {
@@ -3719,7 +3722,7 @@ def static_auth_broker(
http_port=http_port,
mgmt_port=mgmt_port,
external_http_port=external_http_port,
auth_backend=NeonAuthBroker.ControlPlane(httpserver.url_for("/cplane")),
auth_backend=NeonAuthBroker.ProxyV1(httpserver.url_for("/cplane")),
) as proxy:
proxy.start()
yield proxy

View File

@@ -70,6 +70,9 @@ class MockS3Server:
def secret_key(self) -> str:
return "test"
def session_token(self) -> str:
return "test"
def kill(self):
self.server.stop()
@@ -161,6 +164,7 @@ class S3Storage:
bucket_region: str
access_key: str | None
secret_key: str | None
session_token: str | None
aws_profile: str | None
prefix_in_bucket: str
client: S3Client
@@ -181,13 +185,18 @@ class S3Storage:
if home is not None:
env["HOME"] = home
return env
if self.access_key is not None and self.secret_key is not None:
if (
self.access_key is not None
and self.secret_key is not None
and self.session_token is not None
):
return {
"AWS_ACCESS_KEY_ID": self.access_key,
"AWS_SECRET_ACCESS_KEY": self.secret_key,
"AWS_SESSION_TOKEN": self.session_token,
}
raise RuntimeError(
"Either AWS_PROFILE or (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) have to be set for S3Storage"
"Either AWS_PROFILE or (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN) have to be set for S3Storage"
)
def to_string(self) -> str:
@@ -352,6 +361,7 @@ class RemoteStorageKind(StrEnum):
mock_region = mock_s3_server.region()
access_key, secret_key = mock_s3_server.access_key(), mock_s3_server.secret_key()
session_token = mock_s3_server.session_token()
client = boto3.client(
"s3",
@@ -359,6 +369,7 @@ class RemoteStorageKind(StrEnum):
region_name=mock_region,
aws_access_key_id=access_key,
aws_secret_access_key=secret_key,
aws_session_token=session_token,
)
bucket_name = to_bucket_name(user, test_name)
@@ -372,6 +383,7 @@ class RemoteStorageKind(StrEnum):
bucket_region=mock_region,
access_key=access_key,
secret_key=secret_key,
session_token=session_token,
aws_profile=None,
prefix_in_bucket="",
client=client,
@@ -383,9 +395,10 @@ class RemoteStorageKind(StrEnum):
env_access_key = os.getenv("AWS_ACCESS_KEY_ID")
env_secret_key = os.getenv("AWS_SECRET_ACCESS_KEY")
env_access_token = os.getenv("AWS_SESSION_TOKEN")
env_profile = os.getenv("AWS_PROFILE")
assert (
env_access_key and env_secret_key
env_access_key and env_secret_key and env_access_token
) or env_profile, "need to specify either access key and secret access key or profile"
bucket_name = bucket_name or os.getenv("REMOTE_STORAGE_S3_BUCKET")
@@ -398,6 +411,9 @@ class RemoteStorageKind(StrEnum):
client = boto3.client(
"s3",
region_name=bucket_region,
aws_access_key_id=env_access_key,
aws_secret_access_key=env_secret_key,
aws_session_token=env_access_token,
)
return S3Storage(
@@ -405,6 +421,7 @@ class RemoteStorageKind(StrEnum):
bucket_region=bucket_region,
access_key=env_access_key,
secret_key=env_secret_key,
session_token=env_access_token,
aws_profile=env_profile,
prefix_in_bucket=prefix_in_bucket,
client=client,

View File

@@ -153,6 +153,7 @@ def test_pageserver_gc_compaction_smoke(neon_env_builder: NeonEnvBuilder):
if i % 10 == 0:
log.info(f"Running churn round {i}/{churn_rounds} ...")
if (i - 1) % 10 == 0:
# Run gc-compaction every 10 rounds to ensure the test doesn't take too long time.
ps_http.timeline_compact(
tenant_id,
@@ -161,10 +162,11 @@ def test_pageserver_gc_compaction_smoke(neon_env_builder: NeonEnvBuilder):
body={
"scheduled": True,
"sub_compaction": True,
"compact_range": {
"compact_key_range": {
"start": "000000000000000000000000000000000000",
"end": "030000000000000000000000000000000000",
},
"sub_compaction_max_job_size_mb": 16,
},
)

View File

@@ -60,14 +60,12 @@ def ddl_forward_handler(
if request.json is None:
log.info("Received invalid JSON")
return Response(status=400)
json = request.json
json: dict[str, list[str]] = request.json
# Handle roles first
if "roles" in json:
for operation in json["roles"]:
handle_role(dbs, roles, operation)
if "dbs" in json:
for operation in json["dbs"]:
handle_db(dbs, roles, operation)
for operation in json.get("roles", []):
handle_role(dbs, roles, operation)
for operation in json.get("dbs", []):
handle_db(dbs, roles, operation)
return Response(status=200)
@@ -207,6 +205,23 @@ def test_ddl_forwarding(ddl: DdlForwardingContext):
ddl.wait()
assert ddl.roles == {}
cur.execute("CREATE ROLE bork WITH PASSWORD 'newyork'")
cur.execute("BEGIN")
cur.execute("SAVEPOINT point")
cur.execute("DROP ROLE bork")
cur.execute("COMMIT")
ddl.wait()
assert ddl.roles == {}
cur.execute("CREATE ROLE bork WITH PASSWORD 'oldyork'")
cur.execute("BEGIN")
cur.execute("SAVEPOINT point")
cur.execute("ALTER ROLE bork PASSWORD NULL")
cur.execute("COMMIT")
cur.execute("DROP ROLE bork")
ddl.wait()
assert ddl.roles == {}
cur.execute("CREATE ROLE bork WITH PASSWORD 'dork'")
cur.execute("CREATE DATABASE stork WITH OWNER=bork")
cur.execute("ALTER ROLE bork RENAME TO cork")

View File

@@ -0,0 +1,130 @@
import random
import threading
import time
import pytest
from fixtures.log_helper import log
from fixtures.neon_fixtures import NeonEnv
from fixtures.utils import USE_LFC
@pytest.mark.skipif(not USE_LFC, reason="LFC is disabled, skipping")
def test_lfc_prewarm(neon_simple_env: NeonEnv):
env = neon_simple_env
n_records = 1000000
endpoint = env.endpoints.create_start(
branch_name="main",
config_lines=[
"autovacuum = off",
"shared_buffers=1MB",
"neon.max_file_cache_size=1GB",
"neon.file_cache_size_limit=1GB",
"neon.file_cache_prewarm_limit=1000",
],
)
conn = endpoint.connect()
cur = conn.cursor()
cur.execute("create extension neon version '1.6'")
cur.execute("create table t(pk integer primary key, payload text default repeat('?', 128))")
cur.execute(f"insert into t (pk) values (generate_series(1,{n_records}))")
cur.execute("select get_local_cache_state()")
lfc_state = cur.fetchall()[0][0]
endpoint.stop()
endpoint.start()
conn = endpoint.connect()
cur = conn.cursor()
time.sleep(1) # wait until compute_ctl complete downgrade of extension to default version
cur.execute("alter extension neon update to '1.6'")
cur.execute("select prewarm_local_cache(%s)", (lfc_state,))
cur.execute("select lfc_value from neon_lfc_stats where lfc_key='file_cache_used_pages'")
lfc_used_pages = cur.fetchall()[0][0]
log.info(f"Used LFC size: {lfc_used_pages}")
cur.execute("select * from get_prewarm_info()")
prewarm_info = cur.fetchall()[0]
log.info(f"Prewarm info: {prewarm_info}")
log.info(f"Prewarm progress: {prewarm_info[1]*100//prewarm_info[0]}%")
assert lfc_used_pages > 10000
assert prewarm_info[0] > 0 and prewarm_info[0] == prewarm_info[1]
cur.execute("select sum(pk) from t")
assert cur.fetchall()[0][0] == n_records * (n_records + 1) / 2
assert prewarm_info[1] > 0
@pytest.mark.skipif(not USE_LFC, reason="LFC is disabled, skipping")
def test_lfc_prewarm_under_workload(neon_simple_env: NeonEnv):
env = neon_simple_env
n_records = 1000000
n_threads = 4
endpoint = env.endpoints.create_start(
branch_name="main",
config_lines=[
"shared_buffers=1MB",
"neon.max_file_cache_size=1GB",
"neon.file_cache_size_limit=1GB",
"neon.file_cache_prewarm_limit=1000000",
],
)
conn = endpoint.connect()
cur = conn.cursor()
cur.execute("create extension neon version '1.6'")
cur.execute(
"create table accounts(id integer primary key, balance bigint default 0, payload text default repeat('?', 128))"
)
cur.execute(f"insert into accounts(id) values (generate_series(1,{n_records}))")
cur.execute("select get_local_cache_state()")
lfc_state = cur.fetchall()[0][0]
running = True
def workload():
conn = endpoint.connect()
cur = conn.cursor()
n_transfers = 0
while running:
src = random.randint(1, n_records)
dst = random.randint(1, n_records)
cur.execute("update accounts set balance=balance-100 where id=%s", (src,))
cur.execute("update accounts set balance=balance+100 where id=%s", (dst,))
n_transfers += 1
log.info(f"Number of transfers: {n_transfers}")
def prewarm():
conn = endpoint.connect()
cur = conn.cursor()
n_prewarms = 0
while running:
cur.execute("alter system set neon.file_cache_size_limit='1MB'")
cur.execute("select pg_reload_conf()")
cur.execute("alter system set neon.file_cache_size_limit='1GB'")
cur.execute("select pg_reload_conf()")
cur.execute("select prewarm_local_cache(%s)", (lfc_state,))
n_prewarms += 1
log.info(f"Number of prewarms: {n_prewarms}")
workload_threads = []
for _ in range(n_threads):
t = threading.Thread(target=workload)
workload_threads.append(t)
t.start()
prewarm_thread = threading.Thread(target=prewarm)
prewarm_thread.start()
time.sleep(100)
running = False
for t in workload_threads:
t.join()
prewarm_thread.join()
cur.execute("select sum(balance) from accounts")
total_balance = cur.fetchall()[0][0]
assert total_balance == 0

View File

@@ -573,17 +573,18 @@ def test_subscriber_synchronous_commit(neon_simple_env: NeonEnv, vanilla_pg: Van
vanilla_pg.safe_psql("create extension neon;")
env.create_branch("subscriber")
# We want all data to fit into shared_buffers because later we stop
# safekeeper and insert more; this shouldn't cause page requests as they
# will be stuck.
# We want all data to fit into shared_buffers or LFC cache because later we
# stop safekeeper and insert more; this shouldn't cause page requests as
# they will be stuck.
if USE_LFC:
config_lines = ["neon.max_file_cache_size = 32MB", "neon.file_cache_size_limit = 32MB"]
else:
config_lines = [
"shared_buffers = 32MB",
]
sub = env.endpoints.create(
"subscriber",
config_lines=[
"neon.max_file_cache_size = 32MB",
"neon.file_cache_size_limit = 32MB",
]
if USE_LFC
else [],
config_lines=config_lines,
)
sub.start()

View File

@@ -7,7 +7,6 @@ from fixtures.neon_fixtures import NeonEnvBuilder
@pytest.mark.parametrize("shard_count", [None, 4])
@pytest.mark.timeout(600)
def test_prefetch(neon_env_builder: NeonEnvBuilder, shard_count: int | None):
if shard_count is not None:
neon_env_builder.num_pageservers = shard_count

View File

@@ -435,6 +435,14 @@ def test_timeline_archival_chaos(neon_env_builder: NeonEnvBuilder):
]
)
env.storage_scrubber.allowed_errors.extend(
[
# Unclcean shutdowns of pageserver can legitimately result in orphan layers
# (https://github.com/neondatabase/neon/issues/9988#issuecomment-2520558211)
f".*Orphan layer detected: tenants/{tenant_id}/.*"
]
)
class TimelineState:
def __init__(self):
self.timeline_id = TimelineId.generate()

View File

@@ -1,18 +1,18 @@
{
"v17": [
"17.2",
"01fa3c48664ca030cfb69bb4a350aa9df4691d88"
"010c0ea2eb06afe76485a33c43954cbcf3d99f86"
],
"v16": [
"16.6",
"81428621f7c04aed03671cf80a928e0a36d92505"
"97f9fde349c6de6d573f5ce96db07eca60ce6185"
],
"v15": [
"15.10",
"8736b10c1d93d11b9c0489872dd529c4c0f5338f"
"f262d631ad477a1819e84a183e5a7ef561830085"
],
"v14": [
"14.15",
"13ff324150fceaac72920e01742addc053db9462"
"c2f65b3201591e02ce45b66731392f98d3388e73"
]
}