Compare commits


25 Commits

Author SHA1 Message Date
Konstantin Knizhnik
5ecc4b249f Bump Postgres version 2024-11-25 17:06:01 +02:00
Konstantin Knizhnik
48025c988f Bump Postgres version 2024-11-25 10:12:06 +02:00
Konstantin Knizhnik
424ba47c58 Bump postgres version 2024-11-24 21:52:06 +02:00
Konstantin Knizhnik
c424fa60ca Bump postgres version 2024-11-24 21:31:51 +02:00
Konstantin Knizhnik
74d5129a0d Fix seqscan prefetch in pg17 2024-11-24 13:52:48 +02:00
Christian Schwarz
450be26bbb fast imports: initial Importer and Storage changes (#9218)
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Stas Kelvich <stas@neon.tech>

# Context

This PR contains PoC-level changes for a product feature that allows
onboarding large databases into Neon without going through the regular
data path.

# Changes

This internal RFC provides all the context
* https://github.com/neondatabase/cloud/pull/19799

In the language of the RFC, this PR covers

* the Importer code (`fast_import`) 
* all the Pageserver changes (mgmt API changes, flow implementation,
etc)
* a basic test for the Pageserver changes

# Reviewing

As acknowledged in the RFC, the code added in this PR is not ready for
general availability.
Also, the **architecture is not to be discussed in this PR**, but in the
RFC and associated Slack channel instead.

Reviewers of this PR should take that into consideration.
The quality bar to apply during review depends on what area of the code
is being reviewed:

* Importer code (`fast_import`): practically anything goes
* Core flow (`flow.rs`):
  * Malicious input data must be expected and the existing threat models apply.
  * The code need only be safe to execute on *dedicated* Pageserver instances:
    * This means in particular that tenants *on other* Pageserver instances
      must not be affected negatively wrt data confidentiality, integrity, or
      availability.
* Other code: the usual quality bar
  * Pay special attention to correct use of gate guards, timeline cancellation
    in all places during shutdown & migration, etc.
  * Consider the broader system impact; if you find potentially problematic
    interactions with Storage features that were not covered in the RFC, bring
    that up during the review.

I recommend submitting three separate reviews, for the three high-level
areas with different quality bars.


# References

(Internal-only)

* refs https://github.com/neondatabase/cloud/issues/17507
* refs https://github.com/neondatabase/company_projects/issues/293
* refs https://github.com/neondatabase/company_projects/issues/309
* refs https://github.com/neondatabase/cloud/issues/20646

---------

Co-authored-by: Stas Kelvich <stas.kelvich@gmail.com>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: John Spray <john@neon.tech>
2024-11-22 22:47:06 +00:00
Anastasia Lubennikova
3245f7b88d Rename 'installed_extensions' metric to 'compute_installed_extensions' (#9759)
to keep it consistent with existing compute metrics.

A flux-fleet change is not needed, because it doesn't have any filter by
metric name for compute metrics.
2024-11-22 19:27:04 +00:00
Alex Chi Z.
c1937d073f fix(pageserver): ensure upload happens after delete (#9844)
## Problem

Follow-up of https://github.com/neondatabase/neon/pull/9682; that patch
didn't fully address the problem: what if shutdown fails for whatever
reason and we then reattach the tenant? We would still remove the
future layer. The underlying problem is that the fix for #5878 gets
voided because of the generation optimizations.

Of course, we also need to ensure that delete happens after uploads, but
note that we only schedule deletes when there are no ongoing upload
tasks, so that's fine.

## Summary of changes

* Add a test case to reproduce the behavior (by changing the original
test case to attach the same generation).
* If layer upload happens after the deletion, drain the deletion queue
before uploading.
* If blocked_deletion is enabled, directly remove it from the
blocked_deletion queue.
* Local fs backend fix to avoid race between deletion and preload.
* test_emergency_mode does not need to wait for uploads (and it's
generally not possible to wait for uploads).
* ~~Optimize deletion executor to skip validation if there are no files
to delete.~~ this doesn't work
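
The ordering constraint from the bullets above can be illustrated with a generic queue-flush sketch; the message and function names below are made up for illustration and are not the pageserver's actual deletion-queue API. The point is only that an upload waits until everything already queued for deletion has been processed.

```rust
use tokio::sync::{mpsc, oneshot};

enum DeletionMsg {
    Delete(String),
    // Flush: acknowledged only after every Delete enqueued before it was processed.
    Flush(oneshot::Sender<()>),
}

async fn deletion_worker(mut rx: mpsc::Receiver<DeletionMsg>) {
    while let Some(msg) = rx.recv().await {
        match msg {
            DeletionMsg::Delete(key) => println!("deleted {key}"),
            DeletionMsg::Flush(done) => {
                // The channel is FIFO, so all earlier deletes have been handled here.
                let _ = done.send(());
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(16);
    tokio::spawn(deletion_worker(rx));

    tx.send(DeletionMsg::Delete("layer-A".into())).await.unwrap();

    // Drain the deletion queue before uploading, so the new layer's upload
    // cannot be reordered ahead of an already-scheduled delete.
    let (done_tx, done_rx) = oneshot::channel();
    tx.send(DeletionMsg::Flush(done_tx)).await.unwrap();
    done_rx.await.unwrap();

    println!("deletion queue drained; safe to upload now");
}
```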

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-22 18:30:53 +00:00
Alex Chi Z.
6f8b1eb5a6 test(pageserver): add detach ancestor smoke test (#9842)
## Problem

Follow-up to https://github.com/neondatabase/neon/pull/9682; hopefully
we can detect some issues or assure ourselves that this is ready for
production.

## Summary of changes

* Add a compaction-detach-ancestor smoke test.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-22 18:21:51 +00:00
Erik Grinaker
e939d36dd4 safekeeper,pageserver: fix CPU profiling allowlists (#9856)
## Problem

The HTTP router allowlists matched on the full URI, including the query
string. This meant that only `/profile/cpu` would be allowed without
auth, while `/profile/cpu?format=svg` would require auth.

Follows #9764.

## Summary of changes

* Match allowlists on URI path, rather than the entire URI.
* Fix the allowlist for Safekeeper to use `/profile/cpu` rather than the
old `/pprof/profile`.
* Just use a constant slice for the allowlist; it's only a handful of
items, and these handlers are not on hot paths.
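
A minimal sketch of path-only matching using the `http` crate's `Uri`; the allowlist entry other than `/profile/cpu` is a placeholder, not either service's real allowlist.

```rust
// Hypothetical allowlist; only /profile/cpu is taken from this PR's description.
const AUTH_ALLOWLIST: &[&str] = &["/metrics", "/profile/cpu"];

fn is_allowlisted(uri: &http::Uri) -> bool {
    // uri.path() excludes the query string, so "/profile/cpu?format=svg" still matches.
    AUTH_ALLOWLIST.contains(&uri.path())
}

fn main() {
    let uri: http::Uri = "/profile/cpu?format=svg".parse().unwrap();
    assert!(is_allowlisted(&uri));
    // Matching the entire URI string is what previously forced auth here:
    assert!(!AUTH_ALLOWLIST.contains(&uri.to_string().as_str()));
    println!("path-only match OK");
}
```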
2024-11-22 17:50:33 +00:00
Alex Chi Z.
211e4174d2 fix(pageserver): preempt and retry azure list operation (#9840)
## Problem

close https://github.com/neondatabase/neon/issues/9836

Looking at the Azure SDK, the only related issue I can find is
https://github.com/azure/azure-sdk-for-rust/issues/1549. Azure uses
reqwest as the backend, so I assume there's some underlying magic
unknown to us that might have caused the hang in #9836.

The observation is:
* We didn't get an explicit out of resource HTTP error from Azure.
* The connection simply gets stuck and times out.
* But when we retry after we reach the timeout, it succeeds.

This issue is hard to pin down -- maybe something went wrong on the ABS
side, or something is wrong on our side. But we know that a retry will
usually succeed if we give up on the stuck connection.

Therefore, I propose a fix in which we preempt the stuck HTTP operation
and actively retry. This mitigates the problem; in the long run,
we need to keep an eye on ABS usage and see whether we can fully resolve
it.

The reasoning behind this timeout mechanism: we use a much smaller timeout
than before so that we can preempt, but a normal listing
operation may take longer than the initial timeout if it
contains a lot of keys. Therefore, after we terminate the connection, we
double the timeout, so that such requests eventually
succeed.

## Summary of changes

* Use exponential growth for the ABS list timeout (a sketch of such a retry
wrapper follows below).
* Rather than using a fixed timeout, use a timeout that starts small and
grows.
* Rather than exposing timeouts to the `list_streaming` caller as soon as
we see them, only do so after we have retried a few times.
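
A rough sketch of such a start-small-and-double retry wrapper (generic, assumed names and parameters; the actual `remote_storage` code is structured differently):

```rust
use std::future::Future;
use std::time::Duration;

/// Retry `op` with a timeout that starts small and doubles after every
/// attempt that gets stuck, so genuinely long listings eventually fit.
async fn with_growing_timeout<F, Fut, T, E>(
    mut op: F,
    mut timeout: Duration,
    max_attempts: usize,
) -> Result<T, String>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
    E: std::fmt::Display,
{
    for attempt in 1..=max_attempts {
        match tokio::time::timeout(timeout, op()).await {
            Ok(Ok(v)) => return Ok(v),
            Ok(Err(e)) => return Err(format!("request failed: {e}")),
            Err(_elapsed) => {
                eprintln!("attempt {attempt} stuck after {timeout:?}, retrying with a doubled timeout");
                timeout *= 2; // exponential growth
            }
        }
    }
    Err("all attempts timed out".into())
}

#[tokio::main]
async fn main() {
    // Stand-in for one page of a list operation that takes ~50ms.
    let result = with_growing_timeout(
        || async {
            tokio::time::sleep(Duration::from_millis(50)).await;
            Ok::<_, std::io::Error>(vec!["key1", "key2"])
        },
        Duration::from_millis(10), // start small so stuck requests are preempted quickly
        5,
    )
    .await;
    println!("{result:?}");
}
```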

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-22 17:50:00 +00:00
Ivan Efremov
3b1ac8b14a proxy: Implement cancellation rate limiting (#9739)
Implement cancellation rate limiting and IP allowlist checks. Add
`ip_allowlist` to the cancel closure.

Fixes [#16456](https://github.com/neondatabase/cloud/issues/16456)
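
For intuition only, here is a minimal fixed-window per-IP limiter; it is a hypothetical sketch, and the proxy's actual rate limiter and `ip_allowlist` handling are not shown.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

struct CancelRateLimiter {
    window: Duration,
    max_per_window: u32,
    counters: HashMap<IpAddr, (Instant, u32)>,
}

impl CancelRateLimiter {
    fn allow(&mut self, ip: IpAddr) -> bool {
        let now = Instant::now();
        let entry = self.counters.entry(ip).or_insert((now, 0));
        if now.duration_since(entry.0) > self.window {
            *entry = (now, 0); // window expired, reset the counter
        }
        entry.1 += 1;
        entry.1 <= self.max_per_window
    }
}

fn main() {
    let mut limiter = CancelRateLimiter {
        window: Duration::from_secs(1),
        max_per_window: 3,
        counters: HashMap::new(),
    };
    let ip: IpAddr = "203.0.113.7".parse().unwrap();
    let decisions: Vec<bool> = (0..5).map(|_| limiter.allow(ip)).collect();
    assert_eq!(decisions, [true, true, true, false, false]);
}
```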
2024-11-22 16:46:38 +00:00
Alexander Bayandin
b3b579b45e test_bulk_insert: fix typing for PgVersion (#9854)
## Problem

Along with the migration to Python 3.11, I replaced `C(str, Enum)` with
`C(StrEnum)`; one such example is the `PgVersion` enum.
This required further changes in `PgVersion` itself (before, it accepted both
`str` and `int`; now it supports only `str`), which caused the
`test_bulk_insert` test to fail.

## Summary of changes
- `test_bulk_insert`: explicitly cast pg_version from `timeline_detail`
to str
2024-11-22 16:13:53 +00:00
Conrad Ludgate
8ab96cc71f chore(proxy/jwks): reduce the rightward drift of jwks renewal (#9853)
I found the rightward drift of the `renew_jwks` function hard to review.

This PR splits out some major logic and uses early returns to make the
happy path more linear.
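
As an illustration of the pattern only (not the actual `renew_jwks` code; the URL and closure are placeholders), `?`-based early returns flatten the nesting:

```rust
// Deeply nested version: every check adds another level of indentation.
fn renew_nested(maybe_url: Option<&str>, fetch: impl Fn(&str) -> Option<String>) -> Option<String> {
    if let Some(url) = maybe_url {
        if let Some(body) = fetch(url) {
            if !body.is_empty() {
                return Some(body);
            }
        }
    }
    None
}

// Early returns keep the happy path linear, at a single indentation level.
fn renew_linear(maybe_url: Option<&str>, fetch: impl Fn(&str) -> Option<String>) -> Option<String> {
    let url = maybe_url?;
    let body = fetch(url)?;
    if body.is_empty() {
        return None;
    }
    Some(body)
}

fn main() {
    let fetch = |_: &str| Some("{\"keys\":[]}".to_string()); // placeholder fetcher
    assert_eq!(
        renew_nested(Some("https://example.invalid/jwks"), fetch),
        renew_linear(Some("https://example.invalid/jwks"), fetch)
    );
}
```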
2024-11-22 14:51:32 +00:00
Alexander Bayandin
51d26a261b build(deps): bump mypy from 1.3.0 to 1.13.0 (#9670)
## Problem
We use a pretty old version of `mypy`, 1.3 (released 1.5 years ago), which
produces false positives for `typing.Self`.

## Summary of changes
- Bump `mypy` from 1.3 to 1.13
- Fix new warnings and errors
- Use `typing.Self` whenever we `return self`
2024-11-22 14:31:36 +00:00
Tristan Partin
c10b7f7de9 Write a newline after adding dynamic_shared_memory_type to PG conf (#9843)
Without adding a newline, we can end up with a conf line that looks like
the following:

dynamic_shared_memory_type = mmap# Managed by compute_ctl: begin

This leads to Postgres logging:

LOG: configuration file
"/var/db/postgres/compute/pgdata/postgresql.conf" contains errors;
unaffected changes were applied
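
A tiny sketch of the `write!` vs `writeln!` difference (the strings are illustrative only):

```rust
use std::fmt::Write as _;

fn main() {
    // write! leaves no trailing newline, so the next setting's marker fuses onto this line:
    let mut broken = String::new();
    write!(broken, "dynamic_shared_memory_type = mmap").unwrap();
    write!(broken, "# Managed by compute_ctl: begin").unwrap();
    assert_eq!(broken, "dynamic_shared_memory_type = mmap# Managed by compute_ctl: begin");

    // writeln! terminates the line, keeping the following marker on its own line:
    let mut fixed = String::new();
    writeln!(fixed, "dynamic_shared_memory_type = mmap").unwrap();
    write!(fixed, "# Managed by compute_ctl: begin").unwrap();
    assert_eq!(fixed, "dynamic_shared_memory_type = mmap\n# Managed by compute_ctl: begin");
}
```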

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-22 13:37:06 +00:00
Heikki Linnakangas
7372312a73 Avoid unnecessary send_replace calls in seqwait (#9852)
The notifications need to be sent whenever the waiters heap changes, per
the comment in `update_status`. But if 'advance' is called when there
are no waiters, or the new LSN is lower than what the waiters are waiting
for so that no one needs to be woken up, there's no need to send
notifications. This saves some CPU cycles in the common case that there
are no waiters.
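
A simplified sketch of that guard, with an assumed `SeqWait` shape (the safekeeper's real implementation differs): `send_replace` is only invoked when at least one waiter can actually be woken.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use tokio::sync::watch;

struct SeqWait {
    waiters: BinaryHeap<Reverse<u64>>, // min-heap of the LSNs waiters are blocked on
    tx: watch::Sender<u64>,
}

impl SeqWait {
    fn advance(&mut self, new_lsn: u64) {
        // Only notify when at least one waiter can actually be woken up.
        let wake_someone = self.waiters.peek().map_or(false, |r| new_lsn >= r.0);
        if !wake_someone {
            // No waiters, or every waiter needs a higher LSN: skip send_replace
            // and save the notification work on the common, waiter-free path.
            return;
        }
        while self.waiters.peek().map_or(false, |r| r.0 <= new_lsn) {
            self.waiters.pop();
        }
        self.tx.send_replace(new_lsn);
    }
}

fn main() {
    let (tx, _rx) = watch::channel(0u64);
    let mut sw = SeqWait { waiters: BinaryHeap::new(), tx };
    sw.advance(10); // no waiters: nothing sent
    sw.waiters.push(Reverse(20));
    sw.advance(15); // the only waiter needs LSN 20: still nothing sent
    sw.advance(25); // now a waiter can be woken, so the watch value is replaced
}
```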
2024-11-22 13:29:49 +00:00
John Spray
d9de65ee8f pageserver: permit reads behind GC cutoff during LSN grace period (#9833)
## Problem

In https://github.com/neondatabase/neon/issues/9754 and the flakiness of
`test_readonly_node_gc`, we saw that although our logic for controlling
GC was sound, the validation of getpage requests was not, because it
could not consider LSN leases when requests arrived shortly after
restart.

Closes https://github.com/neondatabase/neon/issues/9754

## Summary of changes

This is the "Option 3" discussed verbally -- rather than holding back the GC
cutoff, we waive the usual validation of the request LSN if we are still
waiting for leases to be sent after startup.

- When validating LSN in `wait_or_get_last_lsn`, skip the validation
relative to GC cutoff if the timeline is still in its LSN lease grace
period
- Re-enable test_readonly_node_gc
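
A bare-bones sketch of that waiver (hypothetical types and field names, not the pageserver's actual `wait_or_get_last_lsn`):

```rust
use std::time::{Duration, Instant};

struct TimelineState {
    started_at: Instant,
    lease_grace_period: Duration,
    gc_cutoff_lsn: u64,
}

impl TimelineState {
    fn in_lease_grace_period(&self) -> bool {
        self.started_at.elapsed() < self.lease_grace_period
    }

    fn validate_request_lsn(&self, req_lsn: u64) -> Result<(), String> {
        if req_lsn < self.gc_cutoff_lsn && !self.in_lease_grace_period() {
            // Outside the grace period, a read behind the GC cutoff is rejected as before.
            return Err(format!(
                "requested LSN {req_lsn} is below GC cutoff {}",
                self.gc_cutoff_lsn
            ));
        }
        // During the grace period, leases may not have been re-sent yet after
        // restart, so reads behind the cutoff are waved through.
        Ok(())
    }
}

fn main() {
    let tl = TimelineState {
        started_at: Instant::now(),
        lease_grace_period: Duration::from_secs(60),
        gc_cutoff_lsn: 1000,
    };
    // Shortly after startup, a read below the cutoff is still allowed.
    assert!(tl.validate_request_lsn(900).is_ok());
}
```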
2024-11-22 09:24:23 +00:00
Fedor Dikarev
83b73fc24e Batch scrape workflows up to last 30 days and stop ad-hoc (#9846)
Comparing the Batch and Ad-hoc collectors, there is no big difference; we
just need to scrape over a longer duration to catch retries.
Dashboard with the comparison:

https://neonprod.grafana.net/d/be3pjm7c9ne2oe/compare-ad-hoc-and-batch?orgId=1&from=1731345095814&to=1731946295814

I should still raise a support case with GitHub about this;
meanwhile, this should be a working solution and should save us some cost,
so it is worth switching to Batch now.

Ref: https://github.com/neondatabase/cloud/issues/17503
2024-11-22 09:06:00 +00:00
Peter Bendel
1e05e3a6e2 minor PostgreSQL update in benchmarking (#9845)
## Problem

In the benchmarking.yml pgvector job, we install Postgres from deb packages.
After the minor Postgres update, the referenced packages no longer exist.

[Failing
job](https://github.com/neondatabase/neon/actions/runs/11965785323/job/33360391115#step:4:41)

## Summary of changes

Reference and install the updated packages.

[Successful job after this
fix](https://github.com/neondatabase/neon/actions/runs/11967959920/job/33366011934#step:4:45)
2024-11-22 08:31:54 +00:00
Tristan Partin
37962e729e Fix panic in compute_ctl metrics collection (#9831)
Calling unwrap on the encoder is a little overzealous. One of the errors
the encode function can return is the non-existence of metrics for a
metric family, so we should preemptively filter such instances out.
I believe this panic was caused by a race condition between the
Prometheus collector and the compute collecting the installed-extensions
metric for the first time. The HTTP server is spawned on a separate
thread before we even start bringing up Postgres.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-21 20:19:02 +00:00
Erik Grinaker
190e8cebac safekeeper,pageserver: add CPU profiling (#9764)
## Problem

We don't have a convenient way to gather CPU profiles from a running
binary, e.g. during production incidents or end-to-end benchmarks, nor
during microbenchmarks (particularly on macOS).

We would also like to have continuous profiling in production, likely
using [Grafana Cloud
Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/).
We may choose to use either eBPF profiles or pprof profiles for this
(pending testing and discussion with SREs), but pprof profiles appear
useful regardless for the reasons listed above. See
https://github.com/neondatabase/cloud/issues/14888.

This PR is intended as a proof of concept, to try it out in staging and
drive further discussions about profiling more broadly.

Touches #9534.
Touches https://github.com/neondatabase/cloud/issues/14888.

## Summary of changes

Adds an HTTP route `/profile/cpu` that takes a CPU profile and returns
it. Defaults to a 5-second pprof Protobuf profile for use with e.g.
`pprof` or Grafana Alloy, but can also emit an SVG flamegraph. Query
parameters:

* `format`: output format (`pprof` or `svg`)
* `frequency`: sampling frequency in microseconds (default 100)
* `seconds`: number of seconds to profile (default 5)
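
For example, a profile can be pulled with a few lines of client code; the host, port, and output filename below are assumptions, only the route and query parameters come from this change.

```rust
use std::io::Write;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Assumed pageserver HTTP address; adjust to your deployment.
    let url = "http://127.0.0.1:9898/profile/cpu?format=svg&seconds=5";
    let body = reqwest::get(url).await?.error_for_status()?.bytes().await?;
    std::fs::File::create("flamegraph.svg")?.write_all(&body)?;
    println!("wrote {} bytes to flamegraph.svg", body.len());
    Ok(())
}
```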

Also integrates pprof profiles into Criterion benchmarks, such that
flamegraph reports can be taken with `cargo bench ... --profile-duration
<seconds>`. Output under `target/criterion/*/profile/flamegraph.svg`.

Example profiles:

* pprof profile (use [`pprof`](https://github.com/google/pprof)):
[profile.pb.gz](https://github.com/user-attachments/files/17756788/profile.pb.gz)
  * Web interface: `pprof -http :6060 profile.pb.gz`
* Interactive flamegraph:
[profile.svg.gz](https://github.com/user-attachments/files/17756782/profile.svg.gz)
2024-11-21 18:59:46 +00:00
Conrad Ludgate
725a5ff003 fix(proxy): CancelKeyData display log masking (#9838)
Fixes the masking for the `CancelKeyData` display format. Due to a negative
i32 being cast to u64, the top bits all had an `0xffffffff` prefix; in the
bitwise-or that followed, these took priority.

This PR also compresses 3 logs during sql-over-http into 1 log with
durations as label fields, as previously discussed.
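
The sign-extension hazard is easy to reproduce in isolation; the field layout below is illustrative only, not the proxy's exact `CancelKeyData`:

```rust
fn main() {
    let backend_pid: i32 = 0x1234_5678;
    let cancel_key: i32 = -2; // negative, as i32-typed protocol fields can be

    // Casting the negative i32 directly to u64 sign-extends it: the top 32 bits
    // become 0xffff_ffff and "win" the bitwise-or, masking the backend pid.
    let broken = ((backend_pid as u64) << 32) | (cancel_key as u64);
    assert_eq!(broken >> 32, 0xffff_ffff);

    // Going through u32 first keeps only the low 32 bits of the key.
    let fixed = ((backend_pid as u64) << 32) | (cancel_key as u32 as u64);
    assert_eq!(fixed >> 32, 0x1234_5678);
    assert_eq!(fixed as u32, cancel_key as u32);

    println!("broken={broken:#018x} fixed={fixed:#018x}");
}
```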
2024-11-21 16:46:30 +00:00
Alexander Bayandin
8d1c44039e Python 3.11 (#9515)
## Problem

On Debian 12 (Bookworm), Python 3.11 is the latest available version.

## Summary of changes
- Update Python to 3.11 in build-tools
- Fix ruff check / format
- Fix mypy
- Use `StrEnum` instead of pair `str`, `Enum`
- Update docs
2024-11-21 16:25:31 +00:00
Konstantin Knizhnik
0713ff3176 Bump Postgres version (#9808)
## Problem

I made a mistake in merging Postgres PRs.

## Summary of changes

Restore consistency of the submodule references.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-11-21 14:56:56 +00:00
148 changed files with 5074 additions and 1703 deletions

View File

@@ -558,12 +558,12 @@ jobs:
arch=$(uname -m | sed 's/x86_64/amd64/g' | sed 's/aarch64/arm64/g')
cd /home/nonroot
wget -q "https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-17/libpq5_17.1-1.pgdg110+1_${arch}.deb"
wget -q "https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-16/postgresql-client-16_16.5-1.pgdg110+1_${arch}.deb"
wget -q "https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-16/postgresql-16_16.5-1.pgdg110+1_${arch}.deb"
dpkg -x libpq5_17.1-1.pgdg110+1_${arch}.deb pg
dpkg -x postgresql-16_16.5-1.pgdg110+1_${arch}.deb pg
dpkg -x postgresql-client-16_16.5-1.pgdg110+1_${arch}.deb pg
wget -q "https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-17/libpq5_17.2-1.pgdg110+1_${arch}.deb"
wget -q "https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-16/postgresql-client-16_16.6-1.pgdg110+1_${arch}.deb"
wget -q "https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-16/postgresql-16_16.6-1.pgdg110+1_${arch}.deb"
dpkg -x libpq5_17.2-1.pgdg110+1_${arch}.deb pg
dpkg -x postgresql-16_16.6-1.pgdg110+1_${arch}.deb pg
dpkg -x postgresql-client-16_16.6-1.pgdg110+1_${arch}.deb pg
mkdir -p /tmp/neon/pg_install/v16/bin
ln -s /home/nonroot/pg/usr/lib/postgresql/16/bin/pgbench /tmp/neon/pg_install/v16/bin/pgbench

View File

@@ -4,10 +4,12 @@ on:
schedule:
- cron: '*/15 * * * *'
- cron: '25 0 * * *'
- cron: '25 1 * * 6'
jobs:
gh-workflow-stats-batch:
name: GitHub Workflow Stats Batch
gh-workflow-stats-batch-2h:
name: GitHub Workflow Stats Batch 2 hours
if: github.event.schedule == '*/15 * * * *'
runs-on: ubuntu-22.04
permissions:
actions: read
@@ -16,14 +18,36 @@ jobs:
uses: neondatabase/gh-workflow-stats-action@v0.2.1
with:
db_uri: ${{ secrets.GH_REPORT_STATS_DB_RW_CONNSTR }}
db_table: "gh_workflow_stats_batch_neon"
db_table: "gh_workflow_stats_neon"
gh_token: ${{ secrets.GITHUB_TOKEN }}
duration: '2h'
- name: Export Workflow Run for the past 24 hours
if: github.event.schedule == '25 0 * * *'
gh-workflow-stats-batch-48h:
name: GitHub Workflow Stats Batch 48 hours
if: github.event.schedule == '25 0 * * *'
runs-on: ubuntu-22.04
permissions:
actions: read
steps:
- name: Export Workflow Run for the past 48 hours
uses: neondatabase/gh-workflow-stats-action@v0.2.1
with:
db_uri: ${{ secrets.GH_REPORT_STATS_DB_RW_CONNSTR }}
db_table: "gh_workflow_stats_batch_neon"
db_table: "gh_workflow_stats_neon"
gh_token: ${{ secrets.GITHUB_TOKEN }}
duration: '24h'
duration: '48h'
gh-workflow-stats-batch-30d:
name: GitHub Workflow Stats Batch 30 days
if: github.event.schedule == '25 1 * * 6'
runs-on: ubuntu-22.04
permissions:
actions: read
steps:
- name: Export Workflow Run for the past 30 days
uses: neondatabase/gh-workflow-stats-action@v0.2.1
with:
db_uri: ${{ secrets.GH_REPORT_STATS_DB_RW_CONNSTR }}
db_table: "gh_workflow_stats_neon"
gh_token: ${{ secrets.GITHUB_TOKEN }}
duration: '720h'

View File

@@ -1,41 +0,0 @@
name: Report Workflow Stats
on:
workflow_run:
workflows:
- Add `external` label to issues and PRs created by external users
- Benchmarking
- Build and Test
- Build and Test Locally
- Build build-tools image
- Check Permissions
- Check neon with extra platform builds
- Cloud Regression Test
- Create Release Branch
- Handle `approved-for-ci-run` label
- Lint GitHub Workflows
- Notify Slack channel about upcoming release
- Periodic pagebench performance test on dedicated EC2 machine in eu-central-1 region
- Pin build-tools image
- Prepare benchmarking databases by restoring dumps
- Push images to ACR
- Test Postgres client libraries
- Trigger E2E Tests
- cleanup caches by a branch
- Pre-merge checks
types: [completed]
jobs:
gh-workflow-stats:
name: Github Workflow Stats
runs-on: ubuntu-22.04
permissions:
actions: read
steps:
- name: Export GH Workflow Stats
uses: neondatabase/gh-workflow-stats-action@v0.1.4
with:
DB_URI: ${{ secrets.GH_REPORT_STATS_DB_RW_CONNSTR }}
DB_TABLE: "gh_workflow_stats_neon"
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GH_RUN_ID: ${{ github.event.workflow_run.id }}

Cargo.lock generated
View File

@@ -46,6 +46,15 @@ dependencies = [
"memchr",
]
[[package]]
name = "aligned-vec"
version = "0.6.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7e0966165eaf052580bd70eb1b32cb3d6245774c0104d1b2793e9650bf83b52a"
dependencies = [
"equator",
]
[[package]]
name = "allocator-api2"
version = "0.2.16"
@@ -146,6 +155,12 @@ dependencies = [
"static_assertions",
]
[[package]]
name = "arrayvec"
version = "0.7.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50"
[[package]]
name = "asn1-rs"
version = "0.6.2"
@@ -244,19 +259,6 @@ dependencies = [
"syn 2.0.52",
]
[[package]]
name = "async-timer"
version = "1.0.0-beta.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1d420af8e042475e58a20d91af8eda7d6528989418c03f3f527e1c3415696f70"
dependencies = [
"error-code",
"libc",
"tokio",
"wasm-bindgen",
"web-time",
]
[[package]]
name = "async-trait"
version = "0.1.68"
@@ -372,6 +374,28 @@ dependencies = [
"tracing",
]
[[package]]
name = "aws-sdk-kms"
version = "1.47.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "564a597a3c71a957d60a2e4c62c93d78ee5a0d636531e15b760acad983a5c18e"
dependencies = [
"aws-credential-types",
"aws-runtime",
"aws-smithy-async",
"aws-smithy-http",
"aws-smithy-json",
"aws-smithy-runtime",
"aws-smithy-runtime-api",
"aws-smithy-types",
"aws-types",
"bytes",
"http 0.2.9",
"once_cell",
"regex-lite",
"tracing",
]
[[package]]
name = "aws-sdk-s3"
version = "1.52.0"
@@ -588,9 +612,9 @@ dependencies = [
[[package]]
name = "aws-smithy-runtime"
version = "1.7.1"
version = "1.7.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d1ce695746394772e7000b39fe073095db6d45a862d0767dd5ad0ac0d7f8eb87"
checksum = "a065c0fe6fdbdf9f11817eb68582b2ab4aff9e9c39e986ae48f7ec576c6322db"
dependencies = [
"aws-smithy-async",
"aws-smithy-http",
@@ -755,7 +779,7 @@ dependencies = [
"once_cell",
"paste",
"pin-project",
"quick-xml",
"quick-xml 0.31.0",
"rand 0.8.5",
"reqwest 0.11.19",
"rustc_version",
@@ -1233,6 +1257,10 @@ name = "compute_tools"
version = "0.1.0"
dependencies = [
"anyhow",
"aws-config",
"aws-sdk-kms",
"aws-sdk-s3",
"base64 0.13.1",
"bytes",
"camino",
"cfg-if",
@@ -1250,13 +1278,16 @@ dependencies = [
"opentelemetry",
"opentelemetry_sdk",
"postgres",
"postgres_initdb",
"prometheus",
"regex",
"remote_storage",
"reqwest 0.12.4",
"rlimit",
"rust-ini",
"serde",
"serde_json",
"serde_with",
"signal-hook",
"tar",
"thiserror",
@@ -1394,6 +1425,15 @@ version = "0.8.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e496a50fda8aacccc86d7529e2c1e0892dbd0f898a6b5645b5561b89c3210efa"
[[package]]
name = "cpp_demangle"
version = "0.4.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "96e58d342ad113c2b878f16d5d034c03be492ae460cdbc02b7f0f2284d310c7d"
dependencies = [
"cfg-if",
]
[[package]]
name = "cpufeatures"
version = "0.2.9"
@@ -1917,6 +1957,26 @@ dependencies = [
"termcolor",
]
[[package]]
name = "equator"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c35da53b5a021d2484a7cc49b2ac7f2d840f8236a286f84202369bd338d761ea"
dependencies = [
"equator-macro",
]
[[package]]
name = "equator-macro"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3bf679796c0322556351f287a51b49e48f7c4986e727b5dd78c972d30e2e16cc"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.52",
]
[[package]]
name = "equivalent"
version = "1.0.1"
@@ -1933,12 +1993,6 @@ dependencies = [
"windows-sys 0.52.0",
]
[[package]]
name = "error-code"
version = "3.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a5d9305ccc6942a704f4335694ecd3de2ea531b114ac2d51f5f843750787a92f"
[[package]]
name = "event-listener"
version = "2.5.3"
@@ -2030,6 +2084,18 @@ dependencies = [
"windows-sys 0.48.0",
]
[[package]]
name = "findshlibs"
version = "0.10.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "40b9e59cd0f7e0806cca4be089683ecb6434e602038df21fe6bf6711b2f07f64"
dependencies = [
"cc",
"lazy_static",
"libc",
"winapi",
]
[[package]]
name = "fixedbitset"
version = "0.4.2"
@@ -2733,6 +2799,24 @@ version = "0.2.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "64e9829a50b42bb782c1df523f78d332fe371b10c661e78b7a3c34b0198e9fac"
[[package]]
name = "inferno"
version = "0.11.21"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "232929e1d75fe899576a3d5c7416ad0d88dbfbb3c3d6aa00873a7408a50ddb88"
dependencies = [
"ahash",
"indexmap 2.0.1",
"is-terminal",
"itoa",
"log",
"num-format",
"once_cell",
"quick-xml 0.26.0",
"rgb",
"str_stack",
]
[[package]]
name = "inotify"
version = "0.9.6"
@@ -2783,9 +2867,9 @@ dependencies = [
[[package]]
name = "ipnet"
version = "2.9.0"
version = "2.10.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8f518f335dce6725a761382244631d86cf0ccb2863413590b31338feb467f9c3"
checksum = "ddc24109865250148c2e0f3d25d4f0f479571723792d3802153c60922a4fb708"
[[package]]
name = "is-terminal"
@@ -3072,6 +3156,15 @@ version = "2.6.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f665ee40bc4a3c5590afb1e9677db74a508659dfd71e126420da8274909a0167"
[[package]]
name = "memmap2"
version = "0.9.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "45fd3a57831bf88bc63f8cebc0cf956116276e97fef3966103e96416209f7c92"
dependencies = [
"libc",
]
[[package]]
name = "memoffset"
version = "0.7.1"
@@ -3297,6 +3390,16 @@ version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "51d515d32fb182ee37cda2ccdcb92950d6a3c2893aa280e540671c2cd0f3b1d9"
[[package]]
name = "num-format"
version = "0.4.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a652d9771a63711fd3c3deb670acfbe5c30a4072e664d7a3bf5a9e1056ac72c3"
dependencies = [
"arrayvec",
"itoa",
]
[[package]]
name = "num-integer"
version = "0.1.45"
@@ -3609,7 +3712,6 @@ dependencies = [
"arc-swap",
"async-compression",
"async-stream",
"async-timer",
"bit_field",
"byteorder",
"bytes",
@@ -3639,6 +3741,7 @@ dependencies = [
"num_cpus",
"once_cell",
"pageserver_api",
"pageserver_client",
"pageserver_compaction",
"pin-project-lite",
"postgres",
@@ -3647,6 +3750,7 @@ dependencies = [
"postgres_backend",
"postgres_connection",
"postgres_ffi",
"postgres_initdb",
"pq_proto",
"procfs",
"rand 0.8.5",
@@ -4122,12 +4226,48 @@ dependencies = [
"utils",
]
[[package]]
name = "postgres_initdb"
version = "0.1.0"
dependencies = [
"anyhow",
"camino",
"thiserror",
"tokio",
"workspace_hack",
]
[[package]]
name = "powerfmt"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "439ee305def115ba05938db6eb1644ff94165c5ab5e9420d1c1bcedbba909391"
[[package]]
name = "pprof"
version = "0.14.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ebbe2f8898beba44815fdc9e5a4ae9c929e21c5dc29b0c774a15555f7f58d6d0"
dependencies = [
"aligned-vec",
"backtrace",
"cfg-if",
"criterion",
"findshlibs",
"inferno",
"libc",
"log",
"nix 0.26.4",
"once_cell",
"parking_lot 0.12.1",
"protobuf",
"protobuf-codegen-pure",
"smallvec",
"symbolic-demangle",
"tempfile",
"thiserror",
]
[[package]]
name = "ppv-lite86"
version = "0.2.17"
@@ -4280,6 +4420,31 @@ dependencies = [
"prost",
]
[[package]]
name = "protobuf"
version = "2.28.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "106dd99e98437432fed6519dedecfade6a06a73bb7b2a1e019fdd2bee5778d94"
[[package]]
name = "protobuf-codegen"
version = "2.28.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "033460afb75cf755fcfc16dfaed20b86468082a2ea24e05ac35ab4a099a017d6"
dependencies = [
"protobuf",
]
[[package]]
name = "protobuf-codegen-pure"
version = "2.28.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "95a29399fc94bcd3eeaa951c715f7bea69409b2445356b00519740bcd6ddd865"
dependencies = [
"protobuf",
"protobuf-codegen",
]
[[package]]
name = "proxy"
version = "0.1.0"
@@ -4391,6 +4556,15 @@ dependencies = [
"zerocopy",
]
[[package]]
name = "quick-xml"
version = "0.26.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7f50b1c63b38611e7d4d7f68b82d3ad0cc71a2ad2e7f61fc10f1328d917c93cd"
dependencies = [
"memchr",
]
[[package]]
name = "quick-xml"
version = "0.31.0"
@@ -4873,6 +5047,15 @@ dependencies = [
"subtle",
]
[[package]]
name = "rgb"
version = "0.8.50"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "57397d16646700483b67d2dd6511d79318f9d057fdbd21a4066aeac8b41d310a"
dependencies = [
"bytemuck",
]
[[package]]
name = "ring"
version = "0.17.6"
@@ -5186,6 +5369,7 @@ dependencies = [
"postgres-protocol",
"postgres_backend",
"postgres_ffi",
"pprof",
"pq_proto",
"rand 0.8.5",
"regex",
@@ -5732,6 +5916,12 @@ dependencies = [
"der 0.7.8",
]
[[package]]
name = "stable_deref_trait"
version = "1.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a8f112729512f8e442d81f95a8a7ddf2b7c6b8a1a6f509a95864142b30cab2d3"
[[package]]
name = "static_assertions"
version = "1.1.0"
@@ -5878,6 +6068,12 @@ dependencies = [
"workspace_hack",
]
[[package]]
name = "str_stack"
version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9091b6114800a5f2141aee1d1b9d6ca3592ac062dc5decb3764ec5895a47b4eb"
[[package]]
name = "stringprep"
version = "0.1.2"
@@ -5925,6 +6121,29 @@ version = "0.4.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "20e16a0f46cf5fd675563ef54f26e83e20f2366bcf027bcb3cc3ed2b98aaf2ca"
[[package]]
name = "symbolic-common"
version = "12.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "366f1b4c6baf6cfefc234bbd4899535fca0b06c74443039a73f6dfb2fad88d77"
dependencies = [
"debugid",
"memmap2",
"stable_deref_trait",
"uuid",
]
[[package]]
name = "symbolic-demangle"
version = "12.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "aba05ba5b9962ea5617baf556293720a8b2d0a282aa14ee4bf10e22efc7da8c8"
dependencies = [
"cpp_demangle",
"rustc-demangle",
"symbolic-common",
]
[[package]]
name = "syn"
version = "1.0.109"
@@ -6792,6 +7011,7 @@ dependencies = [
"once_cell",
"pin-project-lite",
"postgres_connection",
"pprof",
"pq_proto",
"rand 0.8.5",
"regex",
@@ -7326,6 +7546,7 @@ dependencies = [
"anyhow",
"axum",
"axum-core",
"base64 0.13.1",
"base64 0.21.1",
"base64ct",
"bytes",
@@ -7360,6 +7581,7 @@ dependencies = [
"libc",
"log",
"memchr",
"nix 0.26.4",
"nom",
"num-bigint",
"num-integer",

View File

@@ -34,6 +34,7 @@ members = [
"libs/vm_monitor",
"libs/walproposer",
"libs/wal_decoder",
"libs/postgres_initdb",
]
[workspace.package]
@@ -47,7 +48,6 @@ anyhow = { version = "1.0", features = ["backtrace"] }
arc-swap = "1.6"
async-compression = { version = "0.4.0", features = ["tokio", "gzip", "zstd"] }
atomic-take = "1.1.0"
async-timer = { version= "1.0.0-beta.15", features = ["tokio1"] }
azure_core = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls", "hmac_rust"] }
azure_identity = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls"] }
azure_storage = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls"] }
@@ -58,6 +58,7 @@ async-trait = "0.1"
aws-config = { version = "1.5", default-features = false, features=["rustls", "sso"] }
aws-sdk-s3 = "1.52"
aws-sdk-iam = "1.46.0"
aws-sdk-kms = "1.47.0"
aws-smithy-async = { version = "1.2.1", default-features = false, features=["rt-tokio"] }
aws-smithy-types = "1.2"
aws-credential-types = "1.2.0"
@@ -74,7 +75,7 @@ bytes = "1.0"
camino = "1.1.6"
cfg-if = "1.0.0"
chrono = { version = "0.4", default-features = false, features = ["clock"] }
clap = { version = "4.0", features = ["derive"] }
clap = { version = "4.0", features = ["derive", "env"] }
comfy-table = "7.1"
const_format = "0.2"
crc32c = "0.6"
@@ -107,7 +108,7 @@ hyper-util = "0.1"
tokio-tungstenite = "0.21.0"
indexmap = "2"
indoc = "2"
ipnet = "2.9.0"
ipnet = "2.10.0"
itertools = "0.10"
itoa = "1.0.11"
jsonwebtoken = "9"
@@ -131,6 +132,7 @@ parquet = { version = "53", default-features = false, features = ["zstd"] }
parquet_derive = "53"
pbkdf2 = { version = "0.12.1", features = ["simple", "std"] }
pin-project-lite = "0.2"
pprof = { version = "0.14", features = ["criterion", "flamegraph", "protobuf", "protobuf-codec"] }
procfs = "0.16"
prometheus = {version = "0.13", default-features=false, features = ["process"]} # removes protobuf dependency
prost = "0.13"
@@ -154,7 +156,7 @@ sentry = { version = "0.32", default-features = false, features = ["backtrace",
serde = { version = "1.0", features = ["derive"] }
serde_json = "1"
serde_path_to_error = "0.1"
serde_with = "2.0"
serde_with = { version = "2.0", features = [ "base64" ] }
serde_assert = "0.5.0"
sha2 = "0.10.2"
signal-hook = "0.3"
@@ -213,12 +215,14 @@ tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", br
compute_api = { version = "0.1", path = "./libs/compute_api/" }
consumption_metrics = { version = "0.1", path = "./libs/consumption_metrics/" }
metrics = { version = "0.1", path = "./libs/metrics/" }
pageserver = { path = "./pageserver" }
pageserver_api = { version = "0.1", path = "./libs/pageserver_api/" }
pageserver_client = { path = "./pageserver/client" }
pageserver_compaction = { version = "0.1", path = "./pageserver/compaction/" }
postgres_backend = { version = "0.1", path = "./libs/postgres_backend/" }
postgres_connection = { version = "0.1", path = "./libs/postgres_connection/" }
postgres_ffi = { version = "0.1", path = "./libs/postgres_ffi/" }
postgres_initdb = { path = "./libs/postgres_initdb" }
pq_proto = { version = "0.1", path = "./libs/pq_proto/" }
remote_storage = { version = "0.1", path = "./libs/remote_storage/" }
safekeeper_api = { version = "0.1", path = "./libs/safekeeper_api" }

View File

@@ -132,7 +132,7 @@ make -j`sysctl -n hw.logicalcpu` -s
To run the `psql` client, install the `postgresql-client` package or modify `PATH` and `LD_LIBRARY_PATH` to include `pg_install/bin` and `pg_install/lib`, respectively.
To run the integration tests or Python scripts (not required to use the code), install
Python (3.9 or higher), and install the python3 packages using `./scripts/pysync` (requires [poetry>=1.8](https://python-poetry.org/)) in the project directory.
Python (3.11 or higher), and install the python3 packages using `./scripts/pysync` (requires [poetry>=1.8](https://python-poetry.org/)) in the project directory.
#### Running neon database

View File

@@ -234,7 +234,7 @@ USER nonroot:nonroot
WORKDIR /home/nonroot
# Python
ENV PYTHON_VERSION=3.9.19 \
ENV PYTHON_VERSION=3.11.10 \
PYENV_ROOT=/home/nonroot/.pyenv \
PATH=/home/nonroot/.pyenv/shims:/home/nonroot/.pyenv/bin:/home/nonroot/.poetry/bin:$PATH
RUN set -e \

View File

@@ -1243,7 +1243,7 @@ RUN make -j $(getconf _NPROCESSORS_ONLN) \
#########################################################################################
#
# Compile and run the Neon-specific `compute_ctl` binary
# Compile and run the Neon-specific `compute_ctl` and `fast_import` binaries
#
#########################################################################################
FROM $REPOSITORY/$IMAGE:$TAG AS compute-tools
@@ -1264,6 +1264,7 @@ RUN cd compute_tools && mold -run cargo build --locked --profile release-line-de
FROM debian:$DEBIAN_FLAVOR AS compute-tools-image
COPY --from=compute-tools /home/nonroot/target/release-line-debug-size-lto/compute_ctl /usr/local/bin/compute_ctl
COPY --from=compute-tools /home/nonroot/target/release-line-debug-size-lto/fast_import /usr/local/bin/fast_import
#########################################################################################
#
@@ -1458,6 +1459,7 @@ RUN mkdir /var/db && useradd -m -d /var/db/postgres postgres && \
COPY --from=postgres-cleanup-layer --chown=postgres /usr/local/pgsql /usr/local
COPY --from=compute-tools --chown=postgres /home/nonroot/target/release-line-debug-size-lto/compute_ctl /usr/local/bin/compute_ctl
COPY --from=compute-tools --chown=postgres /home/nonroot/target/release-line-debug-size-lto/fast_import /usr/local/bin/fast_import
# pgbouncer and its config
COPY --from=pgbouncer /usr/local/pgbouncer/bin/pgbouncer /usr/local/bin/pgbouncer
@@ -1533,6 +1535,25 @@ RUN apt update && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8
# s5cmd 2.2.2 from https://github.com/peak/s5cmd/releases/tag/v2.2.2
# used by fast_import
ARG TARGETARCH
ADD https://github.com/peak/s5cmd/releases/download/v2.2.2/s5cmd_2.2.2_linux_$TARGETARCH.deb /tmp/s5cmd.deb
RUN set -ex; \
\
# Determine the expected checksum based on TARGETARCH
if [ "${TARGETARCH}" = "amd64" ]; then \
CHECKSUM="392c385320cd5ffa435759a95af77c215553d967e4b1c0fffe52e4f14c29cf85"; \
elif [ "${TARGETARCH}" = "arm64" ]; then \
CHECKSUM="939bee3cf4b5604ddb00e67f8c157b91d7c7a5b553d1fbb6890fad32894b7b46"; \
else \
echo "Unsupported architecture: ${TARGETARCH}"; exit 1; \
fi; \
\
# Compute and validate the checksum
echo "${CHECKSUM} /tmp/s5cmd.deb" | sha256sum -c -
RUN dpkg -i /tmp/s5cmd.deb && rm /tmp/s5cmd.deb
ENV LANG=en_US.utf8
USER postgres
ENTRYPOINT ["/usr/local/bin/compute_ctl"]

View File

@@ -10,6 +10,10 @@ default = []
testing = []
[dependencies]
base64.workspace = true
aws-config.workspace = true
aws-sdk-s3.workspace = true
aws-sdk-kms.workspace = true
anyhow.workspace = true
camino.workspace = true
chrono.workspace = true
@@ -27,6 +31,8 @@ opentelemetry.workspace = true
opentelemetry_sdk.workspace = true
postgres.workspace = true
regex.workspace = true
serde.workspace = true
serde_with.workspace = true
serde_json.workspace = true
signal-hook.workspace = true
tar.workspace = true
@@ -43,6 +49,7 @@ thiserror.workspace = true
url.workspace = true
prometheus.workspace = true
postgres_initdb.workspace = true
compute_api.workspace = true
utils.workspace = true
workspace_hack.workspace = true

View File

@@ -0,0 +1,338 @@
//! This program dumps a remote Postgres database into a local Postgres database
//! and uploads the resulting PGDATA into object storage for import into a Timeline.
//!
//! # Context, Architecture, Design
//!
//! See cloud.git Fast Imports RFC (<https://github.com/neondatabase/cloud/pull/19799>)
//! for the full picture.
//! The RFC describing the storage pieces of importing the PGDATA dump into a Timeline
//! is publicly accessible at <https://github.com/neondatabase/neon/pull/9538>.
//!
//! # This is a Prototype!
//!
//! This program is part of a prototype feature and not yet used in production.
//!
//! The cloud.git RFC contains lots of suggestions for improving e2e throughput
//! of this step of the timeline import process.
//!
//! # Local Testing
//!
//! - Comment out most of the pgxns in The Dockerfile.compute-tools to speed up the build.
//! - Build the image with the following command:
//!
//! ```bash
//! docker buildx build --build-arg DEBIAN_FLAVOR=bullseye-slim --build-arg GIT_VERSION=local --build-arg PG_VERSION=v14 --build-arg BUILD_TAG="$(date --iso-8601=s -u)" -t localhost:3030/localregistry/compute-node-v14:latest -f compute/Dockerfile.com
//! docker push localhost:3030/localregistry/compute-node-v14:latest
//! ```
use anyhow::Context;
use aws_config::BehaviorVersion;
use camino::{Utf8Path, Utf8PathBuf};
use clap::Parser;
use nix::unistd::Pid;
use tracing::{info, info_span, warn, Instrument};
use utils::fs_ext::is_directory_empty;
#[path = "fast_import/child_stdio_to_log.rs"]
mod child_stdio_to_log;
#[path = "fast_import/s3_uri.rs"]
mod s3_uri;
#[path = "fast_import/s5cmd.rs"]
mod s5cmd;
#[derive(clap::Parser)]
struct Args {
#[clap(long)]
working_directory: Utf8PathBuf,
#[clap(long, env = "NEON_IMPORTER_S3_PREFIX")]
s3_prefix: s3_uri::S3Uri,
#[clap(long)]
pg_bin_dir: Utf8PathBuf,
#[clap(long)]
pg_lib_dir: Utf8PathBuf,
}
#[serde_with::serde_as]
#[derive(serde::Deserialize)]
struct Spec {
encryption_secret: EncryptionSecret,
#[serde_as(as = "serde_with::base64::Base64")]
source_connstring_ciphertext_base64: Vec<u8>,
}
#[derive(serde::Deserialize)]
enum EncryptionSecret {
#[allow(clippy::upper_case_acronyms)]
KMS { key_id: String },
}
#[tokio::main]
pub(crate) async fn main() -> anyhow::Result<()> {
utils::logging::init(
utils::logging::LogFormat::Plain,
utils::logging::TracingErrorLayerEnablement::EnableWithRustLogFilter,
utils::logging::Output::Stdout,
)?;
info!("starting");
let Args {
working_directory,
s3_prefix,
pg_bin_dir,
pg_lib_dir,
} = Args::parse();
let aws_config = aws_config::load_defaults(BehaviorVersion::v2024_03_28()).await;
let spec: Spec = {
let spec_key = s3_prefix.append("/spec.json");
let s3_client = aws_sdk_s3::Client::new(&aws_config);
let object = s3_client
.get_object()
.bucket(&spec_key.bucket)
.key(spec_key.key)
.send()
.await
.context("get spec from s3")?
.body
.collect()
.await
.context("download spec body")?;
serde_json::from_slice(&object.into_bytes()).context("parse spec as json")?
};
match tokio::fs::create_dir(&working_directory).await {
Ok(()) => {}
Err(e) if e.kind() == std::io::ErrorKind::AlreadyExists => {
if !is_directory_empty(&working_directory)
.await
.context("check if working directory is empty")?
{
anyhow::bail!("working directory is not empty");
} else {
// ok
}
}
Err(e) => return Err(anyhow::Error::new(e).context("create working directory")),
}
let pgdata_dir = working_directory.join("pgdata");
tokio::fs::create_dir(&pgdata_dir)
.await
.context("create pgdata directory")?;
//
// Setup clients
//
let aws_config = aws_config::load_defaults(BehaviorVersion::v2024_03_28()).await;
let kms_client = aws_sdk_kms::Client::new(&aws_config);
//
// Initialize pgdata
//
let superuser = "cloud_admin"; // XXX: this shouldn't be hard-coded
postgres_initdb::do_run_initdb(postgres_initdb::RunInitdbArgs {
superuser,
locale: "en_US.UTF-8", // XXX: this shouldn't be hard-coded,
pg_version: 140000, // XXX: this shouldn't be hard-coded but derived from which compute image we're running in
initdb_bin: pg_bin_dir.join("initdb").as_ref(),
library_search_path: &pg_lib_dir, // TODO: is this right? Prob works in compute image, not sure about neon_local.
pgdata: &pgdata_dir,
})
.await
.context("initdb")?;
let nproc = num_cpus::get();
//
// Launch postgres process
//
let mut postgres_proc = tokio::process::Command::new(pg_bin_dir.join("postgres"))
.arg("-D")
.arg(&pgdata_dir)
.args(["-c", "wal_level=minimal"])
.args(["-c", "shared_buffers=10GB"])
.args(["-c", "max_wal_senders=0"])
.args(["-c", "fsync=off"])
.args(["-c", "full_page_writes=off"])
.args(["-c", "synchronous_commit=off"])
.args(["-c", "maintenance_work_mem=8388608"])
.args(["-c", &format!("max_parallel_maintenance_workers={nproc}")])
.args(["-c", &format!("max_parallel_workers={nproc}")])
.args(["-c", &format!("max_parallel_workers_per_gather={nproc}")])
.args(["-c", &format!("max_worker_processes={nproc}")])
.args(["-c", "effective_io_concurrency=100"])
.env_clear()
.stdout(std::process::Stdio::piped())
.stderr(std::process::Stdio::piped())
.spawn()
.context("spawn postgres")?;
info!("spawned postgres, waiting for it to become ready");
tokio::spawn(
child_stdio_to_log::relay_process_output(
postgres_proc.stdout.take(),
postgres_proc.stderr.take(),
)
.instrument(info_span!("postgres")),
);
let restore_pg_connstring =
format!("host=localhost port=5432 user={superuser} dbname=postgres");
loop {
let res = tokio_postgres::connect(&restore_pg_connstring, tokio_postgres::NoTls).await;
if res.is_ok() {
info!("postgres is ready, could connect to it");
break;
}
}
//
// Decrypt connection string
//
let source_connection_string = {
match spec.encryption_secret {
EncryptionSecret::KMS { key_id } => {
let mut output = kms_client
.decrypt()
.key_id(key_id)
.ciphertext_blob(aws_sdk_s3::primitives::Blob::new(
spec.source_connstring_ciphertext_base64,
))
.send()
.await
.context("decrypt source connection string")?;
let plaintext = output
.plaintext
.take()
.context("get plaintext source connection string")?;
String::from_utf8(plaintext.into_inner())
.context("parse source connection string as utf8")?
}
}
};
//
// Start the work
//
let dumpdir = working_directory.join("dumpdir");
let common_args = [
// schema mapping (prob suffices to specify them on one side)
"--no-owner".to_string(),
"--no-privileges".to_string(),
"--no-publications".to_string(),
"--no-security-labels".to_string(),
"--no-subscriptions".to_string(),
"--no-tablespaces".to_string(),
// format
"--format".to_string(),
"directory".to_string(),
// concurrency
"--jobs".to_string(),
num_cpus::get().to_string(),
// progress updates
"--verbose".to_string(),
];
info!("dump into the working directory");
{
let mut pg_dump = tokio::process::Command::new(pg_bin_dir.join("pg_dump"))
.args(&common_args)
.arg("-f")
.arg(&dumpdir)
.arg("--no-sync")
// POSITIONAL args
// source db (db name included in connection string)
.arg(&source_connection_string)
// how we run it
.env_clear()
.kill_on_drop(true)
.stdout(std::process::Stdio::piped())
.stderr(std::process::Stdio::piped())
.spawn()
.context("spawn pg_dump")?;
info!(pid=%pg_dump.id().unwrap(), "spawned pg_dump");
tokio::spawn(
child_stdio_to_log::relay_process_output(pg_dump.stdout.take(), pg_dump.stderr.take())
.instrument(info_span!("pg_dump")),
);
let st = pg_dump.wait().await.context("wait for pg_dump")?;
info!(status=?st, "pg_dump exited");
if !st.success() {
warn!(status=%st, "pg_dump failed, restore will likely fail as well");
}
}
// TODO: do it in a streaming way, plenty of internal research done on this already
// TODO: do the unlogged table trick
info!("restore from working directory into vanilla postgres");
{
let mut pg_restore = tokio::process::Command::new(pg_bin_dir.join("pg_restore"))
.args(&common_args)
.arg("-d")
.arg(&restore_pg_connstring)
// POSITIONAL args
.arg(&dumpdir)
// how we run it
.env_clear()
.kill_on_drop(true)
.stdout(std::process::Stdio::piped())
.stderr(std::process::Stdio::piped())
.spawn()
.context("spawn pg_restore")?;
info!(pid=%pg_restore.id().unwrap(), "spawned pg_restore");
tokio::spawn(
child_stdio_to_log::relay_process_output(
pg_restore.stdout.take(),
pg_restore.stderr.take(),
)
.instrument(info_span!("pg_restore")),
);
let st = pg_restore.wait().await.context("wait for pg_restore")?;
info!(status=?st, "pg_restore exited");
if !st.success() {
warn!(status=%st, "pg_restore failed, restore will likely fail as well");
}
}
info!("shutdown postgres");
{
nix::sys::signal::kill(
Pid::from_raw(
i32::try_from(postgres_proc.id().unwrap()).expect("convert child pid to i32"),
),
nix::sys::signal::SIGTERM,
)
.context("signal postgres to shut down")?;
postgres_proc
.wait()
.await
.context("wait for postgres to shut down")?;
}
info!("upload pgdata");
s5cmd::sync(Utf8Path::new(&pgdata_dir), &s3_prefix.append("/"))
.await
.context("sync dump directory to destination")?;
info!("write status");
{
let status_dir = working_directory.join("status");
std::fs::create_dir(&status_dir).context("create status directory")?;
let status_file = status_dir.join("status");
std::fs::write(&status_file, serde_json::json!({"done": true}).to_string())
.context("write status file")?;
s5cmd::sync(&status_file, &s3_prefix.append("/status/pgdata"))
.await
.context("sync status directory to destination")?;
}
Ok(())
}

View File

@@ -0,0 +1,35 @@
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::process::{ChildStderr, ChildStdout};
use tracing::info;
/// Asynchronously relays the output from a child process's `stdout` and `stderr` to the tracing log.
/// Each line is read and logged individually, with lossy UTF-8 conversion.
///
/// # Arguments
///
/// * `stdout`: An `Option<ChildStdout>` from the child process.
/// * `stderr`: An `Option<ChildStderr>` from the child process.
///
pub(crate) async fn relay_process_output(stdout: Option<ChildStdout>, stderr: Option<ChildStderr>) {
let stdout_fut = async {
if let Some(stdout) = stdout {
let reader = BufReader::new(stdout);
let mut lines = reader.lines();
while let Ok(Some(line)) = lines.next_line().await {
info!(fd = "stdout", "{}", line);
}
}
};
let stderr_fut = async {
if let Some(stderr) = stderr {
let reader = BufReader::new(stderr);
let mut lines = reader.lines();
while let Ok(Some(line)) = lines.next_line().await {
info!(fd = "stderr", "{}", line);
}
}
};
tokio::join!(stdout_fut, stderr_fut);
}

View File

@@ -0,0 +1,75 @@
use anyhow::Result;
use std::str::FromStr;
/// Struct to hold parsed S3 components
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct S3Uri {
pub bucket: String,
pub key: String,
}
impl FromStr for S3Uri {
type Err = anyhow::Error;
/// Parse an S3 URI into a bucket and key
fn from_str(uri: &str) -> Result<Self> {
// Ensure the URI starts with "s3://"
if !uri.starts_with("s3://") {
return Err(anyhow::anyhow!("Invalid S3 URI scheme"));
}
// Remove the "s3://" prefix
let stripped_uri = &uri[5..];
// Split the remaining string into bucket and key parts
if let Some((bucket, key)) = stripped_uri.split_once('/') {
Ok(S3Uri {
bucket: bucket.to_string(),
key: key.to_string(),
})
} else {
Err(anyhow::anyhow!(
"Invalid S3 URI format, missing bucket or key"
))
}
}
}
impl S3Uri {
pub fn append(&self, suffix: &str) -> Self {
Self {
bucket: self.bucket.clone(),
key: format!("{}{}", self.key, suffix),
}
}
}
impl std::fmt::Display for S3Uri {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(f, "s3://{}/{}", self.bucket, self.key)
}
}
impl clap::builder::TypedValueParser for S3Uri {
type Value = Self;
fn parse_ref(
&self,
_cmd: &clap::Command,
_arg: Option<&clap::Arg>,
value: &std::ffi::OsStr,
) -> Result<Self::Value, clap::Error> {
let value_str = value.to_str().ok_or_else(|| {
clap::Error::raw(
clap::error::ErrorKind::InvalidUtf8,
"Invalid UTF-8 sequence",
)
})?;
S3Uri::from_str(value_str).map_err(|e| {
clap::Error::raw(
clap::error::ErrorKind::InvalidValue,
format!("Failed to parse S3 URI: {}", e),
)
})
}
}

View File

@@ -0,0 +1,27 @@
use anyhow::Context;
use camino::Utf8Path;
use super::s3_uri::S3Uri;
pub(crate) async fn sync(local: &Utf8Path, remote: &S3Uri) -> anyhow::Result<()> {
let mut builder = tokio::process::Command::new("s5cmd");
// s5cmd uses aws-sdk-go v1, hence doesn't support AWS_ENDPOINT_URL
if let Some(val) = std::env::var_os("AWS_ENDPOINT_URL") {
builder.arg("--endpoint-url").arg(val);
}
builder
.arg("sync")
.arg(local.as_str())
.arg(remote.to_string());
let st = builder
.spawn()
.context("spawn s5cmd")?
.wait()
.await
.context("wait for s5cmd")?;
if st.success() {
Ok(())
} else {
Err(anyhow::anyhow!("s5cmd failed"))
}
}

View File

@@ -116,7 +116,7 @@ pub fn write_postgres_conf(
vartype: "enum".to_owned(),
};
write!(file, "{}", opt.to_pg_setting())?;
writeln!(file, "{}", opt.to_pg_setting())?;
}
}

View File

@@ -20,6 +20,7 @@ use anyhow::Result;
use hyper::header::CONTENT_TYPE;
use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Method, Request, Response, Server, StatusCode};
use metrics::proto::MetricFamily;
use metrics::Encoder;
use metrics::TextEncoder;
use tokio::task;
@@ -72,10 +73,22 @@ async fn routes(req: Request<Body>, compute: &Arc<ComputeNode>) -> Response<Body
(&Method::GET, "/metrics") => {
debug!("serving /metrics GET request");
let mut buffer = vec![];
let metrics = installed_extensions::collect();
// When we call TextEncoder::encode() below, it will immediately
// return an error if a metric family has no metrics, so we need to
// preemptively filter out metric families with no metrics.
let metrics = installed_extensions::collect()
.into_iter()
.filter(|m| !m.get_metric().is_empty())
.collect::<Vec<MetricFamily>>();
let encoder = TextEncoder::new();
encoder.encode(&metrics, &mut buffer).unwrap();
let mut buffer = vec![];
if let Err(err) = encoder.encode(&metrics, &mut buffer) {
let msg = format!("error handling /metrics request: {err}");
error!(msg);
return render_json_error(&msg, StatusCode::INTERNAL_SERVER_ERROR);
}
match Response::builder()
.status(StatusCode::OK)

View File

@@ -115,7 +115,7 @@ pub fn get_installed_extensions_sync(connstr: Url) -> Result<()> {
static INSTALLED_EXTENSIONS: Lazy<UIntGaugeVec> = Lazy::new(|| {
register_uint_gauge_vec!(
"installed_extensions",
"compute_installed_extensions",
"Number of databases where the version of extension is installed",
&["extension_name", "version"]
)

View File

@@ -1153,6 +1153,7 @@ async fn handle_timeline(cmd: &TimelineCmd, env: &mut local_env::LocalEnv) -> Re
timeline_info.timeline_id
);
}
// TODO: rename to import-basebackup-plus-wal
TimelineCmd::Import(args) => {
let tenant_id = get_tenant_id(args.tenant_id, env)?;
let timeline_id = args.timeline_id;

View File

@@ -113,21 +113,21 @@ so manual installation of dependencies is not recommended.
A single virtual environment with all dependencies is described in the single `Pipfile`.
### Prerequisites
- Install Python 3.9 (the minimal supported version) or greater.
- Install Python 3.11 (the minimal supported version) or greater.
- Our setup with poetry should work with newer python versions too. So feel free to open an issue with a `c/test-runner` label if something doesn't work as expected.
- If you have some trouble with other version you can resolve it by installing Python 3.9 separately, via [pyenv](https://github.com/pyenv/pyenv) or via system package manager e.g.:
- If you have some trouble with other version you can resolve it by installing Python 3.11 separately, via [pyenv](https://github.com/pyenv/pyenv) or via system package manager e.g.:
```bash
# In Ubuntu
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.9
sudo apt install python3.11
```
- Install `poetry`
- Exact version of `poetry` is not important, see installation instructions available at poetry's [website](https://python-poetry.org/docs/#installation).
- Install dependencies via `./scripts/pysync`.
- Note that CI uses specific Python version (look for `PYTHON_VERSION` [here](https://github.com/neondatabase/docker-images/blob/main/rust/Dockerfile))
so if you have different version some linting tools can yield different result locally vs in the CI.
- You can explicitly specify which Python to use by running `poetry env use /path/to/python`, e.g. `poetry env use python3.9`.
- You can explicitly specify which Python to use by running `poetry env use /path/to/python`, e.g. `poetry env use python3.11`.
This may also disable the `The currently activated Python version X.Y.Z is not supported by the project` warning.
Run `poetry shell` to activate the virtual environment.

View File

@@ -2,28 +2,14 @@
// This module has heavy inspiration from the prometheus crate's `process_collector.rs`.
use once_cell::sync::Lazy;
use prometheus::Gauge;
use crate::UIntGauge;
pub struct Collector {
descs: Vec<prometheus::core::Desc>,
vmlck: crate::UIntGauge,
cpu_seconds_highres: Gauge,
}
const NMETRICS: usize = 2;
static CLK_TCK_F64: Lazy<f64> = Lazy::new(|| {
let long = unsafe { libc::sysconf(libc::_SC_CLK_TCK) };
if long == -1 {
panic!("sysconf(_SC_CLK_TCK) failed");
}
let convertible_to_f64: i32 =
i32::try_from(long).expect("sysconf(_SC_CLK_TCK) is larger than i32");
convertible_to_f64 as f64
});
const NMETRICS: usize = 1;
impl prometheus::core::Collector for Collector {
fn desc(&self) -> Vec<&prometheus::core::Desc> {
@@ -41,12 +27,6 @@ impl prometheus::core::Collector for Collector {
mfs.extend(self.vmlck.collect())
}
}
if let Ok(stat) = myself.stat() {
let cpu_seconds = stat.utime + stat.stime;
self.cpu_seconds_highres
.set(cpu_seconds as f64 / *CLK_TCK_F64);
mfs.extend(self.cpu_seconds_highres.collect());
}
mfs
}
}
@@ -63,23 +43,7 @@ impl Collector {
.cloned(),
);
let cpu_seconds_highres = Gauge::new(
"libmetrics_process_cpu_seconds_highres",
"Total user and system CPU time spent in seconds.\
Sub-second resolution, hence better than `process_cpu_seconds_total`.",
)
.unwrap();
descs.extend(
prometheus::core::Collector::desc(&cpu_seconds_highres)
.into_iter()
.cloned(),
);
Self {
descs,
vmlck,
cpu_seconds_highres,
}
Self { descs, vmlck }
}
}

View File

@@ -33,6 +33,7 @@ remote_storage.workspace = true
postgres_backend.workspace = true
nix = {workspace = true, optional = true}
reqwest.workspace = true
rand.workspace = true
[dev-dependencies]
bincode.workspace = true

View File

@@ -97,6 +97,15 @@ pub struct ConfigToml {
pub control_plane_api: Option<reqwest::Url>,
pub control_plane_api_token: Option<String>,
pub control_plane_emergency_mode: bool,
/// Unstable feature: subject to change or removal without notice.
/// See <https://github.com/neondatabase/neon/pull/9218>.
pub import_pgdata_upcall_api: Option<reqwest::Url>,
/// Unstable feature: subject to change or removal without notice.
/// See <https://github.com/neondatabase/neon/pull/9218>.
pub import_pgdata_upcall_api_token: Option<String>,
/// Unstable feature: subject to change or removal without notice.
/// See <https://github.com/neondatabase/neon/pull/9218>.
pub import_pgdata_aws_endpoint_url: Option<reqwest::Url>,
pub heatmap_upload_concurrency: usize,
pub secondary_download_concurrency: usize,
pub virtual_file_io_engine: Option<crate::models::virtual_file::IoEngineKind>,
@@ -386,6 +395,10 @@ impl Default for ConfigToml {
control_plane_api_token: (None),
control_plane_emergency_mode: (false),
import_pgdata_upcall_api: (None),
import_pgdata_upcall_api_token: (None),
import_pgdata_aws_endpoint_url: (None),
heatmap_upload_concurrency: (DEFAULT_HEATMAP_UPLOAD_CONCURRENCY),
secondary_download_concurrency: (DEFAULT_SECONDARY_DOWNLOAD_CONCURRENCY),

View File

@@ -48,7 +48,7 @@ pub struct ShardedRange<'a> {
// Calculate the size of a range within the blocks of the same relation, or spanning only the
// top page in the previous relation's space.
fn contiguous_range_len(range: &Range<Key>) -> u32 {
pub fn contiguous_range_len(range: &Range<Key>) -> u32 {
debug_assert!(is_contiguous_range(range));
if range.start.field6 == 0xffffffff {
range.end.field6 + 1
@@ -67,7 +67,7 @@ fn contiguous_range_len(range: &Range<Key>) -> u32 {
/// This matters, because:
/// - Within such ranges, keys are used contiguously. Outside such ranges it is sparse.
/// - Within such ranges, we may calculate distances using simple subtraction of field6.
fn is_contiguous_range(range: &Range<Key>) -> bool {
pub fn is_contiguous_range(range: &Range<Key>) -> bool {
range.start.field1 == range.end.field1
&& range.start.field2 == range.end.field2
&& range.start.field3 == range.end.field3

View File

@@ -2,6 +2,8 @@ pub mod detach_ancestor;
pub mod partitioning;
pub mod utilization;
#[cfg(feature = "testing")]
use camino::Utf8PathBuf;
pub use utilization::PageserverUtilization;
use std::{
@@ -227,6 +229,9 @@ pub enum TimelineCreateRequestMode {
// we continue to accept it by having it here.
pg_version: Option<u32>,
},
ImportPgdata {
import_pgdata: TimelineCreateRequestModeImportPgdata,
},
// NB: Bootstrap is all-optional, and thus the serde(untagged) will cause serde to stop at Bootstrap.
// (serde picks the first matching enum variant, in declaration order).
Bootstrap {
@@ -236,6 +241,42 @@ pub enum TimelineCreateRequestMode {
},
}
#[derive(Serialize, Deserialize, Clone)]
pub struct TimelineCreateRequestModeImportPgdata {
pub location: ImportPgdataLocation,
pub idempotency_key: ImportPgdataIdempotencyKey,
}
#[derive(Serialize, Deserialize, Clone, Debug)]
pub enum ImportPgdataLocation {
#[cfg(feature = "testing")]
LocalFs { path: Utf8PathBuf },
AwsS3 {
region: String,
bucket: String,
/// A better name for this would be `prefix`; changing requires coordination with cplane.
/// See <https://github.com/neondatabase/cloud/issues/20646>.
key: String,
},
}
#[derive(Serialize, Deserialize, Clone)]
#[serde(transparent)]
pub struct ImportPgdataIdempotencyKey(pub String);
impl ImportPgdataIdempotencyKey {
pub fn random() -> Self {
use rand::{distributions::Alphanumeric, Rng};
Self(
rand::thread_rng()
.sample_iter(&Alphanumeric)
.take(20)
.map(char::from)
.collect(),
)
}
}
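
For illustration only, a sketch (not part of this diff) of how a caller might build and serialize the new import request mode; the use of `serde_json` and the region/bucket/prefix values are assumptions:

```rust
// Hypothetical client-side sketch: construct the `import_pgdata` fragment of a
// timeline-create request from the types introduced above.
use pageserver_api::models::{
    ImportPgdataIdempotencyKey, ImportPgdataLocation, TimelineCreateRequestModeImportPgdata,
};

fn build_import_request_fragment() -> serde_json::Value {
    let import_pgdata = TimelineCreateRequestModeImportPgdata {
        location: ImportPgdataLocation::AwsS3 {
            region: "us-east-1".to_string(),        // placeholder
            bucket: "example-bucket".to_string(),   // placeholder
            // Despite the name, this is used as a prefix (see the comment above).
            key: "imports/example-tenant/".to_string(), // placeholder
        },
        // Reusing the same key across retries is what makes the create call idempotent.
        idempotency_key: ImportPgdataIdempotencyKey::random(),
    };
    serde_json::json!({ "import_pgdata": import_pgdata })
}
```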
#[derive(Serialize, Deserialize, Clone)]
pub struct LsnLeaseRequest {
pub lsn: Lsn,

View File

@@ -0,0 +1,12 @@
[package]
name = "postgres_initdb"
version = "0.1.0"
edition.workspace = true
license.workspace = true
[dependencies]
anyhow.workspace = true
tokio.workspace = true
camino.workspace = true
thiserror.workspace = true
workspace_hack = { version = "0.1", path = "../../workspace_hack" }

View File

@@ -0,0 +1,103 @@
//! The canonical way we run `initdb` in Neon.
//!
//! initdb has implicit defaults that are dependent on the environment, e.g., locales & collations.
//!
//! This module's job is to eliminate the environment-dependence as much as possible.
use std::fmt;
use camino::Utf8Path;
pub struct RunInitdbArgs<'a> {
pub superuser: &'a str,
pub locale: &'a str,
pub initdb_bin: &'a Utf8Path,
pub pg_version: u32,
pub library_search_path: &'a Utf8Path,
pub pgdata: &'a Utf8Path,
}
#[derive(thiserror::Error, Debug)]
pub enum Error {
Spawn(std::io::Error),
Failed {
status: std::process::ExitStatus,
stderr: Vec<u8>,
},
WaitOutput(std::io::Error),
Other(anyhow::Error),
}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
Error::Spawn(e) => write!(f, "Error spawning command: {:?}", e),
Error::Failed { status, stderr } => write!(
f,
"Command failed with status {:?}: {}",
status,
String::from_utf8_lossy(stderr)
),
Error::WaitOutput(e) => write!(f, "Error waiting for command output: {:?}", e),
Error::Other(e) => write!(f, "Error: {:?}", e),
}
}
}
pub async fn do_run_initdb(args: RunInitdbArgs<'_>) -> Result<(), Error> {
let RunInitdbArgs {
superuser,
locale,
initdb_bin: initdb_bin_path,
pg_version,
library_search_path,
pgdata,
} = args;
let mut initdb_command = tokio::process::Command::new(initdb_bin_path);
initdb_command
.args(["--pgdata", pgdata.as_ref()])
.args(["--username", superuser])
.args(["--encoding", "utf8"])
.args(["--locale", locale])
.arg("--no-instructions")
.arg("--no-sync")
.env_clear()
.env("LD_LIBRARY_PATH", library_search_path)
.env("DYLD_LIBRARY_PATH", library_search_path)
.stdin(std::process::Stdio::null())
// stdout invocation produces the same output every time, we don't need it
.stdout(std::process::Stdio::null())
// we would be interested in the stderr output, if there was any
.stderr(std::process::Stdio::piped());
// Before version 14, only the libc provider was available.
if pg_version > 14 {
// Version 17 brought with it a builtin locale provider which only provides
// C and C.UTF-8. While being safer for collation purposes since it is
// guaranteed to be consistent throughout a major release, it is also more
// performant.
let locale_provider = if pg_version >= 17 { "builtin" } else { "libc" };
initdb_command.args(["--locale-provider", locale_provider]);
}
let initdb_proc = initdb_command.spawn().map_err(Error::Spawn)?;
// Ideally we'd select here with the cancellation token, but the problem is that
// we can't safely terminate initdb: it launches processes of its own, and killing
// initdb doesn't kill them. After we return from this function, we want the target
// directory to be able to be cleaned up.
// See https://github.com/neondatabase/neon/issues/6385
let initdb_output = initdb_proc
.wait_with_output()
.await
.map_err(Error::WaitOutput)?;
if !initdb_output.status.success() {
return Err(Error::Failed {
status: initdb_output.status,
stderr: initdb_output.stderr,
});
}
Ok(())
}
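
A hedged usage sketch of the new helper; the superuser, locale, and paths below are placeholders, not values taken from this diff:

```rust
// Hypothetical caller: run initdb for a fresh pgdata directory.
use camino::Utf8Path;

async fn init_pgdata_example() -> Result<(), postgres_initdb::Error> {
    postgres_initdb::do_run_initdb(postgres_initdb::RunInitdbArgs {
        superuser: "cloud_admin",                                  // placeholder
        locale: "C.UTF-8",                                         // placeholder
        initdb_bin: Utf8Path::new("/usr/local/pgsql/bin/initdb"),  // placeholder path
        pg_version: 16,
        library_search_path: Utf8Path::new("/usr/local/pgsql/lib"), // placeholder path
        pgdata: Utf8Path::new("/tmp/pgdata-example"),
    })
    .await
}
```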

View File

@@ -184,9 +184,8 @@ pub struct CancelKeyData {
impl fmt::Display for CancelKeyData {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
// TODO: this is producing strange results, with 0xffffffff........ always in the logs.
let hi = (self.backend_pid as u64) << 32;
let lo = self.cancel_key as u64;
let lo = (self.cancel_key as u64) & 0xffffffff;
let id = hi | lo;
// This format is more compact and might work better for logs.
@@ -1047,4 +1046,13 @@ mod tests {
let data = [0, 0, 0, 7, 0, 0, 0, 0];
FeStartupPacket::parse(&mut BytesMut::from_iter(data)).unwrap_err();
}
#[test]
fn cancel_key_data() {
let key = CancelKeyData {
backend_pid: -1817212860,
cancel_key: -1183897012,
};
assert_eq!(format!("{key}"), "CancelKeyData(93af8844b96f2a4c)");
}
}
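
The masking added above matters because casting a negative `i32` to `u64` sign-extends into the high 32 bits; a small standalone check (not from the diff) that reproduces the low half of the value used in the test:

```rust
fn main() {
    let cancel_key: i32 = -1183897012;
    // Without masking, the i32 -> u64 cast sign-extends and fills the high 32 bits,
    // which is why the formatted id always showed 0xffffffff........ before the fix.
    let unmasked = cancel_key as u64;
    // With masking, only the low 32 bits survive, so the backend_pid half stays intact.
    let masked = (cancel_key as u64) & 0xffff_ffff;
    println!("{unmasked:016x}"); // ffffffffb96f2a4c
    println!("{masked:016x}");   // 00000000b96f2a4c
}
```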

View File

@@ -24,6 +24,7 @@ use azure_storage_blobs::{blob::operations::GetBlobBuilder, prelude::ContainerCl
use bytes::Bytes;
use futures::future::Either;
use futures::stream::Stream;
use futures::FutureExt;
use futures_util::StreamExt;
use futures_util::TryStreamExt;
use http_types::{StatusCode, Url};
@@ -31,6 +32,7 @@ use scopeguard::ScopeGuard;
use tokio_util::sync::CancellationToken;
use tracing::debug;
use utils::backoff;
use utils::backoff::exponential_backoff_duration_seconds;
use crate::metrics::{start_measuring_requests, AttemptOutcome, RequestKind};
use crate::{
@@ -302,40 +304,59 @@ impl RemoteStorage for AzureBlobStorage {
let mut next_marker = None;
let mut timeout_try_cnt = 1;
'outer: loop {
let mut builder = builder.clone();
if let Some(marker) = next_marker.clone() {
builder = builder.marker(marker);
}
let response = builder.into_stream();
let response = response.into_stream().map_err(to_download_error);
let response = tokio_stream::StreamExt::timeout(response, self.timeout);
let response = response.map(|res| match res {
Ok(res) => res,
Err(_elapsed) => Err(DownloadError::Timeout),
// Azure Blob Rust SDK does not expose the list blob API directly. Users have to use
// their pageable iterator wrapper that returns all keys as a stream. We want to have
// full control of paging, and therefore we only take the first item from the stream.
let mut response_stream = builder.into_stream();
let response = response_stream.next();
// Timeout mechanism: the Azure client will sometimes get stuck on a request, while retrying that
// request would immediately succeed. Therefore, we use an exponentially growing timeout to retry
// the request. (Usually, exponential backoff is used to determine the sleep time between retries.)
// We start with a 10-second timeout and double it for each failure, up to 5 failures:
// timeout = min(5 * 2^n, self.timeout).
let this_timeout = (5.0 * exponential_backoff_duration_seconds(timeout_try_cnt, 1.0, self.timeout.as_secs_f64())).min(self.timeout.as_secs_f64());
let response = tokio::time::timeout(Duration::from_secs_f64(this_timeout), response);
let response = response.map(|res| {
match res {
Ok(Some(Ok(res))) => Ok(Some(res)),
Ok(Some(Err(e))) => Err(to_download_error(e)),
Ok(None) => Ok(None),
Err(_elapsed) => Err(DownloadError::Timeout),
}
});
let mut response = std::pin::pin!(response);
let mut max_keys = max_keys.map(|mk| mk.get());
let next_item = tokio::select! {
op = response.next() => Ok(op),
op = response => op,
_ = cancel.cancelled() => Err(DownloadError::Cancelled),
}?;
};
if let Err(DownloadError::Timeout) = &next_item {
timeout_try_cnt += 1;
if timeout_try_cnt <= 5 {
continue;
}
}
let next_item = next_item?;
if timeout_try_cnt >= 2 {
tracing::warn!("Azure Blob Storage list timed out and succeeded after {} tries", timeout_try_cnt);
}
timeout_try_cnt = 1;
let Some(entry) = next_item else {
// The list is complete, so yield it.
break;
};
let mut res = Listing::default();
let entry = match entry {
Ok(entry) => entry,
Err(e) => {
// The error is potentially retryable, so we must rewind the loop after yielding.
yield Err(e);
continue;
}
};
next_marker = entry.continuation();
let prefix_iter = entry
.blobs
@@ -351,7 +372,7 @@ impl RemoteStorage for AzureBlobStorage {
last_modified: k.properties.last_modified.into(),
size: k.properties.content_length,
}
);
);
for key in blob_iter {
res.keys.push(key);
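
The retry schedule described in the comment above works out to 10s, 20s, 40s, 80s, then the overall timeout. A standalone sketch mirroring that formula with a hypothetical helper (not the real `utils::backoff` function):

```rust
use std::time::Duration;

/// Hypothetical stand-in for the backoff helper used above: per-attempt
/// listing timeout that starts at 10s and doubles, capped at `overall`.
fn per_attempt_timeout(attempt: u32, overall: Duration) -> Duration {
    let secs = (5.0 * 2f64.powi(attempt as i32)).min(overall.as_secs_f64());
    Duration::from_secs_f64(secs)
}

fn main() {
    let overall = Duration::from_secs(120); // assumed overall storage timeout
    for attempt in 1..=5 {
        // attempt 1 -> 10s, 2 -> 20s, 3 -> 40s, 4 -> 80s, 5 -> 120s (capped)
        println!("attempt {attempt}: {:?}", per_attempt_timeout(attempt, overall));
    }
}
```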

View File

@@ -360,7 +360,12 @@ impl RemoteStorage for LocalFs {
let mut objects = Vec::with_capacity(keys.len());
for key in keys {
let path = key.with_base(&self.storage_root);
let metadata = file_metadata(&path).await?;
let metadata = file_metadata(&path).await;
if let Err(DownloadError::NotFound) = metadata {
// Race: if the file is deleted between listing and metadata check, ignore it.
continue;
}
let metadata = metadata?;
if metadata.is_dir() {
continue;
}
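
The same list-then-stat race exists for any filesystem walker; a generic sketch of the pattern using plain `std::fs` (not the `remote_storage` API):

```rust
use std::io::ErrorKind;
use std::path::{Path, PathBuf};

/// Collect file sizes under `dir`, tolerating entries that are deleted
/// between the directory listing and the metadata call.
fn sizes_ignoring_deleted(dir: &Path) -> std::io::Result<Vec<(PathBuf, u64)>> {
    let mut out = Vec::new();
    for entry in std::fs::read_dir(dir)? {
        let path = entry?.path();
        match std::fs::metadata(&path) {
            // Race: the file vanished after being listed; skip it rather than fail the listing.
            Err(e) if e.kind() == ErrorKind::NotFound => continue,
            Err(e) => return Err(e),
            Ok(md) if md.is_dir() => continue,
            Ok(md) => out.push((path, md.len())),
        }
    }
    Ok(out)
}
```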

View File

@@ -29,6 +29,7 @@ jsonwebtoken.workspace = true
nix.workspace = true
once_cell.workspace = true
pin-project-lite.workspace = true
pprof.workspace = true
regex.workspace = true
routerify.workspace = true
serde.workspace = true

View File

@@ -1,7 +1,8 @@
use crate::auth::{AuthError, Claims, SwappableJwtAuth};
use crate::http::error::{api_error_handler, route_error_handler, ApiError};
use anyhow::Context;
use hyper::header::{HeaderName, AUTHORIZATION};
use crate::http::request::{get_query_param, parse_query_param};
use anyhow::{anyhow, Context};
use hyper::header::{HeaderName, AUTHORIZATION, CONTENT_DISPOSITION};
use hyper::http::HeaderValue;
use hyper::Method;
use hyper::{header::CONTENT_TYPE, Body, Request, Response};
@@ -12,11 +13,13 @@ use routerify::{Middleware, RequestInfo, Router, RouterBuilder};
use tracing::{debug, info, info_span, warn, Instrument};
use std::future::Future;
use std::io::Write as _;
use std::str::FromStr;
use std::time::Duration;
use bytes::{Bytes, BytesMut};
use std::io::Write as _;
use tokio::sync::mpsc;
use pprof::protos::Message as _;
use tokio::sync::{mpsc, Mutex};
use tokio_stream::wrappers::ReceiverStream;
static SERVE_METRICS_COUNT: Lazy<IntCounter> = Lazy::new(|| {
@@ -328,6 +331,82 @@ pub async fn prometheus_metrics_handler(_req: Request<Body>) -> Result<Response<
Ok(response)
}
/// Generates CPU profiles.
pub async fn profile_cpu_handler(req: Request<Body>) -> Result<Response<Body>, ApiError> {
enum Format {
Pprof,
Svg,
}
// Parameters.
let format = match get_query_param(&req, "format")?.as_deref() {
None => Format::Pprof,
Some("pprof") => Format::Pprof,
Some("svg") => Format::Svg,
Some(format) => return Err(ApiError::BadRequest(anyhow!("invalid format {format}"))),
};
let seconds = match parse_query_param(&req, "seconds")? {
None => 5,
Some(seconds @ 1..=30) => seconds,
Some(_) => return Err(ApiError::BadRequest(anyhow!("duration must be 1-30 secs"))),
};
let frequency_hz = match parse_query_param(&req, "frequency")? {
None => 99,
Some(1001..) => return Err(ApiError::BadRequest(anyhow!("frequency must be <=1000 Hz"))),
Some(frequency) => frequency,
};
// Only allow one profiler at a time.
static PROFILE_LOCK: Lazy<Mutex<()>> = Lazy::new(|| Mutex::new(()));
let _lock = PROFILE_LOCK
.try_lock()
.map_err(|_| ApiError::Conflict("profiler already running".into()))?;
// Take the profile.
let report = tokio::task::spawn_blocking(move || {
let guard = pprof::ProfilerGuardBuilder::default()
.frequency(frequency_hz)
.blocklist(&["libc", "libgcc", "pthread", "vdso"])
.build()?;
std::thread::sleep(Duration::from_secs(seconds));
guard.report().build()
})
.await
.map_err(|join_err| ApiError::InternalServerError(join_err.into()))?
.map_err(|pprof_err| ApiError::InternalServerError(pprof_err.into()))?;
// Return the report in the requested format.
match format {
Format::Pprof => {
let mut body = Vec::new();
report
.pprof()
.map_err(|err| ApiError::InternalServerError(err.into()))?
.write_to_vec(&mut body)
.map_err(|err| ApiError::InternalServerError(err.into()))?;
Response::builder()
.status(200)
.header(CONTENT_TYPE, "application/octet-stream")
.header(CONTENT_DISPOSITION, "attachment; filename=\"profile.pb\"")
.body(Body::from(body))
.map_err(|err| ApiError::InternalServerError(err.into()))
}
Format::Svg => {
let mut body = Vec::new();
report
.flamegraph(&mut body)
.map_err(|err| ApiError::InternalServerError(err.into()))?;
Response::builder()
.status(200)
.header(CONTENT_TYPE, "image/svg+xml")
.body(Body::from(body))
.map_err(|err| ApiError::InternalServerError(err.into()))
}
}
}
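
For reference, a hypothetical client for the new endpoint; the base URL is a placeholder and the query parameters follow the validation in the handler above:

```rust
// Hypothetical client: fetch a 10-second flamegraph from the new endpoint
// and write it to disk. Assumes the reqwest, tokio, and anyhow crates.
async fn fetch_cpu_flamegraph() -> anyhow::Result<()> {
    // Placeholder address; point this at the pageserver management API.
    let url = "http://127.0.0.1:9898/profile/cpu?format=svg&seconds=10&frequency=99";
    let resp = reqwest::get(url).await?.error_for_status()?;
    let svg = resp.bytes().await?;
    tokio::fs::write("profile.svg", &svg).await?;
    Ok(())
}
```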
pub fn add_request_id_middleware<B: hyper::body::HttpBody + Send + Sync + 'static>(
) -> Middleware<B, ApiError> {
Middleware::pre(move |req| async move {

View File

@@ -30,7 +30,7 @@ pub fn parse_request_param<T: FromStr>(
}
}
fn get_query_param<'a>(
pub fn get_query_param<'a>(
request: &'a Request<Body>,
param_name: &str,
) -> Result<Option<Cow<'a, str>>, ApiError> {

View File

@@ -83,7 +83,9 @@ where
}
wake_these.push(self.heap.pop().unwrap().wake_channel);
}
self.update_status();
if !wake_these.is_empty() {
self.update_status();
}
wake_these
}

View File

@@ -15,7 +15,6 @@ anyhow.workspace = true
arc-swap.workspace = true
async-compression.workspace = true
async-stream.workspace = true
async-timer.workspace = true
bit_field.workspace = true
byteorder.workspace = true
bytes.workspace = true
@@ -44,6 +43,7 @@ postgres.workspace = true
postgres_backend.workspace = true
postgres-protocol.workspace = true
postgres-types.workspace = true
postgres_initdb.workspace = true
rand.workspace = true
range-set-blaze = { version = "0.1.16", features = ["alloc"] }
regex.workspace = true
@@ -69,6 +69,7 @@ url.workspace = true
walkdir.workspace = true
metrics.workspace = true
pageserver_api.workspace = true
pageserver_client.workspace = true # for ResponseErrorMessageExt TODO refactor that
pageserver_compaction.workspace = true
postgres_connection.workspace = true
postgres_ffi.workspace = true

View File

@@ -144,6 +144,10 @@ pub struct PageServerConf {
/// JWT token for use with the control plane API.
pub control_plane_api_token: Option<SecretString>,
pub import_pgdata_upcall_api: Option<Url>,
pub import_pgdata_upcall_api_token: Option<SecretString>,
pub import_pgdata_aws_endpoint_url: Option<Url>,
/// If true, pageserver will make best-effort to operate without a control plane: only
/// for use in major incidents.
pub control_plane_emergency_mode: bool,
@@ -328,6 +332,9 @@ impl PageServerConf {
control_plane_api,
control_plane_api_token,
control_plane_emergency_mode,
import_pgdata_upcall_api,
import_pgdata_upcall_api_token,
import_pgdata_aws_endpoint_url,
heatmap_upload_concurrency,
secondary_download_concurrency,
ingest_batch_size,
@@ -383,6 +390,9 @@ impl PageServerConf {
timeline_offloading,
ephemeral_bytes_per_memory_kb,
server_side_batch_timeout,
import_pgdata_upcall_api,
import_pgdata_upcall_api_token: import_pgdata_upcall_api_token.map(SecretString::from),
import_pgdata_aws_endpoint_url,
// ------------------------------------------------------------
// fields that require additional validation or custom handling

View File

@@ -15,6 +15,7 @@ use tokio_util::sync::CancellationToken;
use tracing::info;
use tracing::warn;
use utils::backoff;
use utils::pausable_failpoint;
use crate::metrics;
@@ -90,6 +91,7 @@ impl Deleter {
/// Block until everything in accumulator has been executed
async fn flush(&mut self) -> Result<(), DeletionQueueError> {
while !self.accumulator.is_empty() && !self.cancel.is_cancelled() {
pausable_failpoint!("deletion-queue-before-execute-pause");
match self.remote_delete().await {
Ok(()) => {
// Note: we assume that the remote storage layer returns Ok(()) if some

View File

@@ -623,6 +623,8 @@ paths:
existing_initdb_timeline_id:
type: string
format: hex
import_pgdata:
$ref: "#/components/schemas/TimelineCreateRequestImportPgdata"
responses:
"201":
description: Timeline was created, or already existed with matching parameters
@@ -979,6 +981,34 @@ components:
$ref: "#/components/schemas/TenantConfig"
effective_config:
$ref: "#/components/schemas/TenantConfig"
TimelineCreateRequestImportPgdata:
type: object
required:
- location
- idempotency_key
properties:
idempotency_key:
type: string
location:
$ref: "#/components/schemas/TimelineCreateRequestImportPgdataLocation"
TimelineCreateRequestImportPgdataLocation:
type: object
properties:
AwsS3:
$ref: "#/components/schemas/TimelineCreateRequestImportPgdataLocationAwsS3"
TimelineCreateRequestImportPgdataLocationAwsS3:
type: object
properties:
region:
type: string
bucket:
type: string
key:
type: string
required:
- region
- bucket
- key
TimelineInfo:
type: object
required:

View File

@@ -40,6 +40,7 @@ use pageserver_api::models::TenantSorting;
use pageserver_api::models::TenantState;
use pageserver_api::models::TimelineArchivalConfigRequest;
use pageserver_api::models::TimelineCreateRequestMode;
use pageserver_api::models::TimelineCreateRequestModeImportPgdata;
use pageserver_api::models::TimelinesInfoAndOffloaded;
use pageserver_api::models::TopTenantShardItem;
use pageserver_api::models::TopTenantShardsRequest;
@@ -55,6 +56,7 @@ use tokio_util::sync::CancellationToken;
use tracing::*;
use utils::auth::JwtAuth;
use utils::failpoint_support::failpoints_handler;
use utils::http::endpoint::profile_cpu_handler;
use utils::http::endpoint::prometheus_metrics_handler;
use utils::http::endpoint::request_span;
use utils::http::request::must_parse_query_param;
@@ -80,6 +82,7 @@ use crate::tenant::secondary::SecondaryController;
use crate::tenant::size::ModelInputs;
use crate::tenant::storage_layer::LayerAccessStatsReset;
use crate::tenant::storage_layer::LayerName;
use crate::tenant::timeline::import_pgdata;
use crate::tenant::timeline::offload::offload_timeline;
use crate::tenant::timeline::offload::OffloadError;
use crate::tenant::timeline::CompactFlags;
@@ -125,7 +128,7 @@ pub struct State {
conf: &'static PageServerConf,
tenant_manager: Arc<TenantManager>,
auth: Option<Arc<SwappableJwtAuth>>,
allowlist_routes: Vec<Uri>,
allowlist_routes: &'static [&'static str],
remote_storage: GenericRemoteStorage,
broker_client: storage_broker::BrokerClientChannel,
disk_usage_eviction_state: Arc<disk_usage_eviction_task::State>,
@@ -146,10 +149,13 @@ impl State {
deletion_queue_client: DeletionQueueClient,
secondary_controller: SecondaryController,
) -> anyhow::Result<Self> {
let allowlist_routes = ["/v1/status", "/v1/doc", "/swagger.yml", "/metrics"]
.iter()
.map(|v| v.parse().unwrap())
.collect::<Vec<_>>();
let allowlist_routes = &[
"/v1/status",
"/v1/doc",
"/swagger.yml",
"/metrics",
"/profile/cpu",
];
Ok(Self {
conf,
tenant_manager,
@@ -576,6 +582,35 @@ async fn timeline_create_handler(
ancestor_timeline_id,
ancestor_start_lsn,
}),
TimelineCreateRequestMode::ImportPgdata {
import_pgdata:
TimelineCreateRequestModeImportPgdata {
location,
idempotency_key,
},
} => tenant::CreateTimelineParams::ImportPgdata(tenant::CreateTimelineParamsImportPgdata {
idempotency_key: import_pgdata::index_part_format::IdempotencyKey::new(
idempotency_key.0,
),
new_timeline_id,
location: {
use import_pgdata::index_part_format::Location;
use pageserver_api::models::ImportPgdataLocation;
match location {
#[cfg(feature = "testing")]
ImportPgdataLocation::LocalFs { path } => Location::LocalFs { path },
ImportPgdataLocation::AwsS3 {
region,
bucket,
key,
} => Location::AwsS3 {
region,
bucket,
key,
},
}
},
}),
};
let ctx = RequestContext::new(TaskKind::MgmtRequest, DownloadBehavior::Error);
@@ -3148,7 +3183,7 @@ pub fn make_router(
if auth.is_some() {
router = router.middleware(auth_middleware(|request| {
let state = get_state(request);
if state.allowlist_routes.contains(request.uri()) {
if state.allowlist_routes.contains(&request.uri().path()) {
None
} else {
state.auth.as_deref()
@@ -3167,6 +3202,7 @@ pub fn make_router(
Ok(router
.data(state)
.get("/metrics", |r| request_span(r, prometheus_metrics_handler))
.get("/profile/cpu", |r| request_span(r, profile_cpu_handler))
.get("/v1/status", |r| api_handler(r, status_handler))
.put("/v1/failpoints", |r| {
testing_api_handler("manage failpoints", r, failpoints_handler)

View File

@@ -3,7 +3,6 @@
use anyhow::{bail, Context};
use async_compression::tokio::write::GzipEncoder;
use async_timer::Timer;
use bytes::Buf;
use futures::FutureExt;
use itertools::Itertools;
@@ -23,7 +22,6 @@ use pq_proto::FeStartupPacket;
use pq_proto::{BeMessage, FeMessage, RowDescriptor};
use std::borrow::Cow;
use std::io;
use std::pin::Pin;
use std::str;
use std::str::FromStr;
use std::sync::Arc;
@@ -316,15 +314,11 @@ struct PageServerHandler {
timeline_handles: TimelineHandles,
/// Messages queued up for the next processing batch
next_batch: Option<BatchedFeMessage>,
/// See [`PageServerConf::server_side_batch_timeout`]
server_side_batch_timeout: Option<Duration>,
server_side_batch_timer: Pin<Box<async_timer::timer::Platform>>,
}
struct Carry {
msg: BatchedFeMessage,
started_at: Instant,
}
struct TimelineHandles {
@@ -588,8 +582,8 @@ impl PageServerHandler {
connection_ctx,
timeline_handles: TimelineHandles::new(tenant_manager),
cancel,
next_batch: None,
server_side_batch_timeout,
server_side_batch_timer: Box::pin(async_timer::new_timer(Duration::from_secs(999))), // reset each iteration
}
}
@@ -617,86 +611,44 @@ impl PageServerHandler {
)
}
#[instrument(skip_all, level = tracing::Level::TRACE)]
async fn read_batch_from_connection<IO>(
&mut self,
pgb: &mut PostgresBackend<IO>,
tenant_id: &TenantId,
timeline_id: &TimelineId,
maybe_carry: &mut Option<Carry>,
ctx: &RequestContext,
) -> Result<BatchOrEof, QueryError>
) -> Result<Option<BatchOrEof>, QueryError>
where
IO: AsyncRead + AsyncWrite + Send + Sync + Unpin,
{
debug_assert_current_span_has_tenant_and_timeline_id_no_shard_id();
let mut batch = self.next_batch.take();
let mut batch_started_at: Option<std::time::Instant> = None;
let mut batching_deadline_storage = None; // TODO: can this just be an unsync once_cell?
loop {
// Create a future that will become ready when we need to stop batching.
use futures::future::Either;
let batching_deadline = match (
&*maybe_carry as &Option<Carry>,
&mut batching_deadline_storage,
) {
(None, None) => Either::Left(futures::future::pending()), // there's no deadline before we have something batched
(None, Some(_)) => unreachable!(),
(Some(_), Some(fut)) => Either::Right(fut), // below arm already ran
(Some(carry), None) => {
match self.server_side_batch_timeout {
None => {
return Ok(BatchOrEof::Batch(smallvec::smallvec![
maybe_carry
.take()
.expect("we already checked it's Some")
.msg
]))
}
Some(batch_timeout) => {
// Take into consideration the time the carry spent waiting.
let batch_timeout =
batch_timeout.saturating_sub(carry.started_at.elapsed());
if batch_timeout.is_zero() {
// the timer doesn't support restarting with zero duration
return Ok(BatchOrEof::Batch(smallvec::smallvec![
maybe_carry
.take()
.expect("we already checked it's Some")
.msg
]));
} else {
self.server_side_batch_timer.restart(batch_timeout);
batching_deadline_storage = Some(&mut self.server_side_batch_timer);
Either::Right(
batching_deadline_storage.as_mut().expect("we just set it"),
)
}
}
}
}
let next_batch: Option<BatchedFeMessage> = loop {
let sleep_fut = match (self.server_side_batch_timeout, batch_started_at) {
(Some(batch_timeout), Some(started_at)) => futures::future::Either::Left(
tokio::time::sleep_until((started_at + batch_timeout).into()),
),
_ => futures::future::Either::Right(futures::future::pending()),
};
let msg = tokio::select! {
biased;
_ = self.cancel.cancelled() => {
return Err(QueryError::Shutdown)
}
_ = batching_deadline => {
return Ok(BatchOrEof::Batch(smallvec::smallvec![maybe_carry.take().expect("per construction of batching_deadline").msg]));
msg = pgb.read_message() => {
msg
}
_ = sleep_fut => {
assert!(batch.is_some());
break None;
}
msg = pgb.read_message() => { msg }
};
let msg_start = Instant::now();
// Rest of this loop body is trying to batch `msg` into `batch`.
// If we can add msg to batch we continue into the next loop iteration.
// If we can't add msg to the batch, we carry `msg` over to the next call.
let copy_data_bytes = match msg? {
Some(FeMessage::CopyData(bytes)) => bytes,
Some(FeMessage::Terminate) => {
return Ok(BatchOrEof::Eof);
return Ok(Some(BatchOrEof::Eof));
}
Some(m) => {
return Err(QueryError::Other(anyhow::anyhow!(
@@ -704,11 +656,10 @@ impl PageServerHandler {
)));
}
None => {
return Ok(BatchOrEof::Eof);
return Ok(Some(BatchOrEof::Eof));
} // client disconnected
};
trace!("query: {copy_data_bytes:?}");
fail::fail_point!("ps::handle-pagerequest-message");
// parse request
@@ -750,11 +701,11 @@ impl PageServerHandler {
span,
error: $error,
};
let batch_and_error = match maybe_carry.take() {
Some(carry) => smallvec::smallvec![carry.msg, error],
let batch_and_error = match batch {
Some(b) => smallvec::smallvec![b, error],
None => smallvec::smallvec![error],
};
Ok(BatchOrEof::Batch(batch_and_error))
Ok(Some(BatchOrEof::Batch(batch_and_error)))
}};
}
@@ -807,20 +758,26 @@ impl PageServerHandler {
}
};
//
// batch
//
match (maybe_carry.as_mut(), this_msg) {
let batch_timeout = match self.server_side_batch_timeout {
Some(value) => value,
None => {
// Batching is not enabled - stop on the first message.
return Ok(Some(BatchOrEof::Batch(smallvec::smallvec![this_msg])));
}
};
// check if we can batch
match (&mut batch, this_msg) {
(None, this_msg) => {
*maybe_carry = Some(Carry { msg: this_msg, started_at: msg_start });
batch = Some(this_msg);
}
(
Some(Carry { msg: BatchedFeMessage::GetPage {
Some(BatchedFeMessage::GetPage {
span: _,
shard: accum_shard,
pages: ref mut accum_pages,
pages: accum_pages,
effective_request_lsn: accum_lsn,
}, started_at: _}),
}),
BatchedFeMessage::GetPage {
span: _,
shard: this_shard,
@@ -830,14 +787,12 @@ impl PageServerHandler {
) if async {
assert_eq!(this_pages.len(), 1);
if accum_pages.len() >= Timeline::MAX_GET_VECTORED_KEYS as usize {
trace!(%accum_lsn, %this_lsn, "stopping batching because of batch size");
assert_eq!(accum_pages.len(), Timeline::MAX_GET_VECTORED_KEYS as usize);
return false;
}
if (accum_shard.tenant_shard_id, accum_shard.timeline_id)
!= (this_shard.tenant_shard_id, this_shard.timeline_id)
{
trace!(%accum_lsn, %this_lsn, "stopping batching because timeline object mismatch");
// TODO: we _could_ batch & execute each shard separately (and in parallel).
// But the current logic for keeping responses in order does not support that.
return false;
@@ -845,7 +800,6 @@ impl PageServerHandler {
// the vectored get currently only supports a single LSN, so, bounce as soon
// as the effective request_lsn changes
if *accum_lsn != this_lsn {
trace!(%accum_lsn, %this_lsn, "stopping batching because LSN changed");
return false;
}
true
@@ -855,17 +809,21 @@ impl PageServerHandler {
// ok to batch
accum_pages.extend(this_pages);
}
(Some(carry), this_msg) => {
(Some(_), this_msg) => {
// by default, don't continue batching
let carry = std::mem::replace(carry,
Carry {
msg: this_msg,
started_at: msg_start,
});
return Ok(BatchOrEof::Batch(smallvec::smallvec![carry.msg]));
break Some(this_msg);
}
}
}
// batching impl piece
let started_at = batch_started_at.get_or_insert_with(Instant::now);
if started_at.elapsed() > batch_timeout {
break None;
}
};
self.next_batch = next_batch;
Ok(batch.map(|b| BatchOrEof::Batch(smallvec::smallvec![b])))
}
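
Condensed to its essentials, the new batching strategy is: arm a deadline when the first message arrives, then keep reading until the deadline fires, the batch is full, or batching is disabled. A standalone sketch of that pattern over a plain mpsc channel (not the pagestream types; assumes the tokio and futures crates):

```rust
use std::time::{Duration, Instant};
use tokio::sync::mpsc;

/// Read one batch: wait for a first message, then keep batching until the
/// batch timeout elapses or the batch is full. Returns None on EOF with nothing batched.
async fn read_batch(
    rx: &mut mpsc::Receiver<u64>,
    batch_timeout: Option<Duration>,
    max_batch: usize,
) -> Option<Vec<u64>> {
    let mut batch = Vec::new();
    let mut started_at: Option<Instant> = None;
    loop {
        // The deadline is only armed once we hold at least one message.
        let sleep_fut = match (batch_timeout, started_at) {
            (Some(timeout), Some(start)) => futures::future::Either::Left(
                tokio::time::sleep_until((start + timeout).into()),
            ),
            _ => futures::future::Either::Right(futures::future::pending()),
        };
        tokio::select! {
            _ = sleep_fut => break, // deadline reached: flush what we have
            msg = rx.recv() => match msg {
                None => break, // peer disconnected: flush what we have
                Some(msg) => {
                    batch.push(msg);
                    // With batching disabled, or a full batch, stop immediately.
                    if batch_timeout.is_none() || batch.len() >= max_batch {
                        break;
                    }
                    started_at.get_or_insert_with(Instant::now);
                }
            },
        }
    }
    if batch.is_empty() { None } else { Some(batch) }
}
```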
/// Pagestream sub-protocol handler.
@@ -903,17 +861,22 @@ impl PageServerHandler {
}
}
let mut carry: Option<Carry> = None;
// If [`PageServerHandler`] is reused for multiple pagestreams,
// then make sure to not process requests from the previous ones.
self.next_batch = None;
loop {
let maybe_batched = self
.read_batch_from_connection(pgb, &tenant_id, &timeline_id, &mut carry, &ctx)
.read_batch_from_connection(pgb, &tenant_id, &timeline_id, &ctx)
.await?;
let batched = match maybe_batched {
BatchOrEof::Batch(b) => b,
BatchOrEof::Eof => {
Some(BatchOrEof::Batch(b)) => b,
Some(BatchOrEof::Eof) => {
break;
}
None => {
continue;
}
};
for batch in batched {
@@ -959,7 +922,6 @@ impl PageServerHandler {
(
{
let npages = pages.len();
trace!(npages, "handling getpage request");
let res = self
.handle_get_page_at_lsn_request_batched(
&shard,
@@ -1106,21 +1068,26 @@ impl PageServerHandler {
));
}
if request_lsn < **latest_gc_cutoff_lsn {
// Check explicitly for INVALID just to get a less scary error message if the request is obviously bogus
if request_lsn == Lsn::INVALID {
return Err(PageStreamError::BadRequest(
"invalid LSN(0) in request".into(),
));
}
// Clients should only read from recent LSNs on their timeline, or from locations holding an LSN lease.
//
// We may have older data available, but we make a best effort to detect this case and return an error,
// to distinguish a misbehaving client (asking for old LSN) from a storage issue (data missing at a legitimate LSN).
if request_lsn < **latest_gc_cutoff_lsn && !timeline.is_gc_blocked_by_lsn_lease_deadline() {
let gc_info = &timeline.gc_info.read().unwrap();
if !gc_info.leases.contains_key(&request_lsn) {
// The requested LSN is below gc cutoff and is not guarded by a lease.
// Check explicitly for INVALID just to get a less scary error message if the
// request is obviously bogus
return Err(if request_lsn == Lsn::INVALID {
PageStreamError::BadRequest("invalid LSN(0) in request".into())
} else {
return Err(
PageStreamError::BadRequest(format!(
"tried to request a page version that was garbage collected. requested at {} gc cutoff {}",
request_lsn, **latest_gc_cutoff_lsn
).into())
});
);
}
}
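
Restated as a standalone predicate with hypothetical types: a request LSN below the GC cutoff is only served while GC is blocked by the lease deadline or an explicit lease covers that exact LSN.

```rust
use std::collections::BTreeSet;

/// Hypothetical mirror of the admission check above: `leases` holds the LSNs
/// currently guarded by an LSN lease.
fn lsn_is_readable(
    request_lsn: u64,
    latest_gc_cutoff_lsn: u64,
    gc_blocked_by_lsn_lease_deadline: bool,
    leases: &BTreeSet<u64>,
) -> bool {
    if request_lsn >= latest_gc_cutoff_lsn {
        return true;
    }
    // Below the cutoff: still readable while the lease deadline blocks GC,
    // or when the exact LSN is guarded by a lease.
    gc_blocked_by_lsn_lease_deadline || leases.contains(&request_lsn)
}
```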

View File

@@ -2276,9 +2276,9 @@ impl<'a> Version<'a> {
//--- Metadata structs stored in key-value pairs in the repository.
#[derive(Debug, Serialize, Deserialize)]
struct DbDirectory {
pub(crate) struct DbDirectory {
// (spcnode, dbnode) -> (do relmapper and PG_VERSION files exist)
dbdirs: HashMap<(Oid, Oid), bool>,
pub(crate) dbdirs: HashMap<(Oid, Oid), bool>,
}
// The format of TwoPhaseDirectory changed in PostgreSQL v17, because the filenames of
@@ -2287,8 +2287,8 @@ struct DbDirectory {
// "pg_twophsae/0000000A000002E4".
#[derive(Debug, Serialize, Deserialize)]
struct TwoPhaseDirectory {
xids: HashSet<TransactionId>,
pub(crate) struct TwoPhaseDirectory {
pub(crate) xids: HashSet<TransactionId>,
}
#[derive(Debug, Serialize, Deserialize)]
@@ -2297,12 +2297,12 @@ struct TwoPhaseDirectoryV17 {
}
#[derive(Debug, Serialize, Deserialize, Default)]
struct RelDirectory {
pub(crate) struct RelDirectory {
// Set of relations that exist. (relfilenode, forknum)
//
// TODO: Store it as a btree or radix tree or something else that spans multiple
// key-value pairs, if you have a lot of relations
rels: HashSet<(Oid, u8)>,
pub(crate) rels: HashSet<(Oid, u8)>,
}
#[derive(Debug, Serialize, Deserialize)]
@@ -2311,9 +2311,9 @@ struct RelSizeEntry {
}
#[derive(Debug, Serialize, Deserialize, Default)]
struct SlruSegmentDirectory {
pub(crate) struct SlruSegmentDirectory {
// Set of SLRU segments that exist.
segments: HashSet<u32>,
pub(crate) segments: HashSet<u32>,
}
#[derive(Copy, Clone, PartialEq, Eq, Debug, enum_map::Enum)]

View File

@@ -381,6 +381,8 @@ pub enum TaskKind {
UnitTest,
DetachAncestor,
ImportPgdata,
}
#[derive(Default)]

View File

@@ -43,7 +43,9 @@ use std::sync::atomic::AtomicBool;
use std::sync::Weak;
use std::time::SystemTime;
use storage_broker::BrokerClientChannel;
use timeline::import_pgdata;
use timeline::offload::offload_timeline;
use timeline::ShutdownMode;
use tokio::io::BufReader;
use tokio::sync::watch;
use tokio::task::JoinSet;
@@ -373,7 +375,6 @@ pub struct Tenant {
l0_flush_global_state: L0FlushGlobalState,
}
impl std::fmt::Debug for Tenant {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{} ({})", self.tenant_shard_id, self.current_state())
@@ -860,6 +861,7 @@ impl Debug for SetStoppingError {
pub(crate) enum CreateTimelineParams {
Bootstrap(CreateTimelineParamsBootstrap),
Branch(CreateTimelineParamsBranch),
ImportPgdata(CreateTimelineParamsImportPgdata),
}
#[derive(Debug)]
@@ -877,7 +879,14 @@ pub(crate) struct CreateTimelineParamsBranch {
pub(crate) ancestor_start_lsn: Option<Lsn>,
}
/// What is used to determine idempotency of a [`Tenant::create_timeline`] call in [`Tenant::start_creating_timeline`].
#[derive(Debug)]
pub(crate) struct CreateTimelineParamsImportPgdata {
pub(crate) new_timeline_id: TimelineId,
pub(crate) location: import_pgdata::index_part_format::Location,
pub(crate) idempotency_key: import_pgdata::index_part_format::IdempotencyKey,
}
/// What is used to determine idempotency of a [`Tenant::create_timeline`] call in [`Tenant::start_creating_timeline`].
///
/// Each [`Timeline`] object holds [`Self`] as an immutable property in [`Timeline::create_idempotency`].
///
@@ -907,19 +916,50 @@ pub(crate) enum CreateTimelineIdempotency {
ancestor_timeline_id: TimelineId,
ancestor_start_lsn: Lsn,
},
ImportPgdata(CreatingTimelineIdempotencyImportPgdata),
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub(crate) struct CreatingTimelineIdempotencyImportPgdata {
idempotency_key: import_pgdata::index_part_format::IdempotencyKey,
}
/// What is returned by [`Tenant::start_creating_timeline`].
#[must_use]
enum StartCreatingTimelineResult<'t> {
CreateGuard(TimelineCreateGuard<'t>),
enum StartCreatingTimelineResult {
CreateGuard(TimelineCreateGuard),
Idempotent(Arc<Timeline>),
}
enum TimelineInitAndSyncResult {
ReadyToActivate(Arc<Timeline>),
NeedsSpawnImportPgdata(TimelineInitAndSyncNeedsSpawnImportPgdata),
}
impl TimelineInitAndSyncResult {
fn ready_to_activate(self) -> Option<Arc<Timeline>> {
match self {
Self::ReadyToActivate(timeline) => Some(timeline),
_ => None,
}
}
}
#[must_use]
struct TimelineInitAndSyncNeedsSpawnImportPgdata {
timeline: Arc<Timeline>,
import_pgdata: import_pgdata::index_part_format::Root,
guard: TimelineCreateGuard,
}
/// What is returned by [`Tenant::create_timeline`].
enum CreateTimelineResult {
Created(Arc<Timeline>),
Idempotent(Arc<Timeline>),
/// IMPORTANT: This [`Arc<Timeline>`] object is not in [`Tenant::timelines`] when
/// we return this result, nor will this concrete object ever be added there.
/// Cf method comment on [`Tenant::create_timeline_import_pgdata`].
ImportSpawned(Arc<Timeline>),
}
impl CreateTimelineResult {
@@ -927,18 +967,19 @@ impl CreateTimelineResult {
match self {
Self::Created(_) => "Created",
Self::Idempotent(_) => "Idempotent",
Self::ImportSpawned(_) => "ImportSpawned",
}
}
fn timeline(&self) -> &Arc<Timeline> {
match self {
Self::Created(t) | Self::Idempotent(t) => t,
Self::Created(t) | Self::Idempotent(t) | Self::ImportSpawned(t) => t,
}
}
/// Unit test timelines aren't activated, test has to do it if it needs to.
#[cfg(test)]
fn into_timeline_for_test(self) -> Arc<Timeline> {
match self {
Self::Created(t) | Self::Idempotent(t) => t,
Self::Created(t) | Self::Idempotent(t) | Self::ImportSpawned(t) => t,
}
}
}
@@ -962,33 +1003,13 @@ pub enum CreateTimelineError {
}
#[derive(thiserror::Error, Debug)]
enum InitdbError {
Other(anyhow::Error),
pub enum InitdbError {
#[error("Operation was cancelled")]
Cancelled,
Spawn(std::io::Result<()>),
Failed(std::process::ExitStatus, Vec<u8>),
}
impl fmt::Display for InitdbError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
InitdbError::Cancelled => write!(f, "Operation was cancelled"),
InitdbError::Spawn(e) => write!(f, "Spawn error: {:?}", e),
InitdbError::Failed(status, stderr) => write!(
f,
"Command failed with status {:?}: {}",
status,
String::from_utf8_lossy(stderr)
),
InitdbError::Other(e) => write!(f, "Error: {:?}", e),
}
}
}
impl From<std::io::Error> for InitdbError {
fn from(error: std::io::Error) -> Self {
InitdbError::Spawn(Err(error))
}
#[error(transparent)]
Other(anyhow::Error),
#[error(transparent)]
Inner(postgres_initdb::Error),
}
enum CreateTimelineCause {
@@ -996,6 +1017,15 @@ enum CreateTimelineCause {
Delete,
}
enum LoadTimelineCause {
Attach,
Unoffload,
ImportPgdata {
create_guard: TimelineCreateGuard,
activate: ActivateTimelineArgs,
},
}
#[derive(thiserror::Error, Debug)]
pub(crate) enum GcError {
// The tenant is shutting down
@@ -1072,24 +1102,35 @@ impl Tenant {
/// it is marked as Active.
#[allow(clippy::too_many_arguments)]
async fn timeline_init_and_sync(
&self,
self: &Arc<Self>,
timeline_id: TimelineId,
resources: TimelineResources,
index_part: IndexPart,
mut index_part: IndexPart,
metadata: TimelineMetadata,
ancestor: Option<Arc<Timeline>>,
_ctx: &RequestContext,
) -> anyhow::Result<()> {
cause: LoadTimelineCause,
ctx: &RequestContext,
) -> anyhow::Result<TimelineInitAndSyncResult> {
let tenant_id = self.tenant_shard_id;
let idempotency = if metadata.ancestor_timeline().is_none() {
CreateTimelineIdempotency::Bootstrap {
pg_version: metadata.pg_version(),
let import_pgdata = index_part.import_pgdata.take();
let idempotency = match &import_pgdata {
Some(import_pgdata) => {
CreateTimelineIdempotency::ImportPgdata(CreatingTimelineIdempotencyImportPgdata {
idempotency_key: import_pgdata.idempotency_key().clone(),
})
}
} else {
CreateTimelineIdempotency::Branch {
ancestor_timeline_id: metadata.ancestor_timeline().unwrap(),
ancestor_start_lsn: metadata.ancestor_lsn(),
None => {
if metadata.ancestor_timeline().is_none() {
CreateTimelineIdempotency::Bootstrap {
pg_version: metadata.pg_version(),
}
} else {
CreateTimelineIdempotency::Branch {
ancestor_timeline_id: metadata.ancestor_timeline().unwrap(),
ancestor_start_lsn: metadata.ancestor_lsn(),
}
}
}
};
@@ -1121,39 +1162,91 @@ impl Tenant {
format!("Failed to load layermap for timeline {tenant_id}/{timeline_id}")
})?;
{
// avoiding holding it across awaits
let mut timelines_accessor = self.timelines.lock().unwrap();
match timelines_accessor.entry(timeline_id) {
// We should never try and load the same timeline twice during startup
Entry::Occupied(_) => {
unreachable!(
"Timeline {tenant_id}/{timeline_id} already exists in the tenant map"
);
match import_pgdata {
Some(import_pgdata) if !import_pgdata.is_done() => {
match cause {
LoadTimelineCause::Attach | LoadTimelineCause::Unoffload => (),
LoadTimelineCause::ImportPgdata { .. } => {
unreachable!("ImportPgdata should not be reloading timeline import is done and persisted as such in s3")
}
}
Entry::Vacant(v) => {
v.insert(Arc::clone(&timeline));
timeline.maybe_spawn_flush_loop();
let mut guard = self.timelines_creating.lock().unwrap();
if !guard.insert(timeline_id) {
// We should never try and load the same timeline twice during startup
unreachable!("Timeline {tenant_id}/{timeline_id} is already being created")
}
let timeline_create_guard = TimelineCreateGuard {
_tenant_gate_guard: self.gate.enter()?,
owning_tenant: self.clone(),
timeline_id,
idempotency,
// The users of this specific return value don't need the timeline_path in there.
timeline_path: timeline
.conf
.timeline_path(&timeline.tenant_shard_id, &timeline.timeline_id),
};
Ok(TimelineInitAndSyncResult::NeedsSpawnImportPgdata(
TimelineInitAndSyncNeedsSpawnImportPgdata {
timeline,
import_pgdata,
guard: timeline_create_guard,
},
))
}
};
Some(_) | None => {
{
let mut timelines_accessor = self.timelines.lock().unwrap();
match timelines_accessor.entry(timeline_id) {
// We should never try and load the same timeline twice during startup
Entry::Occupied(_) => {
unreachable!(
"Timeline {tenant_id}/{timeline_id} already exists in the tenant map"
);
}
Entry::Vacant(v) => {
v.insert(Arc::clone(&timeline));
timeline.maybe_spawn_flush_loop();
}
}
}
// Sanity check: a timeline should have some content.
anyhow::ensure!(
ancestor.is_some()
|| timeline
.layers
.read()
.await
.layer_map()
.expect("currently loading, layer manager cannot be shutdown already")
.iter_historic_layers()
.next()
.is_some(),
"Timeline has no ancestor and no layer files"
);
// Sanity check: a timeline should have some content.
anyhow::ensure!(
ancestor.is_some()
|| timeline
.layers
.read()
.await
.layer_map()
.expect("currently loading, layer manager cannot be shutdown already")
.iter_historic_layers()
.next()
.is_some(),
"Timeline has no ancestor and no layer files"
);
Ok(())
match cause {
LoadTimelineCause::Attach | LoadTimelineCause::Unoffload => (),
LoadTimelineCause::ImportPgdata {
create_guard,
activate,
} => {
// TODO: see the comment in the task code above on why I'm not so certain
// it is safe to activate here because of concurrent shutdowns.
match activate {
ActivateTimelineArgs::Yes { broker_client } => {
info!("activating timeline after reload from pgdata import task");
timeline.activate(self.clone(), broker_client, None, ctx);
}
ActivateTimelineArgs::No => (),
}
drop(create_guard);
}
}
Ok(TimelineInitAndSyncResult::ReadyToActivate(timeline))
}
}
}
/// Attach a tenant that's available in cloud storage.
@@ -1578,24 +1671,46 @@ impl Tenant {
}
// TODO again handle early failure
self.load_remote_timeline(
timeline_id,
index_part,
remote_metadata,
TimelineResources {
remote_client,
timeline_get_throttle: self.timeline_get_throttle.clone(),
l0_flush_global_state: self.l0_flush_global_state.clone(),
},
ctx,
)
.await
.with_context(|| {
format!(
"failed to load remote timeline {} for tenant {}",
timeline_id, self.tenant_shard_id
let effect = self
.load_remote_timeline(
timeline_id,
index_part,
remote_metadata,
TimelineResources {
remote_client,
timeline_get_throttle: self.timeline_get_throttle.clone(),
l0_flush_global_state: self.l0_flush_global_state.clone(),
},
LoadTimelineCause::Attach,
ctx,
)
})?;
.await
.with_context(|| {
format!(
"failed to load remote timeline {} for tenant {}",
timeline_id, self.tenant_shard_id
)
})?;
match effect {
TimelineInitAndSyncResult::ReadyToActivate(_) => {
// activation happens later, on Tenant::activate
}
TimelineInitAndSyncResult::NeedsSpawnImportPgdata(
TimelineInitAndSyncNeedsSpawnImportPgdata {
timeline,
import_pgdata,
guard,
},
) => {
tokio::task::spawn(self.clone().create_timeline_import_pgdata_task(
timeline,
import_pgdata,
ActivateTimelineArgs::No,
guard,
));
}
}
}
// Walk through deleted timelines, resume deletion
@@ -1719,13 +1834,14 @@ impl Tenant {
#[instrument(skip_all, fields(timeline_id=%timeline_id))]
async fn load_remote_timeline(
&self,
self: &Arc<Self>,
timeline_id: TimelineId,
index_part: IndexPart,
remote_metadata: TimelineMetadata,
resources: TimelineResources,
cause: LoadTimelineCause,
ctx: &RequestContext,
) -> anyhow::Result<()> {
) -> anyhow::Result<TimelineInitAndSyncResult> {
span::debug_assert_current_span_has_tenant_id();
info!("downloading index file for timeline {}", timeline_id);
@@ -1752,6 +1868,7 @@ impl Tenant {
index_part,
remote_metadata,
ancestor,
cause,
ctx,
)
.await
@@ -1938,6 +2055,7 @@ impl Tenant {
TimelineArchivalError::Other(anyhow::anyhow!("Timeline already exists"))
}
TimelineExclusionError::Other(e) => TimelineArchivalError::Other(e),
TimelineExclusionError::ShuttingDown => TimelineArchivalError::Cancelled,
})?;
let timeline_preload = self
@@ -1976,6 +2094,7 @@ impl Tenant {
index_part,
remote_metadata,
timeline_resources,
LoadTimelineCause::Unoffload,
&ctx,
)
.await
@@ -2213,7 +2332,7 @@ impl Tenant {
///
/// Tests should use `Tenant::create_test_timeline` to set up the minimum required metadata keys.
pub(crate) async fn create_empty_timeline(
&self,
self: &Arc<Self>,
new_timeline_id: TimelineId,
initdb_lsn: Lsn,
pg_version: u32,
@@ -2263,7 +2382,7 @@ impl Tenant {
// Our current tests don't need the background loops.
#[cfg(test)]
pub async fn create_test_timeline(
&self,
self: &Arc<Self>,
new_timeline_id: TimelineId,
initdb_lsn: Lsn,
pg_version: u32,
@@ -2302,7 +2421,7 @@ impl Tenant {
#[cfg(test)]
#[allow(clippy::too_many_arguments)]
pub async fn create_test_timeline_with_layers(
&self,
self: &Arc<Self>,
new_timeline_id: TimelineId,
initdb_lsn: Lsn,
pg_version: u32,
@@ -2439,6 +2558,16 @@ impl Tenant {
self.branch_timeline(&ancestor_timeline, new_timeline_id, ancestor_start_lsn, ctx)
.await?
}
CreateTimelineParams::ImportPgdata(params) => {
self.create_timeline_import_pgdata(
params,
ActivateTimelineArgs::Yes {
broker_client: broker_client.clone(),
},
ctx,
)
.await?
}
};
// At this point we have dropped our guard on [`Self::timelines_creating`], and
@@ -2481,11 +2610,202 @@ impl Tenant {
);
timeline
}
CreateTimelineResult::ImportSpawned(timeline) => {
info!("import task spawned, timeline will become visible and activated once the import is done");
timeline
}
};
Ok(activated_timeline)
}
/// The returned [`Arc<Timeline>`] is NOT in the [`Tenant::timelines`] map until the import
/// completes in the background. A DIFFERENT [`Arc<Timeline>`] will be inserted into the
/// [`Tenant::timelines`] map when the import completes.
/// We only return an [`Arc<Timeline>`] here so the API handler can create a [`pageserver_api::models::TimelineInfo`]
/// for the response.
async fn create_timeline_import_pgdata(
self: &Arc<Tenant>,
params: CreateTimelineParamsImportPgdata,
activate: ActivateTimelineArgs,
ctx: &RequestContext,
) -> Result<CreateTimelineResult, CreateTimelineError> {
let CreateTimelineParamsImportPgdata {
new_timeline_id,
location,
idempotency_key,
} = params;
let started_at = chrono::Utc::now().naive_utc();
//
// There's probably a simpler way to upload an index part, but, remote_timeline_client
// is the canonical way we do it.
// - create an empty timeline in-memory
// - use its remote_timeline_client to do the upload
// - dispose of the uninit timeline
// - keep the creation guard alive
let timeline_create_guard = match self
.start_creating_timeline(
new_timeline_id,
CreateTimelineIdempotency::ImportPgdata(CreatingTimelineIdempotencyImportPgdata {
idempotency_key: idempotency_key.clone(),
}),
)
.await?
{
StartCreatingTimelineResult::CreateGuard(guard) => guard,
StartCreatingTimelineResult::Idempotent(timeline) => {
return Ok(CreateTimelineResult::Idempotent(timeline))
}
};
let mut uninit_timeline = {
let this = &self;
let initdb_lsn = Lsn(0);
let _ctx = ctx;
async move {
let new_metadata = TimelineMetadata::new(
// Initialize disk_consistent LSN to 0, The caller must import some data to
// make it valid, before calling finish_creation()
Lsn(0),
None,
None,
Lsn(0),
initdb_lsn,
initdb_lsn,
15,
);
this.prepare_new_timeline(
new_timeline_id,
&new_metadata,
timeline_create_guard,
initdb_lsn,
None,
)
.await
}
}
.await?;
let in_progress = import_pgdata::index_part_format::InProgress {
idempotency_key,
location,
started_at,
};
let index_part = import_pgdata::index_part_format::Root::V1(
import_pgdata::index_part_format::V1::InProgress(in_progress),
);
uninit_timeline
.raw_timeline()
.unwrap()
.remote_client
.schedule_index_upload_for_import_pgdata_state_update(Some(index_part.clone()))?;
// wait_completion happens in caller
let (timeline, timeline_create_guard) = uninit_timeline.finish_creation_myself();
tokio::spawn(self.clone().create_timeline_import_pgdata_task(
timeline.clone(),
index_part,
activate,
timeline_create_guard,
));
// NB: the timeline doesn't exist in self.timelines at this point
Ok(CreateTimelineResult::ImportSpawned(timeline))
}
#[instrument(skip_all, fields(tenant_id=%self.tenant_shard_id.tenant_id, shard_id=%self.tenant_shard_id.shard_slug(), timeline_id=%timeline.timeline_id))]
async fn create_timeline_import_pgdata_task(
self: Arc<Tenant>,
timeline: Arc<Timeline>,
index_part: import_pgdata::index_part_format::Root,
activate: ActivateTimelineArgs,
timeline_create_guard: TimelineCreateGuard,
) {
debug_assert_current_span_has_tenant_and_timeline_id();
info!("starting");
scopeguard::defer! {info!("exiting")};
let res = self
.create_timeline_import_pgdata_task_impl(
timeline,
index_part,
activate,
timeline_create_guard,
)
.await;
if let Err(err) = &res {
error!(?err, "task failed");
// TODO sleep & retry, sensitive to tenant shutdown
// TODO: allow timeline deletion requests => should cancel the task
}
}
async fn create_timeline_import_pgdata_task_impl(
self: Arc<Tenant>,
timeline: Arc<Timeline>,
index_part: import_pgdata::index_part_format::Root,
activate: ActivateTimelineArgs,
timeline_create_guard: TimelineCreateGuard,
) -> Result<(), anyhow::Error> {
let ctx = RequestContext::new(TaskKind::ImportPgdata, DownloadBehavior::Warn);
info!("importing pgdata");
import_pgdata::doit(&timeline, index_part, &ctx, self.cancel.clone())
.await
.context("import")?;
info!("import done");
//
// Reload timeline from remote.
// This proves that the remote state is attachable, and it reuses the code.
//
// TODO: think about whether this is safe to do with concurrent Tenant::shutdown.
// timeline_create_guard holds the tenant gate open, so shutdown cannot _complete_ until we exit.
// But our activate() call might launch new background tasks after Tenant::shutdown
// already went past shutting down the Tenant::timelines, which this timeline here is no part of.
// I think the same problem exists with the bootstrap & branch mgmt API tasks (tenant shutting
// down while bootstrapping/branching + activating), but the race condition is much more likely
// to manifest because of the long runtime of this import task.
// in theory this shouldn't even .await anything except for coop yield
info!("shutting down timeline");
timeline.shutdown(ShutdownMode::Hard).await;
info!("timeline shut down, reloading from remote");
// TODO: we can't do the following check because create_timeline_import_pgdata must return an Arc<Timeline>
// let Some(timeline) = Arc::into_inner(timeline) else {
// anyhow::bail!("implementation error: timeline that we shut down was still referenced from somewhere");
// };
let timeline_id = timeline.timeline_id;
// load from object storage like Tenant::attach does
let resources = self.build_timeline_resources(timeline_id);
let index_part = resources
.remote_client
.download_index_file(&self.cancel)
.await?;
let index_part = match index_part {
MaybeDeletedIndexPart::Deleted(_) => {
// likely concurrent delete call, cplane should prevent this
anyhow::bail!("index part says deleted but we are not done creating yet, this should not happen but")
}
MaybeDeletedIndexPart::IndexPart(p) => p,
};
let metadata = index_part.metadata.clone();
self
.load_remote_timeline(timeline_id, index_part, metadata, resources, LoadTimelineCause::ImportPgdata{
create_guard: timeline_create_guard, activate, }, &ctx)
.await?
.ready_to_activate()
.context("implementation error: reloaded timeline still needs import after import reported success")?;
anyhow::Ok(())
}
pub(crate) async fn delete_timeline(
self: Arc<Self>,
timeline_id: TimelineId,
@@ -3337,6 +3657,13 @@ where
Ok(result)
}
enum ActivateTimelineArgs {
Yes {
broker_client: storage_broker::BrokerClientChannel,
},
No,
}
impl Tenant {
pub fn tenant_specific_overrides(&self) -> TenantConfOpt {
self.tenant_conf.load().tenant_conf.clone()
@@ -3520,6 +3847,7 @@ impl Tenant {
/// `validate_ancestor == false` is used when a timeline is created for deletion
/// and we might not have the ancestor present anymore which is fine for to be
/// deleted timelines.
#[allow(clippy::too_many_arguments)]
fn create_timeline_struct(
&self,
new_timeline_id: TimelineId,
@@ -4283,16 +4611,17 @@ impl Tenant {
/// If the timeline was already created in the meantime, we check whether this
/// request conflicts or is idempotent , based on `state`.
async fn start_creating_timeline(
&self,
self: &Arc<Self>,
new_timeline_id: TimelineId,
idempotency: CreateTimelineIdempotency,
) -> Result<StartCreatingTimelineResult<'_>, CreateTimelineError> {
) -> Result<StartCreatingTimelineResult, CreateTimelineError> {
let allow_offloaded = false;
match self.create_timeline_create_guard(new_timeline_id, idempotency, allow_offloaded) {
Ok(create_guard) => {
pausable_failpoint!("timeline-creation-after-uninit");
Ok(StartCreatingTimelineResult::CreateGuard(create_guard))
}
Err(TimelineExclusionError::ShuttingDown) => Err(CreateTimelineError::ShuttingDown),
Err(TimelineExclusionError::AlreadyCreating) => {
// Creation is in progress, we cannot create it again, and we cannot
// check if this request matches the existing one, so caller must try
@@ -4582,7 +4911,7 @@ impl Tenant {
&'a self,
new_timeline_id: TimelineId,
new_metadata: &TimelineMetadata,
create_guard: TimelineCreateGuard<'a>,
create_guard: TimelineCreateGuard,
start_lsn: Lsn,
ancestor: Option<Arc<Timeline>>,
) -> anyhow::Result<UninitializedTimeline<'a>> {
@@ -4642,7 +4971,7 @@ impl Tenant {
/// The `allow_offloaded` parameter controls whether to tolerate the existence of
/// offloaded timelines or not.
fn create_timeline_create_guard(
&self,
self: &Arc<Self>,
timeline_id: TimelineId,
idempotency: CreateTimelineIdempotency,
allow_offloaded: bool,
@@ -4902,48 +5231,16 @@ async fn run_initdb(
let _permit = INIT_DB_SEMAPHORE.acquire().await;
let mut initdb_command = tokio::process::Command::new(&initdb_bin_path);
initdb_command
.args(["--pgdata", initdb_target_dir.as_ref()])
.args(["--username", &conf.superuser])
.args(["--encoding", "utf8"])
.args(["--locale", &conf.locale])
.arg("--no-instructions")
.arg("--no-sync")
.env_clear()
.env("LD_LIBRARY_PATH", &initdb_lib_dir)
.env("DYLD_LIBRARY_PATH", &initdb_lib_dir)
.stdin(std::process::Stdio::null())
// stdout invocation produces the same output every time, we don't need it
.stdout(std::process::Stdio::null())
// we would be interested in the stderr output, if there was any
.stderr(std::process::Stdio::piped());
// Before version 14, only the libc provider was available.
if pg_version > 14 {
// Version 17 brought with it a builtin locale provider which only provides
// C and C.UTF-8. While being safer for collation purposes since it is
// guaranteed to be consistent throughout a major release, it is also more
// performant.
let locale_provider = if pg_version >= 17 { "builtin" } else { "libc" };
initdb_command.args(["--locale-provider", locale_provider]);
}
let initdb_proc = initdb_command.spawn()?;
// Ideally we'd select here with the cancellation token, but the problem is that
// we can't safely terminate initdb: it launches processes of its own, and killing
// initdb doesn't kill them. After we return from this function, we want the target
// directory to be able to be cleaned up.
// See https://github.com/neondatabase/neon/issues/6385
let initdb_output = initdb_proc.wait_with_output().await?;
if !initdb_output.status.success() {
return Err(InitdbError::Failed(
initdb_output.status,
initdb_output.stderr,
));
}
let res = postgres_initdb::do_run_initdb(postgres_initdb::RunInitdbArgs {
superuser: &conf.superuser,
locale: &conf.locale,
initdb_bin: &initdb_bin_path,
pg_version,
library_search_path: &initdb_lib_dir,
pgdata: initdb_target_dir,
})
.await
.map_err(InitdbError::Inner);
// This isn't true cancellation support, see above. Still return an error to
// exercise the cancellation code path.
@@ -4951,7 +5248,7 @@ async fn run_initdb(
return Err(InitdbError::Cancelled);
}
Ok(())
res
}
/// Dump contents of a layer file to stdout.

View File

@@ -199,7 +199,7 @@ use utils::backoff::{
use utils::pausable_failpoint;
use utils::shard::ShardNumber;
use std::collections::{HashMap, VecDeque};
use std::collections::{HashMap, HashSet, VecDeque};
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Mutex, OnceLock};
use std::time::Duration;
@@ -223,7 +223,7 @@ use crate::task_mgr::shutdown_token;
use crate::tenant::debug_assert_current_span_has_tenant_and_timeline_id;
use crate::tenant::remote_timeline_client::download::download_retry;
use crate::tenant::storage_layer::AsLayerDesc;
use crate::tenant::upload_queue::{Delete, UploadQueueStoppedDeletable};
use crate::tenant::upload_queue::{Delete, OpType, UploadQueueStoppedDeletable};
use crate::tenant::TIMELINES_SEGMENT_NAME;
use crate::{
config::PageServerConf,
@@ -244,6 +244,7 @@ use self::index::IndexPart;
use super::config::AttachedLocationConfig;
use super::metadata::MetadataUpdate;
use super::storage_layer::{Layer, LayerName, ResidentLayer};
use super::timeline::import_pgdata;
use super::upload_queue::{NotInitialized, SetDeletedFlagProgress};
use super::{DeleteTimelineError, Generation};
@@ -813,6 +814,18 @@ impl RemoteTimelineClient {
Ok(need_wait)
}
/// Launch an index-file upload operation in the background, setting the `import_pgdata` field.
pub(crate) fn schedule_index_upload_for_import_pgdata_state_update(
self: &Arc<Self>,
state: Option<import_pgdata::index_part_format::Root>,
) -> anyhow::Result<()> {
let mut guard = self.upload_queue.lock().unwrap();
let upload_queue = guard.initialized_mut()?;
upload_queue.dirty.import_pgdata = state;
self.schedule_index_upload(upload_queue)?;
Ok(())
}
///
/// Launch an index-file upload operation in the background, if necessary.
///
@@ -1090,7 +1103,7 @@ impl RemoteTimelineClient {
"scheduled layer file upload {layer}",
);
let op = UploadOp::UploadLayer(layer, metadata);
let op = UploadOp::UploadLayer(layer, metadata, None);
self.metric_begin(&op);
upload_queue.queued_operations.push_back(op);
}
@@ -1805,7 +1818,7 @@ impl RemoteTimelineClient {
// have finished.
upload_queue.inprogress_tasks.is_empty()
}
UploadOp::Delete(_) => {
UploadOp::Delete(..) => {
// Wait for preceding uploads to finish. Concurrent deletions are OK, though.
upload_queue.num_inprogress_deletions == upload_queue.inprogress_tasks.len()
}
@@ -1833,19 +1846,32 @@ impl RemoteTimelineClient {
}
// We can launch this task. Remove it from the queue first.
let next_op = upload_queue.queued_operations.pop_front().unwrap();
let mut next_op = upload_queue.queued_operations.pop_front().unwrap();
debug!("starting op: {}", next_op);
// Update the counters
match next_op {
UploadOp::UploadLayer(_, _) => {
// Update the counters and prepare
match &mut next_op {
UploadOp::UploadLayer(layer, meta, mode) => {
if upload_queue
.recently_deleted
.remove(&(layer.layer_desc().layer_name().clone(), meta.generation))
{
*mode = Some(OpType::FlushDeletion);
} else {
*mode = Some(OpType::MayReorder)
}
upload_queue.num_inprogress_layer_uploads += 1;
}
UploadOp::UploadMetadata { .. } => {
upload_queue.num_inprogress_metadata_uploads += 1;
}
UploadOp::Delete(_) => {
UploadOp::Delete(Delete { layers }) => {
for (name, meta) in layers {
upload_queue
.recently_deleted
.insert((name.clone(), meta.generation));
}
upload_queue.num_inprogress_deletions += 1;
}
UploadOp::Barrier(sender) => {
@@ -1921,7 +1947,66 @@ impl RemoteTimelineClient {
}
let upload_result: anyhow::Result<()> = match &task.op {
UploadOp::UploadLayer(ref layer, ref layer_metadata) => {
UploadOp::UploadLayer(ref layer, ref layer_metadata, mode) => {
if let Some(OpType::FlushDeletion) = mode {
if self.config.read().unwrap().block_deletions {
// This linear scan is not efficient, but the queue should usually be empty.
let mut queue_locked = self.upload_queue.lock().unwrap();
let mut detected = false;
if let Ok(queue) = queue_locked.initialized_mut() {
for list in queue.blocked_deletions.iter_mut() {
list.layers.retain(|(name, meta)| {
if name == &layer.layer_desc().layer_name()
&& meta.generation == layer_metadata.generation
{
detected = true;
// remove the layer from deletion queue
false
} else {
// keep the layer
true
}
});
}
}
if detected {
info!(
"cancelled blocked deletion of layer {} at gen {:?}",
layer.layer_desc().layer_name(),
layer_metadata.generation
);
}
} else {
// TODO: we do not guarantee that the upload task starts after the deletion task, so a race
// could still leave the layer deleted. But that would require a layer to be recreated immediately
// after it is deleted, which is not possible in the current system.
info!(
"waiting for deletion queue flush to complete before uploading layer {} at gen {:?}",
layer.layer_desc().layer_name(),
layer_metadata.generation
);
{
// We are going to flush, we can clean up the recently deleted list.
let mut queue_locked = self.upload_queue.lock().unwrap();
if let Ok(queue) = queue_locked.initialized_mut() {
queue.recently_deleted.clear();
}
}
if let Err(e) = self.deletion_queue_client.flush_execute().await {
warn!(
"failed to flush the deletion queue before uploading layer {} at gen {:?}, still proceeding to upload: {e:#} ",
layer.layer_desc().layer_name(),
layer_metadata.generation
);
} else {
info!(
"done flushing deletion queue before uploading layer {} at gen {:?}",
layer.layer_desc().layer_name(),
layer_metadata.generation
);
}
}
}
let local_path = layer.local_path();
// We should only be uploading layers created by this `Tenant`'s lifetime, so
@@ -2085,7 +2170,7 @@ impl RemoteTimelineClient {
upload_queue.inprogress_tasks.remove(&task.task_id);
let lsn_update = match task.op {
UploadOp::UploadLayer(_, _) => {
UploadOp::UploadLayer(_, _, _) => {
upload_queue.num_inprogress_layer_uploads -= 1;
None
}
@@ -2162,7 +2247,7 @@ impl RemoteTimelineClient {
)> {
use RemoteTimelineClientMetricsCallTrackSize::DontTrackSize;
let res = match op {
UploadOp::UploadLayer(_, m) => (
UploadOp::UploadLayer(_, m, _) => (
RemoteOpFileKind::Layer,
RemoteOpKind::Upload,
RemoteTimelineClientMetricsCallTrackSize::Bytes(m.file_size),
@@ -2259,6 +2344,7 @@ impl RemoteTimelineClient {
blocked_deletions: Vec::new(),
shutting_down: false,
shutdown_ready: Arc::new(tokio::sync::Semaphore::new(0)),
recently_deleted: HashSet::new(),
};
let upload_queue = std::mem::replace(

View File
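The hunks above make layer uploads aware of earlier deletions of the same file: every executed Delete records its (LayerName, Generation) pairs in recently_deleted, and a later UploadLayer for such a pair is tagged OpType::FlushDeletion so the deletion queue is flushed before the upload runs. The following standalone sketch (simplified stand-in types and a hypothetical classify_upload helper, not the pageserver's code) illustrates just that classification step:

use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum OpType {
    MayReorder,
    FlushDeletion,
}

/// Decide how an upload of (layer name, generation) must be ordered,
/// given the set of recently executed deletions in the same queue.
fn classify_upload(
    recently_deleted: &mut HashSet<(String, u64)>,
    layer: &str,
    generation: u64,
) -> OpType {
    if recently_deleted.remove(&(layer.to_string(), generation)) {
        // The same layer+generation was deleted earlier; the upload must not overtake it.
        OpType::FlushDeletion
    } else {
        OpType::MayReorder
    }
}

fn main() {
    let mut recently_deleted = HashSet::new();
    recently_deleted.insert(("layer-A".to_string(), 7));
    assert_eq!(classify_upload(&mut recently_deleted, "layer-A", 7), OpType::FlushDeletion);
    // Once handled, subsequent uploads of the same name are free to reorder again.
    assert_eq!(classify_upload(&mut recently_deleted, "layer-A", 7), OpType::MayReorder);
}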

@@ -706,7 +706,7 @@ where
.and_then(|x| x)
}
async fn download_retry_forever<T, O, F>(
pub(crate) async fn download_retry_forever<T, O, F>(
op: O,
description: &str,
cancel: &CancellationToken,

View File

@@ -12,6 +12,7 @@ use utils::id::TimelineId;
use crate::tenant::metadata::TimelineMetadata;
use crate::tenant::storage_layer::LayerName;
use crate::tenant::timeline::import_pgdata;
use crate::tenant::Generation;
use pageserver_api::shard::ShardIndex;
@@ -37,6 +38,13 @@ pub struct IndexPart {
#[serde(skip_serializing_if = "Option::is_none")]
pub archived_at: Option<NaiveDateTime>,
/// This field supports import-from-pgdata ("fast imports" platform feature).
/// We don't currently use fast imports, so this field is None for all production timelines.
/// See <https://github.com/neondatabase/neon/pull/9218> for more information.
#[serde(default)]
#[serde(skip_serializing_if = "Option::is_none")]
pub import_pgdata: Option<import_pgdata::index_part_format::Root>,
/// Per layer file name metadata, which can be present for a present or missing layer file.
///
/// Older versions of `IndexPart` will not have this property or have only a part of metadata
@@ -90,10 +98,11 @@ impl IndexPart {
/// - 7: metadata_bytes is no longer written, but still read
/// - 8: added `archived_at`
/// - 9: +gc_blocking
const LATEST_VERSION: usize = 9;
/// - 10: +import_pgdata
const LATEST_VERSION: usize = 10;
// Versions we may see when reading from a bucket.
pub const KNOWN_VERSIONS: &'static [usize] = &[1, 2, 3, 4, 5, 6, 7, 8, 9];
pub const KNOWN_VERSIONS: &'static [usize] = &[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
pub const FILE_NAME: &'static str = "index_part.json";
@@ -108,6 +117,7 @@ impl IndexPart {
lineage: Default::default(),
gc_blocking: None,
last_aux_file_policy: None,
import_pgdata: None,
}
}
@@ -381,6 +391,7 @@ mod tests {
lineage: Lineage::default(),
gc_blocking: None,
last_aux_file_policy: None,
import_pgdata: None,
};
let part = IndexPart::from_json_bytes(example.as_bytes()).unwrap();
@@ -425,6 +436,7 @@ mod tests {
lineage: Lineage::default(),
gc_blocking: None,
last_aux_file_policy: None,
import_pgdata: None,
};
let part = IndexPart::from_json_bytes(example.as_bytes()).unwrap();
@@ -470,6 +482,7 @@ mod tests {
lineage: Lineage::default(),
gc_blocking: None,
last_aux_file_policy: None,
import_pgdata: None,
};
let part = IndexPart::from_json_bytes(example.as_bytes()).unwrap();
@@ -518,6 +531,7 @@ mod tests {
lineage: Lineage::default(),
gc_blocking: None,
last_aux_file_policy: None,
import_pgdata: None,
};
let empty_layers_parsed = IndexPart::from_json_bytes(empty_layers_json.as_bytes()).unwrap();
@@ -561,6 +575,7 @@ mod tests {
lineage: Lineage::default(),
gc_blocking: None,
last_aux_file_policy: None,
import_pgdata: None,
};
let part = IndexPart::from_json_bytes(example.as_bytes()).unwrap();
@@ -607,6 +622,7 @@ mod tests {
},
gc_blocking: None,
last_aux_file_policy: None,
import_pgdata: None,
};
let part = IndexPart::from_json_bytes(example.as_bytes()).unwrap();
@@ -658,6 +674,7 @@ mod tests {
},
gc_blocking: None,
last_aux_file_policy: Some(AuxFilePolicy::V2),
import_pgdata: None,
};
let part = IndexPart::from_json_bytes(example.as_bytes()).unwrap();
@@ -714,6 +731,7 @@ mod tests {
lineage: Default::default(),
gc_blocking: None,
last_aux_file_policy: Default::default(),
import_pgdata: None,
};
let part = IndexPart::from_json_bytes(example.as_bytes()).unwrap();
@@ -771,6 +789,7 @@ mod tests {
lineage: Default::default(),
gc_blocking: None,
last_aux_file_policy: Default::default(),
import_pgdata: None,
};
let part = IndexPart::from_json_bytes(example.as_bytes()).unwrap();
@@ -833,6 +852,83 @@ mod tests {
}),
last_aux_file_policy: Default::default(),
archived_at: None,
import_pgdata: None,
};
let part = IndexPart::from_json_bytes(example.as_bytes()).unwrap();
assert_eq!(part, expected);
}
#[test]
fn v10_importpgdata_is_parsed() {
let example = r#"{
"version": 10,
"layer_metadata":{
"000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000001696070-00000000016960E9": { "file_size": 25600000 },
"000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000016B59D8-00000000016B5A51": { "file_size": 9007199254741001 }
},
"disk_consistent_lsn":"0/16960E8",
"metadata": {
"disk_consistent_lsn": "0/16960E8",
"prev_record_lsn": "0/1696070",
"ancestor_timeline": "e45a7f37d3ee2ff17dc14bf4f4e3f52e",
"ancestor_lsn": "0/0",
"latest_gc_cutoff_lsn": "0/1696070",
"initdb_lsn": "0/1696070",
"pg_version": 14
},
"gc_blocking": {
"started_at": "2024-07-19T09:00:00.123",
"reasons": ["DetachAncestor"]
},
"import_pgdata": {
"V1": {
"Done": {
"idempotency_key": "specified-by-client-218a5213-5044-4562-a28d-d024c5f057f5",
"started_at": "2024-11-13T09:23:42.123",
"finished_at": "2024-11-13T09:42:23.123"
}
}
}
}"#;
let expected = IndexPart {
version: 10,
layer_metadata: HashMap::from([
("000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000001696070-00000000016960E9".parse().unwrap(), LayerFileMetadata {
file_size: 25600000,
generation: Generation::none(),
shard: ShardIndex::unsharded()
}),
("000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000016B59D8-00000000016B5A51".parse().unwrap(), LayerFileMetadata {
file_size: 9007199254741001,
generation: Generation::none(),
shard: ShardIndex::unsharded()
})
]),
disk_consistent_lsn: "0/16960E8".parse::<Lsn>().unwrap(),
metadata: TimelineMetadata::new(
Lsn::from_str("0/16960E8").unwrap(),
Some(Lsn::from_str("0/1696070").unwrap()),
Some(TimelineId::from_str("e45a7f37d3ee2ff17dc14bf4f4e3f52e").unwrap()),
Lsn::INVALID,
Lsn::from_str("0/1696070").unwrap(),
Lsn::from_str("0/1696070").unwrap(),
14,
).with_recalculated_checksum().unwrap(),
deleted_at: None,
lineage: Default::default(),
gc_blocking: Some(GcBlocking {
started_at: parse_naive_datetime("2024-07-19T09:00:00.123000000"),
reasons: enumset::EnumSet::from_iter([GcBlockingReason::DetachAncestor]),
}),
last_aux_file_policy: Default::default(),
archived_at: None,
import_pgdata: Some(import_pgdata::index_part_format::Root::V1(import_pgdata::index_part_format::V1::Done(import_pgdata::index_part_format::Done{
started_at: parse_naive_datetime("2024-11-13T09:23:42.123000000"),
finished_at: parse_naive_datetime("2024-11-13T09:42:23.123000000"),
idempotency_key: import_pgdata::index_part_format::IdempotencyKey::new("specified-by-client-218a5213-5044-4562-a28d-d024c5f057f5".to_string()),
})))
};
let part = IndexPart::from_json_bytes(example.as_bytes()).unwrap();

View File
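For context on why pre-v10 index_part.json files keep parsing: `import_pgdata` is an `Option` annotated with `#[serde(default)]` and `skip_serializing_if`, so it is omitted from serialized output when `None` and tolerated when missing on read. A minimal sketch of that behavior (hypothetical IndexPartLike struct, not the real IndexPart):

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct IndexPartLike {
    version: usize,
    #[serde(default)]
    #[serde(skip_serializing_if = "Option::is_none")]
    import_pgdata: Option<String>, // the real field holds index_part_format::Root
}

fn main() {
    // A pre-v10 index without the field still parses; the field defaults to None.
    let v9: IndexPartLike = serde_json::from_str(r#"{"version":9}"#).unwrap();
    assert_eq!(v9.import_pgdata, None);

    // Serializing None omits the key, so the on-disk format only changes when the feature is used.
    let v10 = IndexPartLike { version: 10, import_pgdata: None };
    assert_eq!(serde_json::to_string(&v10).unwrap(), r#"{"version":10}"#);
}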

@@ -4,6 +4,7 @@ pub mod delete;
pub(crate) mod detach_ancestor;
mod eviction_task;
pub(crate) mod handle;
pub(crate) mod import_pgdata;
mod init;
pub mod layer_manager;
pub(crate) mod logical_size;
@@ -2085,6 +2086,11 @@ impl Timeline {
.unwrap_or(self.conf.default_tenant_conf.lsn_lease_length_for_ts)
}
pub(crate) fn is_gc_blocked_by_lsn_lease_deadline(&self) -> bool {
let tenant_conf = self.tenant_conf.load();
tenant_conf.is_gc_blocked_by_lsn_lease_deadline()
}
pub(crate) fn get_lazy_slru_download(&self) -> bool {
let tenant_conf = self.tenant_conf.load();
tenant_conf
@@ -2647,6 +2653,7 @@ impl Timeline {
//
// NB: generation numbers naturally protect against this because they disambiguate
// (1) and (4)
// TODO: this is basically a no-op now, should we remove it?
self.remote_client.schedule_barrier()?;
// Tenant::create_timeline will wait for these uploads to happen before returning, or
// on retry.
@@ -2702,20 +2709,23 @@ impl Timeline {
{
Some(cancel) => cancel.cancel(),
None => {
let state = self.current_state();
if matches!(
state,
TimelineState::Broken { .. } | TimelineState::Stopping
) {
// Can happen when the timeline detail endpoint is used while deletion is ongoing (or it's broken).
// Don't make noise.
} else {
warn!("unexpected: cancel_wait_for_background_loop_concurrency_limit_semaphore not set, priority-boosting of logical size calculation will not work");
debug_assert!(false);
match self.current_state() {
TimelineState::Broken { .. } | TimelineState::Stopping => {
// Can happen when the timeline detail endpoint is used while deletion is ongoing (or it's broken).
// Don't make noise.
}
TimelineState::Loading => {
// Import does not return an activated timeline.
info!("discarding priority boost for logical size calculation because timeline is not yet active");
}
TimelineState::Active => {
// activation should be setting the once cell
warn!("unexpected: cancel_wait_for_background_loop_concurrency_limit_semaphore not set, priority-boosting of logical size calculation will not work");
debug_assert!(false);
}
}
}
};
}
}
}

View File

@@ -0,0 +1,218 @@
use std::sync::Arc;
use anyhow::{bail, Context};
use remote_storage::RemotePath;
use tokio_util::sync::CancellationToken;
use tracing::{info, info_span, Instrument};
use utils::lsn::Lsn;
use crate::{context::RequestContext, tenant::metadata::TimelineMetadata};
use super::Timeline;
mod flow;
mod importbucket_client;
mod importbucket_format;
pub(crate) mod index_part_format;
pub(crate) mod upcall_api;
pub async fn doit(
timeline: &Arc<Timeline>,
index_part: index_part_format::Root,
ctx: &RequestContext,
cancel: CancellationToken,
) -> anyhow::Result<()> {
let index_part_format::Root::V1(v1) = index_part;
let index_part_format::InProgress {
location,
idempotency_key,
started_at,
} = match v1 {
index_part_format::V1::Done(_) => return Ok(()),
index_part_format::V1::InProgress(in_progress) => in_progress,
};
let storage = importbucket_client::new(timeline.conf, &location, cancel.clone()).await?;
info!("get spec early so we know we'll be able to upcall when done");
let Some(spec) = storage.get_spec().await? else {
bail!("spec not found")
};
let upcall_client =
upcall_api::Client::new(timeline.conf, cancel.clone()).context("create upcall client")?;
//
// send an early progress update to clean up k8s job early and generate potentially useful logs
//
info!("send early progress update");
upcall_client
.send_progress_until_success(&spec)
.instrument(info_span!("early_progress_update"))
.await?;
let status_prefix = RemotePath::from_string("status").unwrap();
//
// See if shard is done.
// TODO: incorporate generations into status key for split brain safety. Figure out together with checkpointing.
//
let shard_status_key =
status_prefix.join(format!("shard-{}", timeline.tenant_shard_id.shard_slug()));
let shard_status: Option<importbucket_format::ShardStatus> =
storage.get_json(&shard_status_key).await?;
info!(?shard_status, "peeking shard status");
if shard_status.map(|st| st.done).unwrap_or(false) {
info!("shard status indicates that the shard is done, skipping import");
} else {
// TODO: checkpoint the progress into the IndexPart instead of restarting
// from the beginning.
//
// Wipe the slate clean - the flow does not allow resuming.
// We can implement resuming in the future by checkpointing the progress into the IndexPart.
//
info!("wipe the slate clean");
{
// TODO: do we need to hold GC lock for this?
let mut guard = timeline.layers.write().await;
assert!(
guard.layer_map()?.open_layer.is_none(),
"while importing, there should be no in-memory layer" // this just seems like a good place to assert it
);
let all_layers_keys = guard.all_persistent_layers();
let all_layers: Vec<_> = all_layers_keys
.iter()
.map(|key| guard.get_from_key(key))
.collect();
let open = guard.open_mut().context("open_mut")?;
timeline.remote_client.schedule_gc_update(&all_layers)?;
open.finish_gc_timeline(&all_layers);
}
//
// Wait for pgdata to finish uploading
//
info!("wait for pgdata to reach status 'done'");
let pgdata_status_key = status_prefix.join("pgdata");
loop {
let res = async {
let pgdata_status: Option<importbucket_format::PgdataStatus> = storage
.get_json(&pgdata_status_key)
.await
.context("get pgdata status")?;
info!(?pgdata_status, "peeking pgdata status");
if pgdata_status.map(|st| st.done).unwrap_or(false) {
Ok(())
} else {
Err(anyhow::anyhow!("pgdata not done yet"))
}
}
.await;
match res {
Ok(_) => break,
Err(err) => {
info!(?err, "indefintely waiting for pgdata to finish");
if tokio::time::timeout(std::time::Duration::from_secs(10), cancel.cancelled())
.await
.is_ok()
{
bail!("cancelled while waiting for pgdata");
}
}
}
}
//
// Do the import
//
info!("do the import");
let control_file = storage.get_control_file().await?;
let base_lsn = control_file.base_lsn();
info!("update TimelineMetadata based on LSNs from control file");
{
let pg_version = control_file.pg_version();
let _ctx: &RequestContext = ctx;
async move {
// FIXME: The 'disk_consistent_lsn' should be the LSN at the *end* of the
// checkpoint record, and prev_record_lsn should point to its beginning.
// We should read the real end of the record from the WAL, but here we
// just fake it.
let disk_consistent_lsn = Lsn(base_lsn.0 + 8);
let prev_record_lsn = base_lsn;
let metadata = TimelineMetadata::new(
disk_consistent_lsn,
Some(prev_record_lsn),
None, // no ancestor
Lsn(0), // no ancestor lsn
base_lsn, // latest_gc_cutoff_lsn
base_lsn, // initdb_lsn
pg_version,
);
let _start_lsn = disk_consistent_lsn + 1;
timeline
.remote_client
.schedule_index_upload_for_full_metadata_update(&metadata)?;
timeline.remote_client.wait_completion().await?;
anyhow::Ok(())
}
}
.await?;
flow::run(
timeline.clone(),
base_lsn,
control_file,
storage.clone(),
ctx,
)
.await?;
//
// Communicate that shard is done.
//
storage
.put_json(
&shard_status_key,
&importbucket_format::ShardStatus { done: true },
)
.await
.context("put shard status")?;
}
//
// Ensure at-least-once delivery of the upcall to cplane
// before we mark the task as done and never come here again.
//
info!("send final progress update");
upcall_client
.send_progress_until_success(&spec)
.instrument(info_span!("final_progress_update"))
.await?;
//
// Mark as done in index_part.
// This makes subsequent timeline loads enter the normal load code path
// instead of spawning the import task and calling this function.
//
info!("mark import as complete in index part");
timeline
.remote_client
.schedule_index_upload_for_import_pgdata_state_update(Some(index_part_format::Root::V1(
index_part_format::V1::Done(index_part_format::Done {
idempotency_key,
started_at,
finished_at: chrono::Utc::now().naive_utc(),
}),
)))?;
timeline.remote_client.wait_completion().await?;
Ok(())
}

View File

@@ -0,0 +1,798 @@
//! Import a PGDATA directory into an empty root timeline.
//!
//! This module is adapted from hackathon code by Heikki and Stas.
//! Other code in the parent module was written by Christian as part of a customer PoC.
//!
//! The hackathon code was producing image layer files as a free-standing program.
//!
//! It has been modified to
//! - run inside a running Pageserver, within the proper lifecycles of Timeline -> Tenant(Shard)
//! - => sharding-awareness: produce image layers with only the data relevant for this shard
//! - => S3 as the source for the PGDATA instead of local filesystem
//!
//! TODOs before productionization:
//! - ChunkProcessingJob size / ImportJob::total_size does not account for sharding.
//! => produced image layers likely too small.
//! - ChunkProcessingJob should cut up an ImportJob to hit exactly target image layer size.
//! - asserts / unwraps need to be replaced with errors
//! - don't trust remote objects will be small (=prevent OOMs in those cases)
//! - limit all in-memory buffers in size, or download to disk and read from there
//! - limit task concurrency
//! - generally play nice with other tenants in the system
//! - importbucket is a different bucket than the main pageserver storage, so it should be fine wrt S3 rate limits
//! - but concerns like network bandwidth, local disk write bandwidth, local disk capacity, etc
//! - integrate with layer eviction system
//! - audit for Tenant::cancel and Timeline::cancel responsiveness
//! - audit for Tenant/Timeline gate holding (we spawn tokio tasks during this flow!)
//!
//! An incomplete set of TODOs from the Hackathon:
//! - version-specific CheckPointData (=> pgv abstraction, already exists for regular walingest)
use std::sync::Arc;
use anyhow::{bail, ensure};
use bytes::Bytes;
use itertools::Itertools;
use pageserver_api::{
key::{rel_block_to_key, rel_dir_to_key, rel_size_to_key, relmap_file_key, DBDIR_KEY},
reltag::RelTag,
shard::ShardIdentity,
};
use postgres_ffi::{pg_constants, relfile_utils::parse_relfilename, BLCKSZ};
use tokio::task::JoinSet;
use tracing::{debug, info_span, instrument, Instrument};
use crate::{
assert_u64_eq_usize::UsizeIsU64,
pgdatadir_mapping::{SlruSegmentDirectory, TwoPhaseDirectory},
};
use crate::{
context::{DownloadBehavior, RequestContext},
pgdatadir_mapping::{DbDirectory, RelDirectory},
task_mgr::TaskKind,
tenant::storage_layer::{ImageLayerWriter, Layer},
};
use pageserver_api::key::Key;
use pageserver_api::key::{
slru_block_to_key, slru_dir_to_key, slru_segment_size_to_key, CHECKPOINT_KEY, CONTROLFILE_KEY,
TWOPHASEDIR_KEY,
};
use pageserver_api::keyspace::singleton_range;
use pageserver_api::keyspace::{contiguous_range_len, is_contiguous_range};
use pageserver_api::reltag::SlruKind;
use utils::bin_ser::BeSer;
use utils::lsn::Lsn;
use std::collections::HashSet;
use std::ops::Range;
use super::{
importbucket_client::{ControlFile, RemoteStorageWrapper},
Timeline,
};
use remote_storage::RemotePath;
pub async fn run(
timeline: Arc<Timeline>,
pgdata_lsn: Lsn,
control_file: ControlFile,
storage: RemoteStorageWrapper,
ctx: &RequestContext,
) -> anyhow::Result<()> {
Flow {
timeline,
pgdata_lsn,
control_file,
tasks: Vec::new(),
storage,
}
.run(ctx)
.await
}
struct Flow {
timeline: Arc<Timeline>,
pgdata_lsn: Lsn,
control_file: ControlFile,
tasks: Vec<AnyImportTask>,
storage: RemoteStorageWrapper,
}
impl Flow {
/// Perform the ingestion into [`Self::timeline`].
/// Assumes the timeline is empty (= no layers).
pub async fn run(mut self, ctx: &RequestContext) -> anyhow::Result<()> {
let pgdata_lsn = Lsn(self.control_file.control_file_data().checkPoint).align();
self.pgdata_lsn = pgdata_lsn;
let datadir = PgDataDir::new(&self.storage).await?;
// Import dbdir (00:00:00 keyspace)
// This is just constructed here, but will be written to the image layer in the first call to import_db()
let dbdir_buf = Bytes::from(DbDirectory::ser(&DbDirectory {
dbdirs: datadir
.dbs
.iter()
.map(|db| ((db.spcnode, db.dboid), true))
.collect(),
})?);
self.tasks
.push(ImportSingleKeyTask::new(DBDIR_KEY, dbdir_buf).into());
// Import databases (00:spcnode:dbnode keyspace for each db)
for db in datadir.dbs {
self.import_db(&db).await?;
}
// Import SLRUs
// pg_xact (01:00 keyspace)
self.import_slru(SlruKind::Clog, &self.storage.pgdata().join("pg_xact"))
.await?;
// pg_multixact/members (01:01 keyspace)
self.import_slru(
SlruKind::MultiXactMembers,
&self.storage.pgdata().join("pg_multixact/members"),
)
.await?;
// pg_multixact/offsets (01:02 keyspace)
self.import_slru(
SlruKind::MultiXactOffsets,
&self.storage.pgdata().join("pg_multixact/offsets"),
)
.await?;
// Import pg_twophase.
// TODO: as empty
let twophasedir_buf = TwoPhaseDirectory::ser(&TwoPhaseDirectory {
xids: HashSet::new(),
})?;
self.tasks
.push(AnyImportTask::SingleKey(ImportSingleKeyTask::new(
TWOPHASEDIR_KEY,
Bytes::from(twophasedir_buf),
)));
// Controlfile, checkpoint
self.tasks
.push(AnyImportTask::SingleKey(ImportSingleKeyTask::new(
CONTROLFILE_KEY,
self.control_file.control_file_buf().clone(),
)));
let checkpoint_buf = self
.control_file
.control_file_data()
.checkPointCopy
.encode()?;
self.tasks
.push(AnyImportTask::SingleKey(ImportSingleKeyTask::new(
CHECKPOINT_KEY,
checkpoint_buf,
)));
// Assigns parts of key space to later parallel jobs
let mut last_end_key = Key::MIN;
let mut current_chunk = Vec::new();
let mut current_chunk_size: usize = 0;
let mut parallel_jobs = Vec::new();
for task in std::mem::take(&mut self.tasks).into_iter() {
if current_chunk_size + task.total_size() > 1024 * 1024 * 1024 {
let key_range = last_end_key..task.key_range().start;
parallel_jobs.push(ChunkProcessingJob::new(
key_range.clone(),
std::mem::take(&mut current_chunk),
&self,
));
last_end_key = key_range.end;
current_chunk_size = 0;
}
current_chunk_size += task.total_size();
current_chunk.push(task);
}
parallel_jobs.push(ChunkProcessingJob::new(
last_end_key..Key::MAX,
current_chunk,
&self,
));
// Start all jobs simultaneously
let mut work = JoinSet::new();
// TODO: semaphore?
for job in parallel_jobs {
let ctx: RequestContext =
ctx.detached_child(TaskKind::ImportPgdata, DownloadBehavior::Error);
work.spawn(async move { job.run(&ctx).await }.instrument(info_span!("parallel_job")));
}
let mut results = Vec::new();
while let Some(result) = work.join_next().await {
match result {
Ok(res) => {
results.push(res);
}
Err(_joinset_err) => {
results.push(Err(anyhow::anyhow!(
"parallel job panicked or cancelled, check pageserver logs"
)));
}
}
}
if results.iter().all(|r| r.is_ok()) {
Ok(())
} else {
let mut msg = String::new();
for result in results {
if let Err(err) = result {
msg.push_str(&format!("{err:?}\n\n"));
}
}
bail!("Some parallel jobs failed:\n\n{msg}");
}
}
#[instrument(level = tracing::Level::DEBUG, skip_all, fields(dboid=%db.dboid, tablespace=%db.spcnode, path=%db.path))]
async fn import_db(&mut self, db: &PgDataDirDb) -> anyhow::Result<()> {
debug!("start");
scopeguard::defer! {
debug!("return");
}
// Import relmap (00:spcnode:dbnode:00:*:00)
let relmap_key = relmap_file_key(db.spcnode, db.dboid);
debug!("Constructing relmap entry, key {relmap_key}");
let relmap_path = db.path.join("pg_filenode.map");
let relmap_buf = self.storage.get(&relmap_path).await?;
self.tasks
.push(AnyImportTask::SingleKey(ImportSingleKeyTask::new(
relmap_key, relmap_buf,
)));
// Import reldir (00:spcnode:dbnode:00:*:01)
let reldir_key = rel_dir_to_key(db.spcnode, db.dboid);
debug!("Constructing reldirs entry, key {reldir_key}");
let reldir_buf = RelDirectory::ser(&RelDirectory {
rels: db
.files
.iter()
.map(|f| (f.rel_tag.relnode, f.rel_tag.forknum))
.collect(),
})?;
self.tasks
.push(AnyImportTask::SingleKey(ImportSingleKeyTask::new(
reldir_key,
Bytes::from(reldir_buf),
)));
// Import data (00:spcnode:dbnode:reloid:fork:blk) and set sizes for each last
// segment in a given relation (00:spcnode:dbnode:reloid:fork:ff)
for file in &db.files {
debug!(%file.path, %file.filesize, "importing file");
let len = file.filesize;
ensure!(len % 8192 == 0);
let start_blk: u32 = file.segno * (1024 * 1024 * 1024 / 8192);
let start_key = rel_block_to_key(file.rel_tag, start_blk);
let end_key = rel_block_to_key(file.rel_tag, start_blk + (len / 8192) as u32);
self.tasks
.push(AnyImportTask::RelBlocks(ImportRelBlocksTask::new(
*self.timeline.get_shard_identity(),
start_key..end_key,
&file.path,
self.storage.clone(),
)));
// Set relsize for the last segment (00:spcnode:dbnode:reloid:fork:ff)
if let Some(nblocks) = file.nblocks {
let size_key = rel_size_to_key(file.rel_tag);
//debug!("Setting relation size (path={path}, rel_tag={rel_tag}, segno={segno}) to {nblocks}, key {size_key}");
let buf = nblocks.to_le_bytes();
self.tasks
.push(AnyImportTask::SingleKey(ImportSingleKeyTask::new(
size_key,
Bytes::from(buf.to_vec()),
)));
}
}
Ok(())
}
async fn import_slru(&mut self, kind: SlruKind, path: &RemotePath) -> anyhow::Result<()> {
let segments = self.storage.listfilesindir(path).await?;
let segments: Vec<(String, u32, usize)> = segments
.into_iter()
.filter_map(|(path, size)| {
let filename = path.object_name()?;
let segno = u32::from_str_radix(filename, 16).ok()?;
Some((filename.to_string(), segno, size))
})
.collect();
// Write SlruDir
let slrudir_key = slru_dir_to_key(kind);
let segnos: HashSet<u32> = segments
.iter()
.map(|(_path, segno, _size)| *segno)
.collect();
let slrudir = SlruSegmentDirectory { segments: segnos };
let slrudir_buf = SlruSegmentDirectory::ser(&slrudir)?;
self.tasks
.push(AnyImportTask::SingleKey(ImportSingleKeyTask::new(
slrudir_key,
Bytes::from(slrudir_buf),
)));
for (segpath, segno, size) in segments {
// SlruSegBlocks for each segment
let p = path.join(&segpath);
let file_size = size;
ensure!(file_size % 8192 == 0);
let nblocks = u32::try_from(file_size / 8192)?;
let start_key = slru_block_to_key(kind, segno, 0);
let end_key = slru_block_to_key(kind, segno, nblocks);
debug!(%p, segno=%segno, %size, %start_key, %end_key, "scheduling SLRU segment");
self.tasks
.push(AnyImportTask::SlruBlocks(ImportSlruBlocksTask::new(
*self.timeline.get_shard_identity(),
start_key..end_key,
&p,
self.storage.clone(),
)));
// Followed by SlruSegSize
let segsize_key = slru_segment_size_to_key(kind, segno);
let segsize_buf = nblocks.to_le_bytes();
self.tasks
.push(AnyImportTask::SingleKey(ImportSingleKeyTask::new(
segsize_key,
Bytes::copy_from_slice(&segsize_buf),
)));
}
Ok(())
}
}
//
// dbdir iteration tools
//
struct PgDataDir {
pub dbs: Vec<PgDataDirDb>, // spcnode, dboid, path
}
struct PgDataDirDb {
pub spcnode: u32,
pub dboid: u32,
pub path: RemotePath,
pub files: Vec<PgDataDirDbFile>,
}
struct PgDataDirDbFile {
pub path: RemotePath,
pub rel_tag: RelTag,
pub segno: u32,
pub filesize: usize,
// Cumulative size of the given fork, set only for the last segment of that fork
pub nblocks: Option<usize>,
}
impl PgDataDir {
async fn new(storage: &RemoteStorageWrapper) -> anyhow::Result<Self> {
let datadir_path = storage.pgdata();
// Import ordinary databases first; DEFAULTTABLESPACE_OID is smaller than GLOBALTABLESPACE_OID.
// Traverse databases in increasing OID order.
let basedir = &datadir_path.join("base");
let db_oids: Vec<_> = storage
.listdir(basedir)
.await?
.into_iter()
.filter_map(|path| path.object_name().and_then(|name| name.parse::<u32>().ok()))
.sorted()
.collect();
debug!(?db_oids, "found databases");
let mut databases = Vec::new();
for dboid in db_oids {
databases.push(
PgDataDirDb::new(
storage,
&basedir.join(dboid.to_string()),
pg_constants::DEFAULTTABLESPACE_OID,
dboid,
&datadir_path,
)
.await?,
);
}
// special case for global catalogs
databases.push(
PgDataDirDb::new(
storage,
&datadir_path.join("global"),
postgres_ffi::pg_constants::GLOBALTABLESPACE_OID,
0,
&datadir_path,
)
.await?,
);
databases.sort_by_key(|db| (db.spcnode, db.dboid));
Ok(Self { dbs: databases })
}
}
impl PgDataDirDb {
#[instrument(level = tracing::Level::DEBUG, skip_all, fields(%dboid, %db_path))]
async fn new(
storage: &RemoteStorageWrapper,
db_path: &RemotePath,
spcnode: u32,
dboid: u32,
datadir_path: &RemotePath,
) -> anyhow::Result<Self> {
let mut files: Vec<PgDataDirDbFile> = storage
.listfilesindir(db_path)
.await?
.into_iter()
.filter_map(|(path, size)| {
debug!(%path, %size, "found file in dbdir");
path.object_name().and_then(|name| {
// returns (relnode, forknum, segno)
parse_relfilename(name).ok().map(|x| (size, x))
})
})
.sorted_by_key(|(_, relfilename)| *relfilename)
.map(|(filesize, (relnode, forknum, segno))| {
let rel_tag = RelTag {
spcnode,
dbnode: dboid,
relnode,
forknum,
};
let path = datadir_path.join(rel_tag.to_segfile_name(segno));
assert!(filesize % BLCKSZ as usize == 0); // TODO: this should result in an error
let nblocks = filesize / BLCKSZ as usize;
PgDataDirDbFile {
path,
filesize,
rel_tag,
segno,
nblocks: Some(nblocks), // per-segment (non-cumulative) size for now; made cumulative below
}
})
.collect();
// Set cumulative sizes. Do all of that math here, so that later we can more easily
// parallelize over segments and know for which segments we need to write a relsize
// entry.
let mut cumulative_nblocks: usize = 0;
let mut prev_rel_tag: Option<RelTag> = None;
for i in 0..files.len() {
if prev_rel_tag == Some(files[i].rel_tag) {
cumulative_nblocks += files[i].nblocks.unwrap();
} else {
cumulative_nblocks = files[i].nblocks.unwrap();
}
files[i].nblocks = if i == files.len() - 1 || files[i + 1].rel_tag != files[i].rel_tag {
Some(cumulative_nblocks)
} else {
None
};
prev_rel_tag = Some(files[i].rel_tag);
}
Ok(PgDataDirDb {
files,
path: db_path.clone(),
spcnode,
dboid,
})
}
}
trait ImportTask {
fn key_range(&self) -> Range<Key>;
fn total_size(&self) -> usize {
// TODO: revisit this
if is_contiguous_range(&self.key_range()) {
contiguous_range_len(&self.key_range()) as usize * 8192
} else {
u32::MAX as usize
}
}
async fn doit(
self,
layer_writer: &mut ImageLayerWriter,
ctx: &RequestContext,
) -> anyhow::Result<usize>;
}
struct ImportSingleKeyTask {
key: Key,
buf: Bytes,
}
impl ImportSingleKeyTask {
fn new(key: Key, buf: Bytes) -> Self {
ImportSingleKeyTask { key, buf }
}
}
impl ImportTask for ImportSingleKeyTask {
fn key_range(&self) -> Range<Key> {
singleton_range(self.key)
}
async fn doit(
self,
layer_writer: &mut ImageLayerWriter,
ctx: &RequestContext,
) -> anyhow::Result<usize> {
layer_writer.put_image(self.key, self.buf, ctx).await?;
Ok(1)
}
}
struct ImportRelBlocksTask {
shard_identity: ShardIdentity,
key_range: Range<Key>,
path: RemotePath,
storage: RemoteStorageWrapper,
}
impl ImportRelBlocksTask {
fn new(
shard_identity: ShardIdentity,
key_range: Range<Key>,
path: &RemotePath,
storage: RemoteStorageWrapper,
) -> Self {
ImportRelBlocksTask {
shard_identity,
key_range,
path: path.clone(),
storage,
}
}
}
impl ImportTask for ImportRelBlocksTask {
fn key_range(&self) -> Range<Key> {
self.key_range.clone()
}
#[instrument(level = tracing::Level::DEBUG, skip_all, fields(%self.path))]
async fn doit(
self,
layer_writer: &mut ImageLayerWriter,
ctx: &RequestContext,
) -> anyhow::Result<usize> {
debug!("Importing relation file");
let (rel_tag, start_blk) = self.key_range.start.to_rel_block()?;
let (rel_tag_end, end_blk) = self.key_range.end.to_rel_block()?;
assert_eq!(rel_tag, rel_tag_end);
let ranges = (start_blk..end_blk)
.enumerate()
.filter_map(|(i, blknum)| {
let key = rel_block_to_key(rel_tag, blknum);
if self.shard_identity.is_key_disposable(&key) {
return None;
}
let file_offset = i.checked_mul(8192).unwrap();
Some((
vec![key],
file_offset,
file_offset.checked_add(8192).unwrap(),
))
})
.coalesce(|(mut acc, acc_start, acc_end), (mut key, start, end)| {
assert_eq!(key.len(), 1);
assert!(!acc.is_empty());
assert!(acc_end > acc_start);
if acc_end == start /* TODO additional max range check here, to limit memory consumption per task to X */ {
acc.push(key.pop().unwrap());
Ok((acc, acc_start, end))
} else {
Err(((acc, acc_start, acc_end), (key, start, end)))
}
});
let mut nimages = 0;
for (keys, range_start, range_end) in ranges {
let range_buf = self
.storage
.get_range(&self.path, range_start.into_u64(), range_end.into_u64())
.await?;
let mut buf = Bytes::from(range_buf);
// TODO: batched writes
for key in keys {
let image = buf.split_to(8192);
layer_writer.put_image(key, image, ctx).await?;
nimages += 1;
}
}
Ok(nimages)
}
}
struct ImportSlruBlocksTask {
shard_identity: ShardIdentity,
key_range: Range<Key>,
path: RemotePath,
storage: RemoteStorageWrapper,
}
impl ImportSlruBlocksTask {
fn new(
shard_identity: ShardIdentity,
key_range: Range<Key>,
path: &RemotePath,
storage: RemoteStorageWrapper,
) -> Self {
ImportSlruBlocksTask {
shard_identity,
key_range,
path: path.clone(),
storage,
}
}
}
impl ImportTask for ImportSlruBlocksTask {
fn key_range(&self) -> Range<Key> {
self.key_range.clone()
}
async fn doit(
self,
layer_writer: &mut ImageLayerWriter,
ctx: &RequestContext,
) -> anyhow::Result<usize> {
debug!("Importing SLRU segment file {}", self.path);
let buf = self.storage.get(&self.path).await?;
let (kind, segno, start_blk) = self.key_range.start.to_slru_block()?;
let (_kind, _segno, end_blk) = self.key_range.end.to_slru_block()?;
let mut blknum = start_blk;
let mut nimages = 0;
let mut file_offset = 0;
while blknum < end_blk {
let key = slru_block_to_key(kind, segno, blknum);
assert!(
!self.shard_identity.is_key_disposable(&key),
"SLRU keys need to go into every shard"
);
let buf = &buf[file_offset..(file_offset + 8192)];
file_offset += 8192;
layer_writer
.put_image(key, Bytes::copy_from_slice(buf), ctx)
.await?;
blknum += 1;
nimages += 1;
}
Ok(nimages)
}
}
enum AnyImportTask {
SingleKey(ImportSingleKeyTask),
RelBlocks(ImportRelBlocksTask),
SlruBlocks(ImportSlruBlocksTask),
}
impl ImportTask for AnyImportTask {
fn key_range(&self) -> Range<Key> {
match self {
Self::SingleKey(t) => t.key_range(),
Self::RelBlocks(t) => t.key_range(),
Self::SlruBlocks(t) => t.key_range(),
}
}
/// returns the number of images put into the `layer_writer`
async fn doit(
self,
layer_writer: &mut ImageLayerWriter,
ctx: &RequestContext,
) -> anyhow::Result<usize> {
match self {
Self::SingleKey(t) => t.doit(layer_writer, ctx).await,
Self::RelBlocks(t) => t.doit(layer_writer, ctx).await,
Self::SlruBlocks(t) => t.doit(layer_writer, ctx).await,
}
}
}
impl From<ImportSingleKeyTask> for AnyImportTask {
fn from(t: ImportSingleKeyTask) -> Self {
Self::SingleKey(t)
}
}
impl From<ImportRelBlocksTask> for AnyImportTask {
fn from(t: ImportRelBlocksTask) -> Self {
Self::RelBlocks(t)
}
}
impl From<ImportSlruBlocksTask> for AnyImportTask {
fn from(t: ImportSlruBlocksTask) -> Self {
Self::SlruBlocks(t)
}
}
struct ChunkProcessingJob {
timeline: Arc<Timeline>,
range: Range<Key>,
tasks: Vec<AnyImportTask>,
pgdata_lsn: Lsn,
}
impl ChunkProcessingJob {
fn new(range: Range<Key>, tasks: Vec<AnyImportTask>, env: &Flow) -> Self {
assert!(env.pgdata_lsn.is_valid());
Self {
timeline: env.timeline.clone(),
range,
tasks,
pgdata_lsn: env.pgdata_lsn,
}
}
async fn run(self, ctx: &RequestContext) -> anyhow::Result<()> {
let mut writer = ImageLayerWriter::new(
self.timeline.conf,
self.timeline.timeline_id,
self.timeline.tenant_shard_id,
&self.range,
self.pgdata_lsn,
ctx,
)
.await?;
let mut nimages = 0;
for task in self.tasks {
nimages += task.doit(&mut writer, ctx).await?;
}
let resident_layer = if nimages > 0 {
let (desc, path) = writer.finish(ctx).await?;
Layer::finish_creating(self.timeline.conf, &self.timeline, desc, &path)?
} else {
// dropping the writer cleans up
return Ok(());
};
// this is sharing the same code as create_image_layers
let mut guard = self.timeline.layers.write().await;
guard
.open_mut()?
.track_new_image_layers(&[resident_layer.clone()], &self.timeline.metrics);
crate::tenant::timeline::drop_wlock(guard);
// Schedule the layer for upload, but don't add barriers such as waiting for
// completion or an index upload, so we don't inhibit upload parallelism.
// TODO: limit upload parallelism somehow (e.g. by limiting concurrency of jobs?)
// TODO: or regulate parallelism by upload queue depth? Prob should happen at a higher level.
self.timeline
.remote_client
.schedule_layer_file_upload(resident_layer)?;
Ok(())
}
}

View File
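The key-space chunking in Flow::run above is a simple greedy cut: walk the tasks in key order, accumulate until the running size would exceed roughly 1 GiB, then emit a ChunkProcessingJob owning the contiguous key range up to the next task. A standalone sketch of that loop (hypothetical Task/Job/chunk names and a u64 stand-in for Key; it skips the total_size details):

const TARGET_JOB_SIZE: usize = 1024 * 1024 * 1024; // ~1 GiB of images per job

#[derive(Debug)]
struct Task {
    start_key: u64, // stand-in for pageserver_api::key::Key
    size: usize,
}

#[derive(Debug)]
struct Job {
    key_range: std::ops::Range<u64>,
    tasks: Vec<Task>,
}

fn chunk(tasks: Vec<Task>, key_min: u64, key_max: u64) -> Vec<Job> {
    let mut jobs = Vec::new();
    let mut last_end_key = key_min;
    let mut current = Vec::new();
    let mut current_size = 0usize;
    for task in tasks {
        // Cut a job before this task once the accumulated size would exceed the target.
        if current_size + task.size > TARGET_JOB_SIZE && !current.is_empty() {
            jobs.push(Job {
                key_range: last_end_key..task.start_key,
                tasks: std::mem::take(&mut current),
            });
            last_end_key = task.start_key;
            current_size = 0;
        }
        current_size += task.size;
        current.push(task);
    }
    // The final job covers everything up to the end of the key space.
    jobs.push(Job { key_range: last_end_key..key_max, tasks: current });
    jobs
}

fn main() {
    let tasks = (0..4u64).map(|i| Task { start_key: i * 100, size: 600 << 20 }).collect();
    for job in chunk(tasks, 0, u64::MAX) {
        println!("{:?}: {} tasks", job.key_range, job.tasks.len());
    }
}

Because the ranges are contiguous and non-overlapping, each job can write its own image layer in parallel without coordinating key ownership.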

@@ -0,0 +1,315 @@
use std::{ops::Bound, sync::Arc};
use anyhow::Context;
use bytes::Bytes;
use postgres_ffi::ControlFileData;
use remote_storage::{
Download, DownloadError, DownloadOpts, GenericRemoteStorage, Listing, ListingObject, RemotePath,
};
use serde::de::DeserializeOwned;
use tokio_util::sync::CancellationToken;
use tracing::{debug, info, instrument};
use utils::lsn::Lsn;
use crate::{assert_u64_eq_usize::U64IsUsize, config::PageServerConf};
use super::{importbucket_format, index_part_format};
pub async fn new(
conf: &'static PageServerConf,
location: &index_part_format::Location,
cancel: CancellationToken,
) -> Result<RemoteStorageWrapper, anyhow::Error> {
// FIXME: we probably want some timeout, and we might be able to assume the max file
// size on S3 is 1GiB (postgres segment size). But the problem is that the individual
// downloaders don't know enough about concurrent downloads to make a guess on the
// expected bandwidth and resulting best timeout.
let timeout = std::time::Duration::from_secs(24 * 60 * 60);
let location_storage = match location {
#[cfg(feature = "testing")]
index_part_format::Location::LocalFs { path } => {
GenericRemoteStorage::LocalFs(remote_storage::LocalFs::new(path.clone(), timeout)?)
}
index_part_format::Location::AwsS3 {
region,
bucket,
key,
} => {
// TODO: think about security implications of letting the client specify the bucket & prefix.
// It's the most flexible right now, but possibly we want to move the bucket name into PS conf
// and force the timeline_id into the prefix?
GenericRemoteStorage::AwsS3(Arc::new(
remote_storage::S3Bucket::new(
&remote_storage::S3Config {
bucket_name: bucket.clone(),
prefix_in_bucket: Some(key.clone()),
bucket_region: region.clone(),
endpoint: conf
.import_pgdata_aws_endpoint_url
.clone()
.map(|url| url.to_string()), // by specifying None here, remote_storage/aws-sdk-rust will infer from env
concurrency_limit: 100.try_into().unwrap(), // TODO: think about this
max_keys_per_list_response: Some(1000), // TODO: think about this
upload_storage_class: None, // irrelevant
},
timeout,
)
.await
.context("setup s3 bucket")?,
))
}
};
let storage_wrapper = RemoteStorageWrapper::new(location_storage, cancel);
Ok(storage_wrapper)
}
/// Wrap [`remote_storage`] APIs to make it look a bit more like a filesystem API
/// such as [`tokio::fs`], which was used in the original implementation of the import code.
#[derive(Clone)]
pub struct RemoteStorageWrapper {
storage: GenericRemoteStorage,
cancel: CancellationToken,
}
impl RemoteStorageWrapper {
pub fn new(storage: GenericRemoteStorage, cancel: CancellationToken) -> Self {
Self { storage, cancel }
}
#[instrument(level = tracing::Level::DEBUG, skip_all, fields(%path))]
pub async fn listfilesindir(
&self,
path: &RemotePath,
) -> Result<Vec<(RemotePath, usize)>, DownloadError> {
assert!(
path.object_name().is_some(),
"must specify dirname, without trailing slash"
);
let path = path.add_trailing_slash();
let res = crate::tenant::remote_timeline_client::download::download_retry_forever(
|| async {
let Listing { keys, prefixes: _ } = self
.storage
.list(
Some(&path),
remote_storage::ListingMode::WithDelimiter,
None,
&self.cancel,
)
.await?;
let res = keys
.into_iter()
.map(|ListingObject { key, size, .. }| (key, size.into_usize()))
.collect();
Ok(res)
},
&format!("listfilesindir {path:?}"),
&self.cancel,
)
.await;
debug!(?res, "returning");
res
}
#[instrument(level = tracing::Level::DEBUG, skip_all, fields(%path))]
pub async fn listdir(&self, path: &RemotePath) -> Result<Vec<RemotePath>, DownloadError> {
assert!(
path.object_name().is_some(),
"must specify dirname, without trailing slash"
);
let path = path.add_trailing_slash();
let res = crate::tenant::remote_timeline_client::download::download_retry_forever(
|| async {
let Listing { keys, prefixes } = self
.storage
.list(
Some(&path),
remote_storage::ListingMode::WithDelimiter,
None,
&self.cancel,
)
.await?;
let res = keys
.into_iter()
.map(|ListingObject { key, .. }| key)
.chain(prefixes.into_iter())
.collect();
Ok(res)
},
&format!("listdir {path:?}"),
&self.cancel,
)
.await;
debug!(?res, "returning");
res
}
#[instrument(level = tracing::Level::DEBUG, skip_all, fields(%path))]
pub async fn get(&self, path: &RemotePath) -> Result<Bytes, DownloadError> {
let res = crate::tenant::remote_timeline_client::download::download_retry_forever(
|| async {
let Download {
download_stream, ..
} = self
.storage
.download(path, &DownloadOpts::default(), &self.cancel)
.await?;
let mut reader = tokio_util::io::StreamReader::new(download_stream);
// XXX optimize this, can we get the capacity hint from somewhere?
let mut buf = Vec::new();
tokio::io::copy_buf(&mut reader, &mut buf).await?;
Ok(Bytes::from(buf))
},
&format!("download {path:?}"),
&self.cancel,
)
.await;
debug!(len = res.as_ref().ok().map(|buf| buf.len()), "done");
res
}
pub async fn get_spec(&self) -> Result<Option<importbucket_format::Spec>, anyhow::Error> {
self.get_json(&RemotePath::from_string("spec.json").unwrap())
.await
.context("get spec")
}
#[instrument(level = tracing::Level::DEBUG, skip_all, fields(%path))]
pub async fn get_json<T: DeserializeOwned>(
&self,
path: &RemotePath,
) -> Result<Option<T>, DownloadError> {
let buf = match self.get(path).await {
Ok(buf) => buf,
Err(DownloadError::NotFound) => return Ok(None),
Err(err) => return Err(err),
};
let res = serde_json::from_slice(&buf)
.context("serialize")
// TODO: own error type
.map_err(DownloadError::Other)?;
Ok(Some(res))
}
#[instrument(level = tracing::Level::DEBUG, skip_all, fields(%path))]
pub async fn put_json<T>(&self, path: &RemotePath, value: &T) -> anyhow::Result<()>
where
T: serde::Serialize,
{
let buf = serde_json::to_vec(value)?;
let bytes = Bytes::from(buf);
utils::backoff::retry(
|| async {
let size = bytes.len();
let bytes = futures::stream::once(futures::future::ready(Ok(bytes.clone())));
self.storage
.upload_storage_object(bytes, size, path, &self.cancel)
.await
},
remote_storage::TimeoutOrCancel::caused_by_cancel,
1,
u32::MAX,
&format!("put json {path}"),
&self.cancel,
)
.await
.expect("practically infinite retries")
}
#[instrument(level = tracing::Level::DEBUG, skip_all, fields(%path))]
pub async fn get_range(
&self,
path: &RemotePath,
start_inclusive: u64,
end_exclusive: u64,
) -> Result<Vec<u8>, DownloadError> {
let len = end_exclusive
.checked_sub(start_inclusive)
.unwrap()
.into_usize();
let res = crate::tenant::remote_timeline_client::download::download_retry_forever(
|| async {
let Download {
download_stream, ..
} = self
.storage
.download(
path,
&DownloadOpts {
etag: None,
byte_start: Bound::Included(start_inclusive),
byte_end: Bound::Excluded(end_exclusive)
},
&self.cancel)
.await?;
let mut reader = tokio_util::io::StreamReader::new(download_stream);
let mut buf = Vec::with_capacity(len);
tokio::io::copy_buf(&mut reader, &mut buf).await?;
Ok(buf)
},
&format!("download range len=0x{len:x} [0x{start_inclusive:x},0x{end_exclusive:x}) from {path:?}"),
&self.cancel,
)
.await;
debug!(len = res.as_ref().ok().map(|buf| buf.len()), "done");
res
}
pub fn pgdata(&self) -> RemotePath {
RemotePath::from_string("pgdata").unwrap()
}
pub async fn get_control_file(&self) -> Result<ControlFile, anyhow::Error> {
let control_file_path = self.pgdata().join("global/pg_control");
info!("get control file from {control_file_path}");
let control_file_buf = self.get(&control_file_path).await?;
ControlFile::new(control_file_buf)
}
}
pub struct ControlFile {
control_file_data: ControlFileData,
control_file_buf: Bytes,
}
impl ControlFile {
pub(crate) fn new(control_file_buf: Bytes) -> Result<Self, anyhow::Error> {
// XXX ControlFileData is version-specific, we're always using v14 here. v17 had changes.
let control_file_data = ControlFileData::decode(&control_file_buf)?;
let control_file = ControlFile {
control_file_data,
control_file_buf,
};
control_file.try_pg_version()?; // so that we can offer infallible pg_version()
Ok(control_file)
}
pub(crate) fn base_lsn(&self) -> Lsn {
Lsn(self.control_file_data.checkPoint).align()
}
pub(crate) fn pg_version(&self) -> u32 {
self.try_pg_version()
.expect("prepare() checks that try_pg_version doesn't error")
}
pub(crate) fn control_file_data(&self) -> &ControlFileData {
&self.control_file_data
}
pub(crate) fn control_file_buf(&self) -> &Bytes {
&self.control_file_buf
}
fn try_pg_version(&self) -> anyhow::Result<u32> {
Ok(match self.control_file_data.catalog_version_no {
// these are from catversion.h
202107181 => 14,
202209061 => 15,
202307071 => 16,
/* XXX pg17 */
catversion => {
anyhow::bail!("unrecognized catalog version {catversion}")
}
})
}
}

View File
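Every remote call in the wrapper above goes through download_retry_forever or utils::backoff::retry, i.e. it retries until it succeeds or the CancellationToken fires. A self-contained sketch of that pattern (hypothetical retry_forever helper, not the pageserver's implementation):

use std::time::Duration;
use tokio_util::sync::CancellationToken;

/// Retry `op` until it succeeds, sleeping between attempts and bailing out on cancellation.
async fn retry_forever<T, E, F, Fut>(mut op: F, what: &str, cancel: &CancellationToken) -> Option<T>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
    E: std::fmt::Display,
{
    loop {
        match op().await {
            Ok(v) => return Some(v),
            Err(e) => eprintln!("{what} failed, retrying: {e}"),
        }
        tokio::select! {
            _ = tokio::time::sleep(Duration::from_millis(100)) => {}
            _ = cancel.cancelled() => return None, // give up promptly on shutdown
        }
    }
}

#[tokio::main]
async fn main() {
    let cancel = CancellationToken::new();
    let mut attempts = 0u32;
    let res = retry_forever(
        || {
            attempts += 1;
            let n = attempts;
            async move { if n < 3 { Err("transient error") } else { Ok(n) } }
        },
        "demo download",
        &cancel,
    )
    .await;
    println!("result: {res:?}"); // Some(3)
}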

@@ -0,0 +1,20 @@
use serde::{Deserialize, Serialize};
#[derive(Deserialize, Serialize, Debug, Clone, PartialEq, Eq)]
pub struct PgdataStatus {
pub done: bool,
// TODO: remaining fields
}
#[derive(Deserialize, Serialize, Debug, Clone, PartialEq, Eq)]
pub struct ShardStatus {
pub done: bool,
// TODO: remaining fields
}
// TODO: dedupe with fast_import code
#[derive(Deserialize, Serialize, Debug, Clone, PartialEq, Eq)]
pub struct Spec {
pub project_id: String,
pub branch_id: String,
}

View File

@@ -0,0 +1,68 @@
use serde::{Deserialize, Serialize};
#[cfg(feature = "testing")]
use camino::Utf8PathBuf;
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
pub enum Root {
V1(V1),
}
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
pub enum V1 {
InProgress(InProgress),
Done(Done),
}
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
#[serde(transparent)]
pub struct IdempotencyKey(String);
impl IdempotencyKey {
pub fn new(s: String) -> Self {
Self(s)
}
}
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
pub struct InProgress {
pub idempotency_key: IdempotencyKey,
pub location: Location,
pub started_at: chrono::NaiveDateTime,
}
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
pub struct Done {
pub idempotency_key: IdempotencyKey,
pub started_at: chrono::NaiveDateTime,
pub finished_at: chrono::NaiveDateTime,
}
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
pub enum Location {
#[cfg(feature = "testing")]
LocalFs { path: Utf8PathBuf },
AwsS3 {
region: String,
bucket: String,
key: String,
},
}
impl Root {
pub fn is_done(&self) -> bool {
match self {
Root::V1(v1) => match v1 {
V1::Done(_) => true,
V1::InProgress(_) => false,
},
}
}
pub fn idempotency_key(&self) -> &IdempotencyKey {
match self {
Root::V1(v1) => match v1 {
V1::InProgress(in_progress) => &in_progress.idempotency_key,
V1::Done(done) => &done.idempotency_key,
},
}
}
}

View File
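The enums above use serde's default externally tagged representation, which is exactly the nesting parsed by the v10 index_part test earlier ({"V1": {"Done": {...}}}). A small sketch of the resulting JSON shape (simplified variants, plain String instead of IdempotencyKey):

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, PartialEq)]
enum Root {
    V1(V1),
}

#[derive(Serialize, Deserialize, Debug, PartialEq)]
enum V1 {
    InProgress { idempotency_key: String },
    Done { idempotency_key: String },
}

fn main() {
    let v = Root::V1(V1::Done { idempotency_key: "key-123".to_string() });
    let json = serde_json::to_string(&v).unwrap();
    // Externally tagged: each enum adds one level of nesting keyed by the variant name.
    assert_eq!(json, r#"{"V1":{"Done":{"idempotency_key":"key-123"}}}"#);
    let back: Root = serde_json::from_str(&json).unwrap();
    assert_eq!(back, v);
}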

@@ -0,0 +1,119 @@
//! FIXME: most of this is copy-paste from mgmt_api.rs ; dedupe into a `reqwest_utils::Client` crate.
use pageserver_client::mgmt_api::{Error, ResponseErrorMessageExt};
use serde::{Deserialize, Serialize};
use tokio_util::sync::CancellationToken;
use tracing::error;
use crate::config::PageServerConf;
use reqwest::Method;
use super::importbucket_format::Spec;
pub struct Client {
base_url: String,
authorization_header: Option<String>,
client: reqwest::Client,
cancel: CancellationToken,
}
pub type Result<T> = std::result::Result<T, Error>;
#[derive(Serialize, Deserialize, Debug)]
struct ImportProgressRequest {
// no fields yet, not sure if there ever will be any
}
#[derive(Serialize, Deserialize, Debug)]
struct ImportProgressResponse {
// we don't care
}
impl Client {
pub fn new(conf: &PageServerConf, cancel: CancellationToken) -> anyhow::Result<Self> {
let Some(ref base_url) = conf.import_pgdata_upcall_api else {
anyhow::bail!("import_pgdata_upcall_api is not configured")
};
Ok(Self {
base_url: base_url.to_string(),
client: reqwest::Client::new(),
cancel,
authorization_header: conf
.import_pgdata_upcall_api_token
.as_ref()
.map(|secret_string| secret_string.get_contents())
.map(|jwt| format!("Bearer {jwt}")),
})
}
fn start_request<U: reqwest::IntoUrl>(
&self,
method: Method,
uri: U,
) -> reqwest::RequestBuilder {
let req = self.client.request(method, uri);
if let Some(value) = &self.authorization_header {
req.header(reqwest::header::AUTHORIZATION, value)
} else {
req
}
}
async fn request_noerror<B: serde::Serialize, U: reqwest::IntoUrl>(
&self,
method: Method,
uri: U,
body: B,
) -> Result<reqwest::Response> {
self.start_request(method, uri)
.json(&body)
.send()
.await
.map_err(Error::ReceiveBody)
}
async fn request<B: serde::Serialize, U: reqwest::IntoUrl>(
&self,
method: Method,
uri: U,
body: B,
) -> Result<reqwest::Response> {
let res = self.request_noerror(method, uri, body).await?;
let response = res.error_from_body().await?;
Ok(response)
}
pub async fn send_progress_once(&self, spec: &Spec) -> Result<()> {
let url = format!(
"{}/projects/{}/branches/{}/import_progress",
self.base_url, spec.project_id, spec.branch_id
);
let ImportProgressResponse {} = self
.request(Method::POST, url, &ImportProgressRequest {})
.await?
.json()
.await
.map_err(Error::ReceiveBody)?;
Ok(())
}
pub async fn send_progress_until_success(&self, spec: &Spec) -> anyhow::Result<()> {
loop {
match self.send_progress_once(spec).await {
Ok(()) => return Ok(()),
Err(Error::Cancelled) => return Err(anyhow::anyhow!("cancelled")),
Err(err) => {
error!(?err, "error sending progress, retrying");
if tokio::time::timeout(
std::time::Duration::from_secs(10),
self.cancel.cancelled(),
)
.await
.is_ok()
{
anyhow::bail!("cancelled while sending early progress update");
}
}
}
}
}
}

View File

@@ -3,7 +3,7 @@ use std::{collections::hash_map::Entry, fs, sync::Arc};
use anyhow::Context;
use camino::Utf8PathBuf;
use tracing::{error, info, info_span};
use utils::{fs_ext, id::TimelineId, lsn::Lsn};
use utils::{fs_ext, id::TimelineId, lsn::Lsn, sync::gate::GateGuard};
use crate::{
context::RequestContext,
@@ -23,14 +23,14 @@ use super::Timeline;
pub struct UninitializedTimeline<'t> {
pub(crate) owning_tenant: &'t Tenant,
timeline_id: TimelineId,
raw_timeline: Option<(Arc<Timeline>, TimelineCreateGuard<'t>)>,
raw_timeline: Option<(Arc<Timeline>, TimelineCreateGuard)>,
}
impl<'t> UninitializedTimeline<'t> {
pub(crate) fn new(
owning_tenant: &'t Tenant,
timeline_id: TimelineId,
raw_timeline: Option<(Arc<Timeline>, TimelineCreateGuard<'t>)>,
raw_timeline: Option<(Arc<Timeline>, TimelineCreateGuard)>,
) -> Self {
Self {
owning_tenant,
@@ -87,6 +87,10 @@ impl<'t> UninitializedTimeline<'t> {
}
}
pub(crate) fn finish_creation_myself(&mut self) -> (Arc<Timeline>, TimelineCreateGuard) {
self.raw_timeline.take().expect("already checked")
}
/// Prepares timeline data by loading it from the basebackup archive.
pub(crate) async fn import_basebackup_from_tar(
self,
@@ -167,9 +171,10 @@ pub(crate) fn cleanup_timeline_directory(create_guard: TimelineCreateGuard) {
/// A guard for timeline creations in process: as long as this object exists, the timeline ID
/// is kept in `[Tenant::timelines_creating]` to exclude concurrent attempts to create the same timeline.
#[must_use]
pub(crate) struct TimelineCreateGuard<'t> {
owning_tenant: &'t Tenant,
timeline_id: TimelineId,
pub(crate) struct TimelineCreateGuard {
pub(crate) _tenant_gate_guard: GateGuard,
pub(crate) owning_tenant: Arc<Tenant>,
pub(crate) timeline_id: TimelineId,
pub(crate) timeline_path: Utf8PathBuf,
pub(crate) idempotency: CreateTimelineIdempotency,
}
@@ -184,20 +189,27 @@ pub(crate) enum TimelineExclusionError {
},
#[error("Already creating")]
AlreadyCreating,
#[error("Shutting down")]
ShuttingDown,
// e.g. I/O errors, or some failure deep in postgres initdb
#[error(transparent)]
Other(#[from] anyhow::Error),
}
impl<'t> TimelineCreateGuard<'t> {
impl TimelineCreateGuard {
pub(crate) fn new(
owning_tenant: &'t Tenant,
owning_tenant: &Arc<Tenant>,
timeline_id: TimelineId,
timeline_path: Utf8PathBuf,
idempotency: CreateTimelineIdempotency,
allow_offloaded: bool,
) -> Result<Self, TimelineExclusionError> {
let _tenant_gate_guard = owning_tenant
.gate
.enter()
.map_err(|_| TimelineExclusionError::ShuttingDown)?;
// Lock order: this is the only place we take both locks. During drop() we only
// lock creating_timelines
let timelines = owning_tenant.timelines.lock().unwrap();
@@ -225,8 +237,12 @@ impl<'t> TimelineCreateGuard<'t> {
return Err(TimelineExclusionError::AlreadyCreating);
}
creating_timelines.insert(timeline_id);
drop(creating_timelines);
drop(timelines_offloaded);
drop(timelines);
Ok(Self {
owning_tenant,
_tenant_gate_guard,
owning_tenant: Arc::clone(owning_tenant),
timeline_id,
timeline_path,
idempotency,
@@ -234,7 +250,7 @@ impl<'t> TimelineCreateGuard<'t> {
}
}
impl Drop for TimelineCreateGuard<'_> {
impl Drop for TimelineCreateGuard {
fn drop(&mut self) {
self.owning_tenant
.timelines_creating

View File
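TimelineCreateGuard now enters the tenant's gate, so timeline creation fails fast with ShuttingDown once shutdown has begun, and shutdown in turn waits for in-flight creations to release their guards. A minimal, std-only sketch of that gate idea (toy Gate type for illustration, not utils::sync::gate):

use std::sync::{Arc, RwLock, RwLockReadGuard};

/// Toy gate: `enter()` hands out read guards while open; `close()` flips the flag,
/// blocking until every outstanding guard has been dropped.
struct Gate(RwLock<bool /* closed */>);

impl Gate {
    fn new() -> Self {
        Gate(RwLock::new(false))
    }
    fn enter(&self) -> Option<RwLockReadGuard<'_, bool>> {
        let guard = self.0.read().unwrap();
        if *guard {
            None // already shutting down
        } else {
            Some(guard)
        }
    }
    fn close(&self) {
        *self.0.write().unwrap() = true;
    }
}

fn main() {
    let gate = Arc::new(Gate::new());
    let creating = gate.enter().expect("gate is open, creation may proceed");
    drop(creating); // timeline creation finished, guard released
    gate.close(); // tenant shutdown: waits for outstanding guards, then closes
    assert!(gate.enter().is_none(), "new creations are refused after shutdown");
}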

@@ -3,6 +3,7 @@ use super::storage_layer::ResidentLayer;
use crate::tenant::metadata::TimelineMetadata;
use crate::tenant::remote_timeline_client::index::IndexPart;
use crate::tenant::remote_timeline_client::index::LayerFileMetadata;
use std::collections::HashSet;
use std::collections::{HashMap, VecDeque};
use std::fmt::Debug;
@@ -14,7 +15,6 @@ use utils::lsn::AtomicLsn;
use std::sync::atomic::AtomicU32;
use utils::lsn::Lsn;
#[cfg(feature = "testing")]
use utils::generation::Generation;
// clippy warns that Uninitialized is much smaller than Initialized, which wastes
@@ -38,6 +38,12 @@ impl UploadQueue {
}
}
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
pub(crate) enum OpType {
MayReorder,
FlushDeletion,
}
/// This keeps track of queued and in-progress tasks.
pub(crate) struct UploadQueueInitialized {
/// Counter to assign task IDs
@@ -88,6 +94,9 @@ pub(crate) struct UploadQueueInitialized {
#[cfg(feature = "testing")]
pub(crate) dangling_files: HashMap<LayerName, Generation>,
/// Ensure we order file operations correctly.
pub(crate) recently_deleted: HashSet<(LayerName, Generation)>,
/// Deletions that are blocked by the tenant configuration
pub(crate) blocked_deletions: Vec<Delete>,
@@ -183,6 +192,7 @@ impl UploadQueue {
queued_operations: VecDeque::new(),
#[cfg(feature = "testing")]
dangling_files: HashMap::new(),
recently_deleted: HashSet::new(),
blocked_deletions: Vec::new(),
shutting_down: false,
shutdown_ready: Arc::new(tokio::sync::Semaphore::new(0)),
@@ -224,6 +234,7 @@ impl UploadQueue {
queued_operations: VecDeque::new(),
#[cfg(feature = "testing")]
dangling_files: HashMap::new(),
recently_deleted: HashSet::new(),
blocked_deletions: Vec::new(),
shutting_down: false,
shutdown_ready: Arc::new(tokio::sync::Semaphore::new(0)),
@@ -282,8 +293,8 @@ pub(crate) struct Delete {
#[derive(Debug)]
pub(crate) enum UploadOp {
/// Upload a layer file
UploadLayer(ResidentLayer, LayerFileMetadata),
/// Upload a layer file. The last field indicates the last operation for the file.
UploadLayer(ResidentLayer, LayerFileMetadata, Option<OpType>),
/// Upload a index_part.json file
UploadMetadata {
@@ -305,11 +316,11 @@ pub(crate) enum UploadOp {
impl std::fmt::Display for UploadOp {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
match self {
UploadOp::UploadLayer(layer, metadata) => {
UploadOp::UploadLayer(layer, metadata, mode) => {
write!(
f,
"UploadLayer({}, size={:?}, gen={:?})",
layer, metadata.file_size, metadata.generation
"UploadLayer({}, size={:?}, gen={:?}, mode={:?})",
layer, metadata.file_size, metadata.generation, mode
)
}
UploadOp::UploadMetadata { uploaded, .. } => {

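The `recently_deleted` set and the new `OpType` tag exist to keep layer uploads ordered with respect to prior deletions of the same file. A hedged sketch of the decision rule this data could support (the actual scheduling logic is not shown in this diff; `LayerName`/`Generation` are replaced with stand-ins):

```rust
use std::collections::HashSet;

// Copied from the diff above.
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
enum OpType {
    MayReorder,
    FlushDeletion,
}

// Assumed rule: if this exact (name, generation) was recently deleted, the upload
// must not be reordered past that deletion, so tag it FlushDeletion.
fn classify_upload(
    recently_deleted: &HashSet<(String, u32)>, // (LayerName, Generation) stand-ins
    name: &str,
    generation: u32,
) -> OpType {
    if recently_deleted.contains(&(name.to_owned(), generation)) {
        OpType::FlushDeletion
    } else {
        OpType::MayReorder
    }
}
```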
poetry.lock generated
View File

@@ -1,4 +1,4 @@
# This file is automatically @generated by Poetry 1.8.3 and should not be changed by hand.
# This file is automatically @generated by Poetry 1.8.4 and should not be changed by hand.
[[package]]
name = "aiohappyeyeballs"
@@ -114,7 +114,6 @@ files = [
[package.dependencies]
aiohappyeyeballs = ">=2.3.0"
aiosignal = ">=1.1.2"
async-timeout = {version = ">=4.0,<6.0", markers = "python_version < \"3.11\""}
attrs = ">=17.3.0"
frozenlist = ">=1.1.1"
multidict = ">=4.5,<7.0"
@@ -219,10 +218,8 @@ files = [
]
[package.dependencies]
exceptiongroup = {version = ">=1.0.2", markers = "python_version < \"3.11\""}
idna = ">=2.8"
sniffio = ">=1.1"
typing-extensions = {version = ">=4.1", markers = "python_version < \"3.11\""}
[package.extras]
doc = ["Sphinx (>=7)", "packaging", "sphinx-autodoc-typehints (>=1.2.0)", "sphinx-rtd-theme"]
@@ -737,10 +734,7 @@ files = [
[package.dependencies]
jmespath = ">=0.7.1,<2.0.0"
python-dateutil = ">=2.1,<3.0.0"
urllib3 = [
{version = ">=1.25.4,<1.27", markers = "python_version < \"3.10\""},
{version = ">=1.25.4,<2.1", markers = "python_version >= \"3.10\""},
]
urllib3 = {version = ">=1.25.4,<2.1", markers = "python_version >= \"3.10\""}
[package.extras]
crt = ["awscrt (==0.19.19)"]
@@ -1069,20 +1063,6 @@ docs = ["myst-parser (==0.18.0)", "sphinx (==5.1.1)"]
ssh = ["paramiko (>=2.4.3)"]
websockets = ["websocket-client (>=1.3.0)"]
[[package]]
name = "exceptiongroup"
version = "1.1.1"
description = "Backport of PEP 654 (exception groups)"
optional = false
python-versions = ">=3.7"
files = [
{file = "exceptiongroup-1.1.1-py3-none-any.whl", hash = "sha256:232c37c63e4f682982c8b6459f33a8981039e5fb8756b2074364e5055c498c9e"},
{file = "exceptiongroup-1.1.1.tar.gz", hash = "sha256:d484c3090ba2889ae2928419117447a14daf3c1231d5e30d0aae34f354f01785"},
]
[package.extras]
test = ["pytest (>=6)"]
[[package]]
name = "execnet"
version = "1.9.0"
@@ -1110,7 +1090,6 @@ files = [
[package.dependencies]
click = ">=8.0"
importlib-metadata = {version = ">=3.6.0", markers = "python_version < \"3.10\""}
itsdangerous = ">=2.0"
Jinja2 = ">=3.0"
Werkzeug = ">=2.2.2"
@@ -1319,25 +1298,6 @@ files = [
{file = "idna-3.7.tar.gz", hash = "sha256:028ff3aadf0609c1fd278d8ea3089299412a7a8b9bd005dd08b9f8285bcb5cfc"},
]
[[package]]
name = "importlib-metadata"
version = "4.12.0"
description = "Read metadata from Python packages"
optional = false
python-versions = ">=3.7"
files = [
{file = "importlib_metadata-4.12.0-py3-none-any.whl", hash = "sha256:7401a975809ea1fdc658c3aa4f78cc2195a0e019c5cbc4c06122884e9ae80c23"},
{file = "importlib_metadata-4.12.0.tar.gz", hash = "sha256:637245b8bab2b6502fcbc752cc4b7a6f6243bb02b31c5c26156ad103d3d45670"},
]
[package.dependencies]
zipp = ">=0.5"
[package.extras]
docs = ["jaraco.packaging (>=9)", "rst.linker (>=1.9)", "sphinx"]
perf = ["ipython"]
testing = ["flufl.flake8", "importlib-resources (>=1.3)", "packaging", "pyfakefs", "pytest (>=6)", "pytest-black (>=0.3.7)", "pytest-checkdocs (>=2.4)", "pytest-cov", "pytest-enabler (>=1.3)", "pytest-flake8", "pytest-mypy (>=0.9.1)", "pytest-perf (>=0.9.2)"]
[[package]]
name = "iniconfig"
version = "1.1.1"
@@ -1898,48 +1858,54 @@ files = [
[[package]]
name = "mypy"
version = "1.3.0"
version = "1.13.0"
description = "Optional static typing for Python"
optional = false
python-versions = ">=3.7"
python-versions = ">=3.8"
files = [
{file = "mypy-1.3.0-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:c1eb485cea53f4f5284e5baf92902cd0088b24984f4209e25981cc359d64448d"},
{file = "mypy-1.3.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:4c99c3ecf223cf2952638da9cd82793d8f3c0c5fa8b6ae2b2d9ed1e1ff51ba85"},
{file = "mypy-1.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:550a8b3a19bb6589679a7c3c31f64312e7ff482a816c96e0cecec9ad3a7564dd"},
{file = "mypy-1.3.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:cbc07246253b9e3d7d74c9ff948cd0fd7a71afcc2b77c7f0a59c26e9395cb152"},
{file = "mypy-1.3.0-cp310-cp310-win_amd64.whl", hash = "sha256:a22435632710a4fcf8acf86cbd0d69f68ac389a3892cb23fbad176d1cddaf228"},
{file = "mypy-1.3.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:6e33bb8b2613614a33dff70565f4c803f889ebd2f859466e42b46e1df76018dd"},
{file = "mypy-1.3.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:7d23370d2a6b7a71dc65d1266f9a34e4cde9e8e21511322415db4b26f46f6b8c"},
{file = "mypy-1.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:658fe7b674769a0770d4b26cb4d6f005e88a442fe82446f020be8e5f5efb2fae"},
{file = "mypy-1.3.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:6e42d29e324cdda61daaec2336c42512e59c7c375340bd202efa1fe0f7b8f8ca"},
{file = "mypy-1.3.0-cp311-cp311-win_amd64.whl", hash = "sha256:d0b6c62206e04061e27009481cb0ec966f7d6172b5b936f3ead3d74f29fe3dcf"},
{file = "mypy-1.3.0-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:76ec771e2342f1b558c36d49900dfe81d140361dd0d2df6cd71b3db1be155409"},
{file = "mypy-1.3.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ebc95f8386314272bbc817026f8ce8f4f0d2ef7ae44f947c4664efac9adec929"},
{file = "mypy-1.3.0-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:faff86aa10c1aa4a10e1a301de160f3d8fc8703b88c7e98de46b531ff1276a9a"},
{file = "mypy-1.3.0-cp37-cp37m-win_amd64.whl", hash = "sha256:8c5979d0deb27e0f4479bee18ea0f83732a893e81b78e62e2dda3e7e518c92ee"},
{file = "mypy-1.3.0-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:c5d2cc54175bab47011b09688b418db71403aefad07cbcd62d44010543fc143f"},
{file = "mypy-1.3.0-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:87df44954c31d86df96c8bd6e80dfcd773473e877ac6176a8e29898bfb3501cb"},
{file = "mypy-1.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:473117e310febe632ddf10e745a355714e771ffe534f06db40702775056614c4"},
{file = "mypy-1.3.0-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:74bc9b6e0e79808bf8678d7678b2ae3736ea72d56eede3820bd3849823e7f305"},
{file = "mypy-1.3.0-cp38-cp38-win_amd64.whl", hash = "sha256:44797d031a41516fcf5cbfa652265bb994e53e51994c1bd649ffcd0c3a7eccbf"},
{file = "mypy-1.3.0-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:ddae0f39ca146972ff6bb4399f3b2943884a774b8771ea0a8f50e971f5ea5ba8"},
{file = "mypy-1.3.0-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:1c4c42c60a8103ead4c1c060ac3cdd3ff01e18fddce6f1016e08939647a0e703"},
{file = "mypy-1.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e86c2c6852f62f8f2b24cb7a613ebe8e0c7dc1402c61d36a609174f63e0ff017"},
{file = "mypy-1.3.0-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:f9dca1e257d4cc129517779226753dbefb4f2266c4eaad610fc15c6a7e14283e"},
{file = "mypy-1.3.0-cp39-cp39-win_amd64.whl", hash = "sha256:95d8d31a7713510685b05fbb18d6ac287a56c8f6554d88c19e73f724a445448a"},
{file = "mypy-1.3.0-py3-none-any.whl", hash = "sha256:a8763e72d5d9574d45ce5881962bc8e9046bf7b375b0abf031f3e6811732a897"},
{file = "mypy-1.3.0.tar.gz", hash = "sha256:e1f4d16e296f5135624b34e8fb741eb0eadedca90862405b1f1fde2040b9bd11"},
{file = "mypy-1.13.0-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:6607e0f1dd1fb7f0aca14d936d13fd19eba5e17e1cd2a14f808fa5f8f6d8f60a"},
{file = "mypy-1.13.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:8a21be69bd26fa81b1f80a61ee7ab05b076c674d9b18fb56239d72e21d9f4c80"},
{file = "mypy-1.13.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7b2353a44d2179846a096e25691d54d59904559f4232519d420d64da6828a3a7"},
{file = "mypy-1.13.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:0730d1c6a2739d4511dc4253f8274cdd140c55c32dfb0a4cf8b7a43f40abfa6f"},
{file = "mypy-1.13.0-cp310-cp310-win_amd64.whl", hash = "sha256:c5fc54dbb712ff5e5a0fca797e6e0aa25726c7e72c6a5850cfd2adbc1eb0a372"},
{file = "mypy-1.13.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:581665e6f3a8a9078f28d5502f4c334c0c8d802ef55ea0e7276a6e409bc0d82d"},
{file = "mypy-1.13.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:3ddb5b9bf82e05cc9a627e84707b528e5c7caaa1c55c69e175abb15a761cec2d"},
{file = "mypy-1.13.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:20c7ee0bc0d5a9595c46f38beb04201f2620065a93755704e141fcac9f59db2b"},
{file = "mypy-1.13.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:3790ded76f0b34bc9c8ba4def8f919dd6a46db0f5a6610fb994fe8efdd447f73"},
{file = "mypy-1.13.0-cp311-cp311-win_amd64.whl", hash = "sha256:51f869f4b6b538229c1d1bcc1dd7d119817206e2bc54e8e374b3dfa202defcca"},
{file = "mypy-1.13.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:5c7051a3461ae84dfb5dd15eff5094640c61c5f22257c8b766794e6dd85e72d5"},
{file = "mypy-1.13.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:39bb21c69a5d6342f4ce526e4584bc5c197fd20a60d14a8624d8743fffb9472e"},
{file = "mypy-1.13.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:164f28cb9d6367439031f4c81e84d3ccaa1e19232d9d05d37cb0bd880d3f93c2"},
{file = "mypy-1.13.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:a4c1bfcdbce96ff5d96fc9b08e3831acb30dc44ab02671eca5953eadad07d6d0"},
{file = "mypy-1.13.0-cp312-cp312-win_amd64.whl", hash = "sha256:a0affb3a79a256b4183ba09811e3577c5163ed06685e4d4b46429a271ba174d2"},
{file = "mypy-1.13.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:a7b44178c9760ce1a43f544e595d35ed61ac2c3de306599fa59b38a6048e1aa7"},
{file = "mypy-1.13.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:5d5092efb8516d08440e36626f0153b5006d4088c1d663d88bf79625af3d1d62"},
{file = "mypy-1.13.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:de2904956dac40ced10931ac967ae63c5089bd498542194b436eb097a9f77bc8"},
{file = "mypy-1.13.0-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:7bfd8836970d33c2105562650656b6846149374dc8ed77d98424b40b09340ba7"},
{file = "mypy-1.13.0-cp313-cp313-win_amd64.whl", hash = "sha256:9f73dba9ec77acb86457a8fc04b5239822df0c14a082564737833d2963677dbc"},
{file = "mypy-1.13.0-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:100fac22ce82925f676a734af0db922ecfea991e1d7ec0ceb1e115ebe501301a"},
{file = "mypy-1.13.0-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:7bcb0bb7f42a978bb323a7c88f1081d1b5dee77ca86f4100735a6f541299d8fb"},
{file = "mypy-1.13.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bde31fc887c213e223bbfc34328070996061b0833b0a4cfec53745ed61f3519b"},
{file = "mypy-1.13.0-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:07de989f89786f62b937851295ed62e51774722e5444a27cecca993fc3f9cd74"},
{file = "mypy-1.13.0-cp38-cp38-win_amd64.whl", hash = "sha256:4bde84334fbe19bad704b3f5b78c4abd35ff1026f8ba72b29de70dda0916beb6"},
{file = "mypy-1.13.0-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:0246bcb1b5de7f08f2826451abd947bf656945209b140d16ed317f65a17dc7dc"},
{file = "mypy-1.13.0-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:7f5b7deae912cf8b77e990b9280f170381fdfbddf61b4ef80927edd813163732"},
{file = "mypy-1.13.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7029881ec6ffb8bc233a4fa364736789582c738217b133f1b55967115288a2bc"},
{file = "mypy-1.13.0-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:3e38b980e5681f28f033f3be86b099a247b13c491f14bb8b1e1e134d23bb599d"},
{file = "mypy-1.13.0-cp39-cp39-win_amd64.whl", hash = "sha256:a6789be98a2017c912ae6ccb77ea553bbaf13d27605d2ca20a76dfbced631b24"},
{file = "mypy-1.13.0-py3-none-any.whl", hash = "sha256:9c250883f9fd81d212e0952c92dbfcc96fc237f4b7c92f56ac81fd48460b3e5a"},
{file = "mypy-1.13.0.tar.gz", hash = "sha256:0291a61b6fbf3e6673e3405cfcc0e7650bebc7939659fdca2702958038bd835e"},
]
[package.dependencies]
mypy-extensions = ">=1.0.0"
tomli = {version = ">=1.1.0", markers = "python_version < \"3.11\""}
typing-extensions = ">=3.10"
typing-extensions = ">=4.6.0"
[package.extras]
dmypy = ["psutil (>=4.0)"]
faster-cache = ["orjson"]
install-types = ["pip"]
python2 = ["typed-ast (>=1.4.0,<2)"]
mypyc = ["setuptools (>=50)"]
reports = ["lxml"]
[[package]]
@@ -2514,11 +2480,9 @@ files = [
[package.dependencies]
colorama = {version = "*", markers = "sys_platform == \"win32\""}
exceptiongroup = {version = ">=1.0.0rc8", markers = "python_version < \"3.11\""}
iniconfig = "*"
packaging = "*"
pluggy = ">=0.12,<2.0"
tomli = {version = ">=1.0.0", markers = "python_version < \"3.11\""}
[package.extras]
testing = ["argcomplete", "attrs (>=19.2.0)", "hypothesis (>=3.56)", "mock", "nose", "pygments (>=2.7.2)", "requests", "setuptools", "xmlschema"]
@@ -2581,10 +2545,7 @@ files = [
]
[package.dependencies]
pytest = [
{version = ">=5.0", markers = "python_version < \"3.10\""},
{version = ">=6.2.4", markers = "python_version >= \"3.10\""},
]
pytest = {version = ">=6.2.4", markers = "python_version >= \"3.10\""}
[[package]]
name = "pytest-repeat"
@@ -3092,17 +3053,6 @@ files = [
{file = "toml-0.10.2.tar.gz", hash = "sha256:b3bda1d108d5dd99f4a20d24d9c348e91c4db7ab1b749200bded2f839ccbe68f"},
]
[[package]]
name = "tomli"
version = "2.0.1"
description = "A lil' TOML parser"
optional = false
python-versions = ">=3.7"
files = [
{file = "tomli-2.0.1-py3-none-any.whl", hash = "sha256:939de3e7a6161af0c887ef91b7d41a53e7c5a1ca976325f429cb46ea9bc30ecc"},
{file = "tomli-2.0.1.tar.gz", hash = "sha256:de526c12914f0c550d15924c62d72abc48d6fe7364aa87328337a31007fe8a4f"},
]
[[package]]
name = "types-jwcrypto"
version = "1.5.0.20240925"
@@ -3359,16 +3309,6 @@ files = [
{file = "wrapt-1.14.1-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:8ad85f7f4e20964db4daadcab70b47ab05c7c1cf2a7c1e51087bfaa83831854c"},
{file = "wrapt-1.14.1-cp310-cp310-win32.whl", hash = "sha256:a9a52172be0b5aae932bef82a79ec0a0ce87288c7d132946d645eba03f0ad8a8"},
{file = "wrapt-1.14.1-cp310-cp310-win_amd64.whl", hash = "sha256:6d323e1554b3d22cfc03cd3243b5bb815a51f5249fdcbb86fda4bf62bab9e164"},
{file = "wrapt-1.14.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:ecee4132c6cd2ce5308e21672015ddfed1ff975ad0ac8d27168ea82e71413f55"},
{file = "wrapt-1.14.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:2020f391008ef874c6d9e208b24f28e31bcb85ccff4f335f15a3251d222b92d9"},
{file = "wrapt-1.14.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2feecf86e1f7a86517cab34ae6c2f081fd2d0dac860cb0c0ded96d799d20b335"},
{file = "wrapt-1.14.1-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:240b1686f38ae665d1b15475966fe0472f78e71b1b4903c143a842659c8e4cb9"},
{file = "wrapt-1.14.1-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a9008dad07d71f68487c91e96579c8567c98ca4c3881b9b113bc7b33e9fd78b8"},
{file = "wrapt-1.14.1-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:6447e9f3ba72f8e2b985a1da758767698efa72723d5b59accefd716e9e8272bf"},
{file = "wrapt-1.14.1-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:acae32e13a4153809db37405f5eba5bac5fbe2e2ba61ab227926a22901051c0a"},
{file = "wrapt-1.14.1-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:49ef582b7a1152ae2766557f0550a9fcbf7bbd76f43fbdc94dd3bf07cc7168be"},
{file = "wrapt-1.14.1-cp311-cp311-win32.whl", hash = "sha256:358fe87cc899c6bb0ddc185bf3dbfa4ba646f05b1b0b9b5a27c2cb92c2cea204"},
{file = "wrapt-1.14.1-cp311-cp311-win_amd64.whl", hash = "sha256:26046cd03936ae745a502abf44dac702a5e6880b2b01c29aea8ddf3353b68224"},
{file = "wrapt-1.14.1-cp35-cp35m-manylinux1_i686.whl", hash = "sha256:43ca3bbbe97af00f49efb06e352eae40434ca9d915906f77def219b88e85d907"},
{file = "wrapt-1.14.1-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:6b1a564e6cb69922c7fe3a678b9f9a3c54e72b469875aa8018f18b4d1dd1adf3"},
{file = "wrapt-1.14.1-cp35-cp35m-manylinux2010_i686.whl", hash = "sha256:00b6d4ea20a906c0ca56d84f93065b398ab74b927a7a3dbd470f6fc503f95dc3"},
@@ -3523,21 +3463,6 @@ idna = ">=2.0"
multidict = ">=4.0"
propcache = ">=0.2.0"
[[package]]
name = "zipp"
version = "3.19.1"
description = "Backport of pathlib-compatible object wrapper for zip files"
optional = false
python-versions = ">=3.8"
files = [
{file = "zipp-3.19.1-py3-none-any.whl", hash = "sha256:2828e64edb5386ea6a52e7ba7cdb17bb30a73a858f5eb6eb93d8d36f5ea26091"},
{file = "zipp-3.19.1.tar.gz", hash = "sha256:35427f6d5594f4acf82d25541438348c26736fa9b3afa2754bcd63cdb99d8e8f"},
]
[package.extras]
doc = ["furo", "jaraco.packaging (>=9.3)", "jaraco.tidelift (>=1.4)", "rst.linker (>=1.9)", "sphinx (>=3.5)", "sphinx-lint"]
test = ["big-O", "jaraco.functools", "jaraco.itertools", "jaraco.test", "more-itertools", "pytest (>=6,!=8.1.*)", "pytest-checkdocs (>=2.4)", "pytest-cov", "pytest-enabler (>=2.2)", "pytest-ignore-flaky", "pytest-mypy", "pytest-ruff (>=0.2.1)"]
[[package]]
name = "zstandard"
version = "0.21.0"
@@ -3598,5 +3523,5 @@ cffi = ["cffi (>=1.11)"]
[metadata]
lock-version = "2.0"
python-versions = "^3.9"
content-hash = "8cb9c38d83eec441391c0528ac2fbefde18c734373b2399e07c69382044e8ced"
python-versions = "^3.11"
content-hash = "21debe1116843e5d14bdf37d6e265c68c63a98a64ba04ec8b8a02af2e8d9f486"

View File

@@ -6,6 +6,7 @@ use tokio_postgres::config::SslMode;
use tracing::{info, info_span};
use super::ComputeCredentialKeys;
use crate::auth::IpPattern;
use crate::cache::Cached;
use crate::config::AuthenticationConfig;
use crate::context::RequestContext;
@@ -74,10 +75,10 @@ impl ConsoleRedirectBackend {
ctx: &RequestContext,
auth_config: &'static AuthenticationConfig,
client: &mut PqStream<impl AsyncRead + AsyncWrite + Unpin>,
) -> auth::Result<ConsoleRedirectNodeInfo> {
) -> auth::Result<(ConsoleRedirectNodeInfo, Option<Vec<IpPattern>>)> {
authenticate(ctx, auth_config, &self.console_uri, client)
.await
.map(ConsoleRedirectNodeInfo)
.map(|(node_info, ip_allowlist)| (ConsoleRedirectNodeInfo(node_info), ip_allowlist))
}
}
@@ -102,7 +103,7 @@ async fn authenticate(
auth_config: &'static AuthenticationConfig,
link_uri: &reqwest::Url,
client: &mut PqStream<impl AsyncRead + AsyncWrite + Unpin>,
) -> auth::Result<NodeInfo> {
) -> auth::Result<(NodeInfo, Option<Vec<IpPattern>>)> {
ctx.set_auth_method(crate::context::AuthMethod::ConsoleRedirect);
// registering waiter can fail if we get unlucky with rng.
@@ -176,9 +177,12 @@ async fn authenticate(
config.password(password.as_ref());
}
Ok(NodeInfo {
config,
aux: db_info.aux,
allow_self_signed_compute: false, // caller may override
})
Ok((
NodeInfo {
config,
aux: db_info.aux,
allow_self_signed_compute: false, // caller may override
},
db_info.allowed_ips,
))
}

View File

@@ -132,6 +132,93 @@ struct JwkSet<'a> {
keys: Vec<&'a RawValue>,
}
/// Given a jwks_url, fetch the JWKS and parse out all the signing JWKs.
/// Returns `None` and logs a warning if there are any errors.
async fn fetch_jwks(
client: &reqwest_middleware::ClientWithMiddleware,
jwks_url: url::Url,
) -> Option<jose_jwk::JwkSet> {
let req = client.get(jwks_url.clone());
// TODO(conrad): We need to filter out URLs that point to local resources. Public internet only.
let resp = req.send().await.and_then(|r| {
r.error_for_status()
.map_err(reqwest_middleware::Error::Reqwest)
});
let resp = match resp {
Ok(r) => r,
// TODO: should we re-insert JWKs if we want to keep this JWKs URL?
// I expect these failures would be quite sparse.
Err(e) => {
tracing::warn!(url=?jwks_url, error=?e, "could not fetch JWKs");
return None;
}
};
let resp: http::Response<reqwest::Body> = resp.into();
let bytes = match read_body_with_limit(resp.into_body(), MAX_JWK_BODY_SIZE).await {
Ok(bytes) => bytes,
Err(e) => {
tracing::warn!(url=?jwks_url, error=?e, "could not decode JWKs");
return None;
}
};
let jwks = match serde_json::from_slice::<JwkSet>(&bytes) {
Ok(jwks) => jwks,
Err(e) => {
tracing::warn!(url=?jwks_url, error=?e, "could not decode JWKs");
return None;
}
};
// `jose_jwk::Jwk` is quite large (288 bytes). Let's not pre-allocate for what we don't need.
//
// Even though we limit our responses to 64KiB, we could still receive a payload like
// `{"keys":[` + repeat(`0`).take(30000).join(`,`) + `]}`. Parsing this as `RawValue` uses 468KiB.
// Pre-allocating the corresponding `Vec::<jose_jwk::Jwk>::with_capacity(30000)` uses 8.2MiB.
let mut keys = vec![];
let mut failed = 0;
for key in jwks.keys {
let key = match serde_json::from_str::<jose_jwk::Jwk>(key.get()) {
Ok(key) => key,
Err(e) => {
tracing::debug!(url=?jwks_url, failed=?e, "could not decode JWK");
failed += 1;
continue;
}
};
// if `use` (called `cls` in rust) is specified to be something other than signing,
// we can skip storing it.
if key
.prm
.cls
.as_ref()
.is_some_and(|c| *c != jose_jwk::Class::Signing)
{
continue;
}
keys.push(key);
}
keys.shrink_to_fit();
if failed > 0 {
tracing::warn!(url=?jwks_url, failed, "could not decode JWKs");
}
if keys.is_empty() {
tracing::warn!(url=?jwks_url, "no valid JWKs found inside the response body");
return None;
}
Some(jose_jwk::JwkSet { keys })
}
impl JwkCacheEntryLock {
async fn acquire_permit<'a>(self: &'a Arc<Self>) -> JwkRenewalPermit<'a> {
JwkRenewalPermit::acquire_permit(self).await
@@ -166,87 +253,15 @@ impl JwkCacheEntryLock {
// TODO(conrad): run concurrently
// TODO(conrad): strip the JWKs urls (should be checked by cplane as well - cloud#16284)
for rule in rules {
let req = client.get(rule.jwks_url.clone());
// TODO(conrad): eventually switch to using reqwest_middleware/`new_client_with_timeout`.
// TODO(conrad): We need to filter out URLs that point to local resources. Public internet only.
match req.send().await.and_then(|r| {
r.error_for_status()
.map_err(reqwest_middleware::Error::Reqwest)
}) {
// todo: should we re-insert JWKs if we want to keep this JWKs URL?
// I expect these failures would be quite sparse.
Err(e) => tracing::warn!(url=?rule.jwks_url, error=?e, "could not fetch JWKs"),
Ok(r) => {
let resp: http::Response<reqwest::Body> = r.into();
let bytes = match read_body_with_limit(resp.into_body(), MAX_JWK_BODY_SIZE)
.await
{
Ok(bytes) => bytes,
Err(e) => {
tracing::warn!(url=?rule.jwks_url, error=?e, "could not decode JWKs");
continue;
}
};
match serde_json::from_slice::<JwkSet>(&bytes) {
Err(e) => {
tracing::warn!(url=?rule.jwks_url, error=?e, "could not decode JWKs");
}
Ok(jwks) => {
// size_of::<&RawValue>() == 16
// size_of::<jose_jwk::Jwk>() == 288
// better to not pre-allocate this as it might be pretty large - especially if it has many
// keys we don't want or need.
// trivial 'attack': `{"keys":[` + repeat(`0`).take(30000).join(`,`) + `]}`
// this would consume 8MiB just like that!
let mut keys = vec![];
let mut failed = 0;
for key in jwks.keys {
match serde_json::from_str::<jose_jwk::Jwk>(key.get()) {
Ok(key) => {
// if `use` (called `cls` in rust) is specified to be something other than signing,
// we can skip storing it.
if key
.prm
.cls
.as_ref()
.is_some_and(|c| *c != jose_jwk::Class::Signing)
{
continue;
}
keys.push(key);
}
Err(e) => {
tracing::debug!(url=?rule.jwks_url, failed=?e, "could not decode JWK");
failed += 1;
}
}
}
keys.shrink_to_fit();
if failed > 0 {
tracing::warn!(url=?rule.jwks_url, failed, "could not decode JWKs");
}
if keys.is_empty() {
tracing::warn!(url=?rule.jwks_url, "no valid JWKs found inside the response body");
continue;
}
let jwks = jose_jwk::JwkSet { keys };
key_sets.insert(
rule.id,
KeySet {
jwks,
audience: rule.audience,
role_names: rule.role_names,
},
);
}
};
}
if let Some(jwks) = fetch_jwks(client, rule.jwks_url).await {
key_sets.insert(
rule.id,
KeySet {
jwks,
audience: rule.audience,
role_names: rule.role_names,
},
);
}
}

View File

@@ -6,7 +6,6 @@ pub mod local;
use std::net::IpAddr;
use std::sync::Arc;
use std::time::Duration;
pub use console_redirect::ConsoleRedirectBackend;
pub(crate) use console_redirect::ConsoleRedirectError;
@@ -30,7 +29,7 @@ use crate::intern::EndpointIdInt;
use crate::metrics::Metrics;
use crate::proxy::connect_compute::ComputeConnectBackend;
use crate::proxy::NeonOptions;
use crate::rate_limiter::{BucketRateLimiter, EndpointRateLimiter, RateBucketInfo};
use crate::rate_limiter::{BucketRateLimiter, EndpointRateLimiter};
use crate::stream::Stream;
use crate::types::{EndpointCacheKey, EndpointId, RoleName};
use crate::{scram, stream};
@@ -192,21 +191,6 @@ impl MaskedIp {
// This can't be just per IP because that would limit some PaaS that share IP addresses
pub type AuthRateLimiter = BucketRateLimiter<(EndpointIdInt, MaskedIp)>;
impl RateBucketInfo {
/// All of these are per endpoint-maskedip pair.
/// Context: 4096 rounds of pbkdf2 take about 1ms of cpu time to execute (1 milli-cpu-second or 1mcpus).
///
/// First bucket: 1000mcpus total per endpoint-ip pair
/// * 4096000 requests per second with 1 hash rounds.
/// * 1000 requests per second with 4096 hash rounds.
/// * 6.8 requests per second with 600000 hash rounds.
pub const DEFAULT_AUTH_SET: [Self; 3] = [
Self::new(1000 * 4096, Duration::from_secs(1)),
Self::new(600 * 4096, Duration::from_secs(60)),
Self::new(300 * 4096, Duration::from_secs(600)),
];
}
impl AuthenticationConfig {
pub(crate) fn check_rate_limit(
&self,

View File

@@ -428,8 +428,9 @@ async fn main() -> anyhow::Result<()> {
)?))),
None => None,
};
let cancellation_handler = Arc::new(CancellationHandler::<
Option<Arc<tokio::sync::Mutex<RedisPublisherClient>>>,
Option<Arc<Mutex<RedisPublisherClient>>>,
>::new(
cancel_map.clone(),
redis_publisher,

View File

@@ -10,16 +10,23 @@ use tokio_postgres::{CancelToken, NoTls};
use tracing::{debug, info};
use uuid::Uuid;
use crate::auth::{check_peer_addr_is_in_list, IpPattern};
use crate::error::ReportableError;
use crate::metrics::{CancellationRequest, CancellationSource, Metrics};
use crate::rate_limiter::LeakyBucketRateLimiter;
use crate::redis::cancellation_publisher::{
CancellationPublisher, CancellationPublisherMut, RedisPublisherClient,
};
use std::net::IpAddr;
use ipnet::{IpNet, Ipv4Net, Ipv6Net};
pub type CancelMap = Arc<DashMap<CancelKeyData, Option<CancelClosure>>>;
pub type CancellationHandlerMain = CancellationHandler<Option<Arc<Mutex<RedisPublisherClient>>>>;
pub(crate) type CancellationHandlerMainInternal = Option<Arc<Mutex<RedisPublisherClient>>>;
type IpSubnetKey = IpNet;
/// Enables serving `CancelRequest`s.
///
/// If `CancellationPublisher` is available, cancel request will be used to publish the cancellation key to other proxy instances.
@@ -29,14 +36,23 @@ pub struct CancellationHandler<P> {
/// This field used for the monitoring purposes.
/// Represents the source of the cancellation request.
from: CancellationSource,
// rate limiter of cancellation requests
limiter: Arc<std::sync::Mutex<LeakyBucketRateLimiter<IpSubnetKey>>>,
}
#[derive(Debug, Error)]
pub(crate) enum CancelError {
#[error("{0}")]
IO(#[from] std::io::Error),
#[error("{0}")]
Postgres(#[from] tokio_postgres::Error),
#[error("rate limit exceeded")]
RateLimit,
#[error("IP is not allowed")]
IpNotAllowed,
}
impl ReportableError for CancelError {
@@ -47,6 +63,8 @@ impl ReportableError for CancelError {
crate::error::ErrorKind::Postgres
}
CancelError::Postgres(_) => crate::error::ErrorKind::Compute,
CancelError::RateLimit => crate::error::ErrorKind::RateLimit,
CancelError::IpNotAllowed => crate::error::ErrorKind::User,
}
}
}
@@ -79,13 +97,36 @@ impl<P: CancellationPublisher> CancellationHandler<P> {
cancellation_handler: self,
}
}
/// Try to cancel a running query for the corresponding connection.
/// If the cancellation key is not found locally, the request is published to Redis for other proxy instances.
/// `check_allowed` - if true, check whether the peer IP is allowed to cancel the query
pub(crate) async fn cancel_session(
&self,
key: CancelKeyData,
session_id: Uuid,
peer_addr: &IpAddr,
check_allowed: bool,
) -> Result<(), CancelError> {
// TODO: the unspecified-address check exists only for backward compatibility and should be removed
if !peer_addr.is_unspecified() {
let subnet_key = match *peer_addr {
IpAddr::V4(ip) => IpNet::V4(Ipv4Net::new_assert(ip, 24).trunc()), // use the default mask here
IpAddr::V6(ip) => IpNet::V6(Ipv6Net::new_assert(ip, 64).trunc()),
};
if !self.limiter.lock().unwrap().check(subnet_key, 1) {
tracing::debug!("Rate limit exceeded. Skipping cancellation message");
Metrics::get()
.proxy
.cancellation_requests_total
.inc(CancellationRequest {
source: self.from,
kind: crate::metrics::CancellationOutcome::RateLimitExceeded,
});
return Err(CancelError::RateLimit);
}
}
// NB: we should immediately release the lock after cloning the token.
let Some(cancel_closure) = self.map.get(&key).and_then(|x| x.clone()) else {
tracing::warn!("query cancellation key not found: {key}");
@@ -96,7 +137,13 @@ impl<P: CancellationPublisher> CancellationHandler<P> {
source: self.from,
kind: crate::metrics::CancellationOutcome::NotFound,
});
match self.client.try_publish(key, session_id).await {
if session_id == Uuid::nil() {
// was already published, do not publish it again
return Ok(());
}
match self.client.try_publish(key, session_id, *peer_addr).await {
Ok(()) => {} // do nothing
Err(e) => {
return Err(CancelError::IO(std::io::Error::new(
@@ -107,6 +154,13 @@ impl<P: CancellationPublisher> CancellationHandler<P> {
}
return Ok(());
};
if check_allowed
&& !check_peer_addr_is_in_list(peer_addr, cancel_closure.ip_allowlist.as_slice())
{
return Err(CancelError::IpNotAllowed);
}
Metrics::get()
.proxy
.cancellation_requests_total
@@ -135,13 +189,29 @@ impl CancellationHandler<()> {
map,
client: (),
from,
limiter: Arc::new(std::sync::Mutex::new(
LeakyBucketRateLimiter::<IpSubnetKey>::new_with_shards(
LeakyBucketRateLimiter::<IpSubnetKey>::DEFAULT,
64,
),
)),
}
}
}
impl<P: CancellationPublisherMut> CancellationHandler<Option<Arc<Mutex<P>>>> {
pub fn new(map: CancelMap, client: Option<Arc<Mutex<P>>>, from: CancellationSource) -> Self {
Self { map, client, from }
Self {
map,
client,
from,
limiter: Arc::new(std::sync::Mutex::new(
LeakyBucketRateLimiter::<IpSubnetKey>::new_with_shards(
LeakyBucketRateLimiter::<IpSubnetKey>::DEFAULT,
64,
),
)),
}
}
}
@@ -152,13 +222,19 @@ impl<P: CancellationPublisherMut> CancellationHandler<Option<Arc<Mutex<P>>>> {
pub struct CancelClosure {
socket_addr: SocketAddr,
cancel_token: CancelToken,
ip_allowlist: Vec<IpPattern>,
}
impl CancelClosure {
pub(crate) fn new(socket_addr: SocketAddr, cancel_token: CancelToken) -> Self {
pub(crate) fn new(
socket_addr: SocketAddr,
cancel_token: CancelToken,
ip_allowlist: Vec<IpPattern>,
) -> Self {
Self {
socket_addr,
cancel_token,
ip_allowlist,
}
}
/// Cancels the query running on user's compute node.
@@ -168,6 +244,9 @@ impl CancelClosure {
debug!("query was cancelled");
Ok(())
}
pub(crate) fn set_ip_allowlist(&mut self, ip_allowlist: Vec<IpPattern>) {
self.ip_allowlist = ip_allowlist;
}
}
/// Helper for registering query cancellation tokens.
@@ -229,6 +308,8 @@ mod tests {
cancel_key: 0,
},
Uuid::new_v4(),
&("127.0.0.1".parse().unwrap()),
true,
)
.await
.unwrap();
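The rate limiter above keys on a truncated subnet rather than a single IP, so clients in the same /24 (IPv4) or /64 (IPv6) share one bucket. For illustration, the key derivation extracted as a standalone helper, using the same `ipnet` calls as the diff:

```rust
use ipnet::{IpNet, Ipv4Net, Ipv6Net};
use std::net::IpAddr;

fn subnet_key(peer_addr: IpAddr) -> IpNet {
    match peer_addr {
        IpAddr::V4(ip) => IpNet::V4(Ipv4Net::new_assert(ip, 24).trunc()),
        IpAddr::V6(ip) => IpNet::V6(Ipv6Net::new_assert(ip, 64).trunc()),
    }
}

fn main() {
    // Two peers in the same /24 map to the same bucket key.
    let a: IpAddr = "203.0.113.10".parse().unwrap();
    let b: IpAddr = "203.0.113.200".parse().unwrap();
    assert_eq!(subnet_key(a), subnet_key(b));
}
```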

View File

@@ -342,7 +342,7 @@ impl ConnCfg {
// NB: CancelToken is supposed to hold socket_addr, but we use connect_raw.
// Yet another reason to rework the connection establishing code.
let cancel_closure = CancelClosure::new(socket_addr, client.cancel_token());
let cancel_closure = CancelClosure::new(socket_addr, client.cancel_token(), vec![]);
let connection = PostgresConnection {
stream,

View File

@@ -156,16 +156,21 @@ pub(crate) async fn handle_client<S: AsyncRead + AsyncWrite + Unpin>(
let request_gauge = metrics.connection_requests.guard(proto);
let tls = config.tls_config.as_ref();
let record_handshake_error = !ctx.has_private_peer_addr();
let pause = ctx.latency_timer_pause(crate::metrics::Waiting::Client);
let do_handshake = handshake(ctx, stream, tls, record_handshake_error);
let (mut stream, params) =
match tokio::time::timeout(config.handshake_timeout, do_handshake).await?? {
HandshakeData::Startup(stream, params) => (stream, params),
HandshakeData::Cancel(cancel_key_data) => {
return Ok(cancellation_handler
.cancel_session(cancel_key_data, ctx.session_id())
.cancel_session(
cancel_key_data,
ctx.session_id(),
&ctx.peer_addr(),
config.authentication_config.ip_allowlist_check_enabled,
)
.await
.map(|()| None)?)
}
@@ -174,7 +179,7 @@ pub(crate) async fn handle_client<S: AsyncRead + AsyncWrite + Unpin>(
ctx.set_db_options(params.clone());
let user_info = match backend
let (user_info, ip_allowlist) = match backend
.authenticate(ctx, &config.authentication_config, &mut stream)
.await
{
@@ -198,6 +203,8 @@ pub(crate) async fn handle_client<S: AsyncRead + AsyncWrite + Unpin>(
.or_else(|e| stream.throw_error(e))
.await?;
node.cancel_closure
.set_ip_allowlist(ip_allowlist.unwrap_or_default());
let session = cancellation_handler.get_session();
prepare_client_connection(&node, &session, &mut stream).await?;

View File

@@ -351,6 +351,7 @@ pub enum CancellationSource {
pub enum CancellationOutcome {
NotFound,
Found,
RateLimitExceeded,
}
#[derive(LabelGroup)]

View File

@@ -268,12 +268,18 @@ pub(crate) async fn handle_client<S: AsyncRead + AsyncWrite + Unpin>(
let record_handshake_error = !ctx.has_private_peer_addr();
let pause = ctx.latency_timer_pause(crate::metrics::Waiting::Client);
let do_handshake = handshake(ctx, stream, mode.handshake_tls(tls), record_handshake_error);
let (mut stream, params) =
match tokio::time::timeout(config.handshake_timeout, do_handshake).await?? {
HandshakeData::Startup(stream, params) => (stream, params),
HandshakeData::Cancel(cancel_key_data) => {
return Ok(cancellation_handler
.cancel_session(cancel_key_data, ctx.session_id())
.cancel_session(
cancel_key_data,
ctx.session_id(),
&ctx.peer_addr(),
config.authentication_config.ip_allowlist_check_enabled,
)
.await
.map(|()| None)?)
}

View File

@@ -14,13 +14,13 @@ use tracing::info;
use crate::intern::EndpointIdInt;
pub(crate) struct GlobalRateLimiter {
pub struct GlobalRateLimiter {
data: Vec<RateBucket>,
info: Vec<RateBucketInfo>,
}
impl GlobalRateLimiter {
pub(crate) fn new(info: Vec<RateBucketInfo>) -> Self {
pub fn new(info: Vec<RateBucketInfo>) -> Self {
Self {
data: vec![
RateBucket {
@@ -34,7 +34,7 @@ impl GlobalRateLimiter {
}
/// Check that number of connections is below `max_rps` rps.
pub(crate) fn check(&mut self) -> bool {
pub fn check(&mut self) -> bool {
let now = Instant::now();
let should_allow_request = self
@@ -137,6 +137,19 @@ impl RateBucketInfo {
Self::new(200, Duration::from_secs(600)),
];
/// All of these are per endpoint-maskedip pair.
/// Context: 4096 rounds of pbkdf2 take about 1ms of cpu time to execute (1 milli-cpu-second or 1mcpus).
///
/// First bucket: 1000mcpus total per endpoint-ip pair
/// * 4096000 requests per second with 1 hash rounds.
/// * 1000 requests per second with 4096 hash rounds.
/// * 6.8 requests per second with 600000 hash rounds.
pub const DEFAULT_AUTH_SET: [Self; 3] = [
Self::new(1000 * 4096, Duration::from_secs(1)),
Self::new(600 * 4096, Duration::from_secs(60)),
Self::new(300 * 4096, Duration::from_secs(600)),
];
pub fn rps(&self) -> f64 {
(self.max_rpi as f64) / self.interval.as_secs_f64()
}
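The `DEFAULT_AUTH_SET` sizing is easier to verify numerically: the buckets are measured in pbkdf2 rounds per interval, so dividing the bucket size by the per-request round count gives the effective request rate. A small sanity check of the figures quoted in the comment above:

```rust
fn effective_rps(bucket_rounds: u64, interval_secs: u64, rounds_per_request: u64) -> f64 {
    bucket_rounds as f64 / rounds_per_request as f64 / interval_secs as f64
}

fn main() {
    let first_bucket = 1000 * 4096; // 1000 mcpus worth of rounds per second
    assert_eq!(effective_rps(first_bucket, 1, 1), 4_096_000.0);
    assert_eq!(effective_rps(first_bucket, 1, 4096), 1000.0);
    // ~6.8 requests per second with 600000 hash rounds
    assert!((effective_rps(first_bucket, 1, 600_000) - 6.83).abs() < 0.01);
}
```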

View File

@@ -8,5 +8,4 @@ pub(crate) use limit_algorithm::aimd::Aimd;
pub(crate) use limit_algorithm::{
DynamicLimiter, Outcome, RateLimitAlgorithm, RateLimiterConfig, Token,
};
pub(crate) use limiter::GlobalRateLimiter;
pub use limiter::{BucketRateLimiter, RateBucketInfo, WakeComputeRateLimiter};
pub use limiter::{BucketRateLimiter, GlobalRateLimiter, RateBucketInfo, WakeComputeRateLimiter};

View File

@@ -1,5 +1,6 @@
use std::sync::Arc;
use core::net::IpAddr;
use pq_proto::CancelKeyData;
use redis::AsyncCommands;
use tokio::sync::Mutex;
@@ -15,6 +16,7 @@ pub trait CancellationPublisherMut: Send + Sync + 'static {
&mut self,
cancel_key_data: CancelKeyData,
session_id: Uuid,
peer_addr: IpAddr,
) -> anyhow::Result<()>;
}
@@ -24,6 +26,7 @@ pub trait CancellationPublisher: Send + Sync + 'static {
&self,
cancel_key_data: CancelKeyData,
session_id: Uuid,
peer_addr: IpAddr,
) -> anyhow::Result<()>;
}
@@ -32,6 +35,7 @@ impl CancellationPublisher for () {
&self,
_cancel_key_data: CancelKeyData,
_session_id: Uuid,
_peer_addr: IpAddr,
) -> anyhow::Result<()> {
Ok(())
}
@@ -42,8 +46,10 @@ impl<P: CancellationPublisher> CancellationPublisherMut for P {
&mut self,
cancel_key_data: CancelKeyData,
session_id: Uuid,
peer_addr: IpAddr,
) -> anyhow::Result<()> {
<P as CancellationPublisher>::try_publish(self, cancel_key_data, session_id).await
<P as CancellationPublisher>::try_publish(self, cancel_key_data, session_id, peer_addr)
.await
}
}
@@ -52,9 +58,10 @@ impl<P: CancellationPublisher> CancellationPublisher for Option<P> {
&self,
cancel_key_data: CancelKeyData,
session_id: Uuid,
peer_addr: IpAddr,
) -> anyhow::Result<()> {
if let Some(p) = self {
p.try_publish(cancel_key_data, session_id).await
p.try_publish(cancel_key_data, session_id, peer_addr).await
} else {
Ok(())
}
@@ -66,10 +73,11 @@ impl<P: CancellationPublisherMut> CancellationPublisher for Arc<Mutex<P>> {
&self,
cancel_key_data: CancelKeyData,
session_id: Uuid,
peer_addr: IpAddr,
) -> anyhow::Result<()> {
self.lock()
.await
.try_publish(cancel_key_data, session_id)
.try_publish(cancel_key_data, session_id, peer_addr)
.await
}
}
@@ -97,11 +105,13 @@ impl RedisPublisherClient {
&mut self,
cancel_key_data: CancelKeyData,
session_id: Uuid,
peer_addr: IpAddr,
) -> anyhow::Result<()> {
let payload = serde_json::to_string(&Notification::Cancel(CancelSession {
region_id: Some(self.region_id.clone()),
cancel_key_data,
session_id,
peer_addr: Some(peer_addr),
}))?;
let _: () = self.client.publish(PROXY_CHANNEL_NAME, payload).await?;
Ok(())
@@ -120,13 +130,14 @@ impl RedisPublisherClient {
&mut self,
cancel_key_data: CancelKeyData,
session_id: Uuid,
peer_addr: IpAddr,
) -> anyhow::Result<()> {
// TODO: review redundant error duplication logs.
if !self.limiter.check() {
tracing::info!("Rate limit exceeded. Skipping cancellation message");
return Err(anyhow::anyhow!("Rate limit exceeded"));
}
match self.publish(cancel_key_data, session_id).await {
match self.publish(cancel_key_data, session_id, peer_addr).await {
Ok(()) => return Ok(()),
Err(e) => {
tracing::error!("failed to publish a message: {e}");
@@ -134,7 +145,7 @@ impl RedisPublisherClient {
}
tracing::info!("Publisher is disconnected. Reconnectiong...");
self.try_connect().await?;
self.publish(cancel_key_data, session_id).await
self.publish(cancel_key_data, session_id, peer_addr).await
}
}
@@ -143,9 +154,13 @@ impl CancellationPublisherMut for RedisPublisherClient {
&mut self,
cancel_key_data: CancelKeyData,
session_id: Uuid,
peer_addr: IpAddr,
) -> anyhow::Result<()> {
tracing::info!("publishing cancellation key to Redis");
match self.try_publish_internal(cancel_key_data, session_id).await {
match self
.try_publish_internal(cancel_key_data, session_id, peer_addr)
.await
{
Ok(()) => {
tracing::debug!("cancellation key successfuly published to Redis");
Ok(())

View File

@@ -60,6 +60,7 @@ pub(crate) struct CancelSession {
pub(crate) region_id: Option<String>,
pub(crate) cancel_key_data: CancelKeyData,
pub(crate) session_id: Uuid,
pub(crate) peer_addr: Option<std::net::IpAddr>,
}
fn deserialize_json_string<'de, D, T>(deserializer: D) -> Result<T, D::Error>
@@ -137,10 +138,20 @@ impl<C: ProjectInfoCache + Send + Sync + 'static> MessageHandler<C> {
return Ok(());
}
}
// TODO: Remove unspecified peer_addr after the complete migration to the new format
let peer_addr = cancel_session
.peer_addr
.unwrap_or(std::net::IpAddr::V4(std::net::Ipv4Addr::UNSPECIFIED));
// This instance of cancellation_handler doesn't have a RedisPublisherClient so it can't publish the message.
match self
.cancellation_handler
.cancel_session(cancel_session.cancel_key_data, uuid::Uuid::nil())
.cancel_session(
cancel_session.cancel_key_data,
uuid::Uuid::nil(),
&peer_addr,
cancel_session.peer_addr.is_some(),
)
.await
{
Ok(()) => {}
@@ -335,6 +346,7 @@ mod tests {
cancel_key_data,
region_id: None,
session_id: uuid,
peer_addr: None,
});
let text = serde_json::to_string(&msg)?;
let result: Notification = serde_json::from_str(&text)?;
@@ -344,6 +356,7 @@ mod tests {
cancel_key_data,
region_id: Some("region".to_string()),
session_id: uuid,
peer_addr: None,
});
let text = serde_json::to_string(&msg)?;
let result: Notification = serde_json::from_str(&text)?;
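For reference, a hedged sketch of building the extended notification with the new optional `peer_addr` field. It assumes the crate's `Notification`, `CancelSession`, and `CancelKeyData` types from the diff are in scope; the field values are placeholders:

```rust
fn build_cancel_payload(
    cancel_key_data: CancelKeyData,
    session_id: uuid::Uuid,
    peer_addr: std::net::IpAddr,
) -> anyhow::Result<String> {
    let msg = Notification::Cancel(CancelSession {
        region_id: Some("region".to_string()),
        cancel_key_data,
        session_id,
        // Older senders omit this; receivers fall back to an unspecified address
        // and skip the allowlist check, as shown in the handler above.
        peer_addr: Some(peer_addr),
    });
    Ok(serde_json::to_string(&msg)?)
}
```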

View File

@@ -14,7 +14,7 @@ use hyper::{header, HeaderMap, Request, Response, StatusCode};
use pq_proto::StartupMessageParamsBuilder;
use serde::Serialize;
use serde_json::Value;
use tokio::time;
use tokio::time::{self, Instant};
use tokio_postgres::error::{DbError, ErrorPosition, SqlState};
use tokio_postgres::{GenericClient, IsolationLevel, NoTls, ReadyForQueryStatus, Transaction};
use tokio_util::sync::CancellationToken;
@@ -980,10 +980,11 @@ async fn query_to_json<T: GenericClient>(
current_size: &mut usize,
parsed_headers: HttpHeaders,
) -> Result<(ReadyForQueryStatus, impl Serialize), SqlOverHttpError> {
info!("executing query");
let query_start = Instant::now();
let query_params = data.params;
let mut row_stream = std::pin::pin!(client.query_raw_txt(&data.query, query_params).await?);
info!("finished executing query");
let query_acknowledged = Instant::now();
// Manually drain the stream into a vector to leave row_stream hanging
// around to get a command tag. Also check that the response is not too
@@ -1002,6 +1003,7 @@ async fn query_to_json<T: GenericClient>(
}
}
let query_resp_end = Instant::now();
let ready = row_stream.ready_status();
// grab the command tag and number of rows affected
@@ -1021,7 +1023,9 @@ async fn query_to_json<T: GenericClient>(
rows = rows.len(),
?ready,
command_tag,
"finished reading rows"
acknowledgement = ?(query_acknowledged - query_start),
response = ?(query_resp_end - query_start),
"finished executing query"
);
let columns_len = row_stream.columns().len();
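The instrumentation above boils down to three checkpoints measured from one start instant; the two reported durations are the gap from start to acknowledgement and from start to the last row. The pattern in miniature, with the actual query work elided:

```rust
use std::time::Duration;
use tokio::time::Instant;

async fn timed_query() -> (Duration, Duration) {
    let query_start = Instant::now();
    // ... send the query; the first awaited response marks acknowledgement ...
    let query_acknowledged = Instant::now();
    // ... drain the row stream to completion ...
    let query_resp_end = Instant::now();
    (
        query_acknowledged - query_start, // logged as `acknowledgement`
        query_resp_end - query_start,     // logged as `response`
    )
}
```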

View File

@@ -4,7 +4,7 @@ authors = []
package-mode = false
[tool.poetry.dependencies]
python = "^3.9"
python = "^3.11"
pytest = "^7.4.4"
psycopg2-binary = "^2.9.10"
typing-extensions = "^4.6.1"
@@ -51,7 +51,7 @@ testcontainers = "^4.8.1"
jsonnet = "^0.20.0"
[tool.poetry.group.dev.dependencies]
mypy = "==1.3.0"
mypy = "==1.13.0"
ruff = "^0.7.0"
[build-system]
@@ -89,7 +89,7 @@ module = [
ignore_missing_imports = true
[tool.ruff]
target-version = "py39"
target-version = "py311"
extend-exclude = [
"vendor/",
"target/",
@@ -108,6 +108,3 @@ select = [
"B", # bugbear
"UP", # pyupgrade
]
[tool.ruff.lint.pyupgrade]
keep-runtime-typing = true # Remove this stanza when we require Python 3.10

View File

@@ -30,6 +30,7 @@ once_cell.workspace = true
parking_lot.workspace = true
postgres.workspace = true
postgres-protocol.workspace = true
pprof.workspace = true
rand.workspace = true
regex.workspace = true
scopeguard.workspace = true

View File

@@ -14,6 +14,10 @@ cargo bench --package safekeeper --bench receive_wal process_msg/fsync=false
# List available benchmarks.
cargo bench --package safekeeper --benches -- --list
# Generate flamegraph profiles using pprof-rs, profiling for 10 seconds.
# Output in target/criterion/*/profile/flamegraph.svg.
cargo bench --package safekeeper --bench receive_wal process_msg/fsync=false --profile-time 10
```
Additional charts and statistics are available in `target/criterion/report/index.html`.

View File

@@ -10,6 +10,7 @@ use camino_tempfile::tempfile;
use criterion::{criterion_group, criterion_main, BatchSize, Bencher, Criterion};
use itertools::Itertools as _;
use postgres_ffi::v17::wal_generator::{LogicalMessageGenerator, WalGenerator};
use pprof::criterion::{Output, PProfProfiler};
use safekeeper::receive_wal::{self, WalAcceptor};
use safekeeper::safekeeper::{
AcceptorProposerMessage, AppendRequest, AppendRequestHeader, ProposerAcceptorMessage,
@@ -24,8 +25,9 @@ const GB: usize = 1024 * MB;
// Register benchmarks with Criterion.
criterion_group!(
benches,
bench_process_msg,
name = benches;
config = Criterion::default().with_profiler(PProfProfiler::new(100, Output::Flamegraph(None)));
targets = bench_process_msg,
bench_wal_acceptor,
bench_wal_acceptor_throughput,
bench_file_write

View File

@@ -1,7 +1,6 @@
use hyper::{Body, Request, Response, StatusCode, Uri};
use once_cell::sync::Lazy;
use hyper::{Body, Request, Response, StatusCode};
use serde::{Deserialize, Serialize};
use std::collections::{HashMap, HashSet};
use std::collections::HashMap;
use std::fmt;
use std::io::Write as _;
use std::str::FromStr;
@@ -14,7 +13,9 @@ use tokio_stream::wrappers::ReceiverStream;
use tokio_util::sync::CancellationToken;
use tracing::{info_span, Instrument};
use utils::failpoint_support::failpoints_handler;
use utils::http::endpoint::{prometheus_metrics_handler, request_span, ChannelWriter};
use utils::http::endpoint::{
profile_cpu_handler, prometheus_metrics_handler, request_span, ChannelWriter,
};
use utils::http::request::parse_query_param;
use postgres_ffi::WAL_SEGMENT_SIZE;
@@ -572,14 +573,8 @@ pub fn make_router(conf: SafeKeeperConf) -> RouterBuilder<hyper::Body, ApiError>
let mut router = endpoint::make_router();
if conf.http_auth.is_some() {
router = router.middleware(auth_middleware(|request| {
#[allow(clippy::mutable_key_type)]
static ALLOWLIST_ROUTES: Lazy<HashSet<Uri>> = Lazy::new(|| {
["/v1/status", "/metrics"]
.iter()
.map(|v| v.parse().unwrap())
.collect()
});
if ALLOWLIST_ROUTES.contains(request.uri()) {
const ALLOWLIST_ROUTES: &[&str] = &["/v1/status", "/metrics", "/profile/cpu"];
if ALLOWLIST_ROUTES.contains(&request.uri().path()) {
None
} else {
// Option<Arc<SwappableJwtAuth>> is always provided as data below, hence unwrap().
@@ -598,6 +593,7 @@ pub fn make_router(conf: SafeKeeperConf) -> RouterBuilder<hyper::Body, ApiError>
.data(Arc::new(conf))
.data(auth)
.get("/metrics", |r| request_span(r, prometheus_metrics_handler))
.get("/profile/cpu", |r| request_span(r, profile_cpu_handler))
.get("/v1/status", |r| request_span(r, status_handler))
.put("/v1/failpoints", |r| {
request_span(r, move |r| async {

View File

@@ -14,7 +14,7 @@ import psycopg2.extras
import toml
if TYPE_CHECKING:
from typing import Any, Optional
from typing import Any
FLAKY_TESTS_QUERY = """
SELECT
@@ -65,7 +65,7 @@ def main(args: argparse.Namespace):
pageserver_virtual_file_io_engine_parameter = ""
# re-use existing records of flaky tests from before parametrization by compaction_algorithm
def get_pageserver_default_tenant_config_compaction_algorithm() -> Optional[dict[str, Any]]:
def get_pageserver_default_tenant_config_compaction_algorithm() -> dict[str, Any] | None:
"""Duplicated from parametrize.py"""
toml_table = os.getenv("PAGESERVER_DEFAULT_TENANT_CONFIG_COMPACTION_ALGORITHM")
if toml_table is None:

View File

@@ -194,9 +194,11 @@ async def main_impl(args, report_out, client: Client):
tenant_ids = await client.get_tenant_ids()
get_timeline_id_coros = [client.get_timeline_ids(tenant_id) for tenant_id in tenant_ids]
gathered = await asyncio.gather(*get_timeline_id_coros, return_exceptions=True)
assert len(tenant_ids) == len(gathered)
tenant_and_timline_ids = []
for tid, tlids in zip(tenant_ids, gathered):
for tid, tlids in zip(tenant_ids, gathered, strict=True):
# TODO: add error handling if tlids isinstance(Exception)
assert isinstance(tlids, list)
for tlid in tlids:
tenant_and_timline_ids.append((tid, tlid))
elif len(comps) == 1:

View File

@@ -11,7 +11,7 @@ import re
import sys
from contextlib import contextmanager
from dataclasses import dataclass
from datetime import datetime, timezone
from datetime import UTC, datetime
from pathlib import Path
import backoff
@@ -140,8 +140,8 @@ def ingest_test_result(
suite=labels["suite"],
name=unparametrized_name,
status=test["status"],
started_at=datetime.fromtimestamp(test["time"]["start"] / 1000, tz=timezone.utc),
stopped_at=datetime.fromtimestamp(test["time"]["stop"] / 1000, tz=timezone.utc),
started_at=datetime.fromtimestamp(test["time"]["start"] / 1000, tz=UTC),
stopped_at=datetime.fromtimestamp(test["time"]["stop"] / 1000, tz=UTC),
duration=test["time"]["duration"],
flaky=test["flaky"] or test["retriesStatusChange"],
arch=arch,

View File

@@ -113,7 +113,7 @@ The test suite has a Python enum with equal name but different meaning:
```python
@enum.unique
class RemoteStorageKind(str, enum.Enum):
class RemoteStorageKind(StrEnum):
LOCAL_FS = "local_fs"
MOCK_S3 = "mock_s3"
REAL_S3 = "real_s3"

View File

@@ -1,7 +1,7 @@
from __future__ import annotations
from dataclasses import dataclass
from enum import Enum
from enum import StrEnum
from typing import Any
import jwt
@@ -37,8 +37,7 @@ class AuthKeys:
return self.generate_token(scope=TokenScope.TENANT, tenant_id=str(tenant_id))
# TODO: Replace with `StrEnum` when we upgrade to python 3.11
class TokenScope(str, Enum):
class TokenScope(StrEnum):
ADMIN = "admin"
PAGE_SERVER_API = "pageserverapi"
GENERATIONS_API = "generations_api"

View File

@@ -9,6 +9,7 @@ import re
import timeit
from contextlib import contextmanager
from datetime import datetime
from enum import StrEnum
from pathlib import Path
from typing import TYPE_CHECKING
@@ -24,8 +25,7 @@ from fixtures.log_helper import log
from fixtures.neon_fixtures import NeonPageserver
if TYPE_CHECKING:
from collections.abc import Iterator, Mapping
from typing import Callable, Optional
from collections.abc import Callable, Iterator, Mapping
"""
@@ -61,7 +61,7 @@ class PgBenchRunResult:
number_of_threads: int
number_of_transactions_actually_processed: int
latency_average: float
latency_stddev: Optional[float]
latency_stddev: float | None
tps: float
run_duration: float
run_start_timestamp: int
@@ -171,14 +171,14 @@ _PGBENCH_INIT_EXTRACTORS: Mapping[str, re.Pattern[str]] = {
@dataclasses.dataclass
class PgBenchInitResult:
total: Optional[float]
drop_tables: Optional[float]
create_tables: Optional[float]
client_side_generate: Optional[float]
server_side_generate: Optional[float]
vacuum: Optional[float]
primary_keys: Optional[float]
foreign_keys: Optional[float]
total: float | None
drop_tables: float | None
create_tables: float | None
client_side_generate: float | None
server_side_generate: float | None
vacuum: float | None
primary_keys: float | None
foreign_keys: float | None
duration: float
start_timestamp: int
end_timestamp: int
@@ -196,7 +196,7 @@ class PgBenchInitResult:
last_line = stderr.splitlines()[-1]
timings: dict[str, Optional[float]] = {}
timings: dict[str, float | None] = {}
last_line_items = re.split(r"\(|\)|,", last_line)
for item in last_line_items:
for key, regex in _PGBENCH_INIT_EXTRACTORS.items():
@@ -227,7 +227,7 @@ class PgBenchInitResult:
@enum.unique
class MetricReport(str, enum.Enum): # str is a hack to make it json serializable
class MetricReport(StrEnum): # str is a hack to make it json serializable
# this means that this is a constant test parameter
# like number of transactions, or number of clients
TEST_PARAM = "test_param"
@@ -256,9 +256,8 @@ class NeonBenchmarker:
metric_value: float,
unit: str,
report: MetricReport,
labels: Optional[
dict[str, str]
] = None, # use this to associate additional key/value pairs in json format for associated Neon object IDs like project ID with the metric
# use this to associate additional key/value pairs in json format for associated Neon object IDs like project ID with the metric
labels: dict[str, str] | None = None,
):
"""
Record a benchmark result.
@@ -412,7 +411,7 @@ class NeonBenchmarker:
self,
pageserver: NeonPageserver,
metric_name: str,
label_filters: Optional[dict[str, str]] = None,
label_filters: dict[str, str] | None = None,
) -> int:
"""Fetch the value of given int counter from pageserver metrics."""
all_metrics = pageserver.http_client().get_metrics()

View File

@@ -2,14 +2,14 @@ from __future__ import annotations
import random
from dataclasses import dataclass
from enum import Enum
from enum import StrEnum
from functools import total_ordering
from typing import TYPE_CHECKING, TypeVar
from typing_extensions import override
if TYPE_CHECKING:
from typing import Any, Union
from typing import Any
T = TypeVar("T", bound="Id")
@@ -24,7 +24,7 @@ class Lsn:
representation is like "1/0123abcd". See also pg_lsn datatype in Postgres
"""
def __init__(self, x: Union[int, str]):
def __init__(self, x: int | str):
if isinstance(x, int):
self.lsn_int = x
else:
@@ -67,7 +67,7 @@ class Lsn:
return NotImplemented
return self.lsn_int - other.lsn_int
def __add__(self, other: Union[int, Lsn]) -> Lsn:
def __add__(self, other: int | Lsn) -> Lsn:
if isinstance(other, int):
return Lsn(self.lsn_int + other)
elif isinstance(other, Lsn):
@@ -190,8 +190,23 @@ class TenantTimelineId:
)
# Workaround for compat with python 3.9, which does not have `typing.Self`
TTenantShardId = TypeVar("TTenantShardId", bound="TenantShardId")
@dataclass
class ShardIndex:
shard_number: int
shard_count: int
# cf impl Display for ShardIndex
@override
def __str__(self) -> str:
return f"{self.shard_number:02x}{self.shard_count:02x}"
@classmethod
def parse(cls: type[ShardIndex], input: str) -> ShardIndex:
assert len(input) == 4
return cls(
shard_number=int(input[0:2], 16),
shard_count=int(input[2:4], 16),
)
class TenantShardId:
@@ -202,7 +217,7 @@ class TenantShardId:
assert self.shard_number < self.shard_count or self.shard_count == 0
@classmethod
def parse(cls: type[TTenantShardId], input: str) -> TTenantShardId:
def parse(cls: type[TenantShardId], input: str) -> TenantShardId:
if len(input) == 32:
return cls(
tenant_id=TenantId(input),
@@ -226,6 +241,10 @@ class TenantShardId:
# Unsharded case: equivalent of Rust TenantShardId::unsharded(tenant_id)
return str(self.tenant_id)
@property
def shard_index(self) -> ShardIndex:
return ShardIndex(self.shard_number, self.shard_count)
@override
def __repr__(self):
return self.__str__()
@@ -249,7 +268,6 @@ class TenantShardId:
return hash(self._tuple())
# TODO: Replace with `StrEnum` when we upgrade to python 3.11
class TimelineArchivalState(str, Enum):
class TimelineArchivalState(StrEnum):
ARCHIVED = "Archived"
UNARCHIVED = "Unarchived"

View File

@@ -99,7 +99,7 @@ class PgCompare(ABC):
assert row is not None
assert len(row) == len(pg_stat.columns)
for col, val in zip(pg_stat.columns, row):
for col, val in zip(pg_stat.columns, row, strict=False):
results[f"{pg_stat.table}.{col}"] = int(val)
return results

View File

@@ -12,7 +12,8 @@ from fixtures.common_types import TenantId
from fixtures.log_helper import log
if TYPE_CHECKING:
from typing import Any, Callable, Optional
from collections.abc import Callable
from typing import Any
class ComputeReconfigure:
@@ -20,12 +21,12 @@ class ComputeReconfigure:
self.server = server
self.control_plane_compute_hook_api = f"http://{server.host}:{server.port}/notify-attach"
self.workloads: dict[TenantId, Any] = {}
self.on_notify: Optional[Callable[[Any], None]] = None
self.on_notify: Callable[[Any], None] | None = None
def register_workload(self, workload: Any):
self.workloads[workload.tenant_id] = workload
def register_on_notify(self, fn: Optional[Callable[[Any], None]]):
def register_on_notify(self, fn: Callable[[Any], None] | None):
"""
Add some extra work during a notification, like sleeping to slow things down, or
logging what was notified.
@@ -68,7 +69,7 @@ def compute_reconfigure_listener(make_httpserver: HTTPServer):
# This causes the endpoint to query storage controller for its location, which
# is redundant since we already have it here, but this avoids extending the
# neon_local CLI to take full lists of locations
reconfigure_threads.submit(lambda workload=workload: workload.reconfigure()) # type: ignore[no-any-return]
reconfigure_threads.submit(lambda workload=workload: workload.reconfigure()) # type: ignore[misc]
return Response(status=200)


@@ -31,7 +31,7 @@ from h2.settings import SettingCodes
from typing_extensions import override
if TYPE_CHECKING:
from typing import Any, Optional
from typing import Any
RequestData = collections.namedtuple("RequestData", ["headers", "data"])
@@ -49,7 +49,7 @@ class H2Protocol(asyncio.Protocol):
def __init__(self):
config = H2Configuration(client_side=False, header_encoding="utf-8")
self.conn = H2Connection(config=config)
self.transport: Optional[asyncio.Transport] = None
self.transport: asyncio.Transport | None = None
self.stream_data: dict[int, RequestData] = {}
self.flow_control_futures: dict[int, asyncio.Future[Any]] = {}
@@ -61,7 +61,7 @@ class H2Protocol(asyncio.Protocol):
self.transport.write(self.conn.data_to_send())
@override
def connection_lost(self, exc: Optional[Exception]):
def connection_lost(self, exc: Exception | None):
for future in self.flow_control_futures.values():
future.cancel()
self.flow_control_futures = {}


@@ -1,16 +1,12 @@
from __future__ import annotations
from collections import defaultdict
from typing import TYPE_CHECKING
from prometheus_client.parser import text_string_to_metric_families
from prometheus_client.samples import Sample
from fixtures.log_helper import log
if TYPE_CHECKING:
from typing import Optional
class Metrics:
metrics: dict[str, list[Sample]]
@@ -20,7 +16,7 @@ class Metrics:
self.metrics = defaultdict(list)
self.name = name
def query_all(self, name: str, filter: Optional[dict[str, str]] = None) -> list[Sample]:
def query_all(self, name: str, filter: dict[str, str] | None = None) -> list[Sample]:
filter = filter or {}
res: list[Sample] = []
@@ -32,7 +28,7 @@ class Metrics:
pass
return res
def query_one(self, name: str, filter: Optional[dict[str, str]] = None) -> Sample:
def query_one(self, name: str, filter: dict[str, str] | None = None) -> Sample:
res = self.query_all(name, filter or {})
assert len(res) == 1, f"expected single sample for {name} {filter}, found {res}"
return res[0]
@@ -47,9 +43,7 @@ class MetricsGetter:
def get_metrics(self) -> Metrics:
raise NotImplementedError()
def get_metric_value(
self, name: str, filter: Optional[dict[str, str]] = None
) -> Optional[float]:
def get_metric_value(self, name: str, filter: dict[str, str] | None = None) -> float | None:
metrics = self.get_metrics()
results = metrics.query_all(name, filter=filter)
if not results:
@@ -59,7 +53,7 @@ class MetricsGetter:
return results[0].value
def get_metrics_values(
self, names: list[str], filter: Optional[dict[str, str]] = None, absence_ok: bool = False
self, names: list[str], filter: dict[str, str] | None = None, absence_ok: bool = False
) -> dict[str, float]:
"""
When fetching multiple named metrics, it is more efficient to use this
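The filter semantics these helpers appear to implement are a label-subset match over parsed Prometheus samples; query_one additionally asserts that exactly one sample matches. A self-contained analogue using only prometheus_client (the metric and label names are made up):

from prometheus_client.parser import text_string_to_metric_families

text = (
    'pageserver_smgr_query_seconds_count{tenant_id="t1"} 7\n'
    'pageserver_smgr_query_seconds_count{tenant_id="t2"} 3\n'
)
wanted = {"tenant_id": "t1"}
matches = [
    sample
    for family in text_string_to_metric_families(text)
    for sample in family.samples
    if all(sample.labels.get(k) == v for k, v in wanted.items())
]
assert len(matches) == 1 and matches[0].value == 7.0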


@@ -8,7 +8,7 @@ import requests
from fixtures.log_helper import log
if TYPE_CHECKING:
from typing import Any, Literal, Optional
from typing import Any, Literal
from fixtures.pg_version import PgVersion
@@ -40,11 +40,11 @@ class NeonAPI:
def create_project(
self,
pg_version: Optional[PgVersion] = None,
name: Optional[str] = None,
branch_name: Optional[str] = None,
branch_role_name: Optional[str] = None,
branch_database_name: Optional[str] = None,
pg_version: PgVersion | None = None,
name: str | None = None,
branch_name: str | None = None,
branch_role_name: str | None = None,
branch_database_name: str | None = None,
) -> dict[str, Any]:
data: dict[str, Any] = {
"project": {
@@ -179,8 +179,8 @@ class NeonAPI:
def get_connection_uri(
self,
project_id: str,
branch_id: Optional[str] = None,
endpoint_id: Optional[str] = None,
branch_id: str | None = None,
endpoint_id: str | None = None,
database_name: str = "neondb",
role_name: str = "neondb_owner",
pooled: bool = True,
@@ -249,7 +249,7 @@ class NeonAPI:
@final
class NeonApiEndpoint:
def __init__(self, neon_api: NeonAPI, pg_version: PgVersion, project_id: Optional[str]):
def __init__(self, neon_api: NeonAPI, pg_version: PgVersion, project_id: str | None):
self.neon_api = neon_api
self.project_id: str
self.endpoint_id: str


@@ -20,13 +20,9 @@ from fixtures.pg_version import PgVersion
if TYPE_CHECKING:
from typing import (
Any,
Optional,
TypeVar,
cast,
)
T = TypeVar("T")
# Used to be an ABC; abc.ABC was dropped to satisfy the linter without renaming the class.
class AbstractNeonCli:
@@ -36,7 +32,7 @@ class AbstractNeonCli:
Do not use directly, use specific subclasses instead.
"""
def __init__(self, extra_env: Optional[dict[str, str]], binpath: Path):
def __init__(self, extra_env: dict[str, str] | None, binpath: Path):
self.extra_env = extra_env
self.binpath = binpath
@@ -45,7 +41,7 @@ class AbstractNeonCli:
def raw_cli(
self,
arguments: list[str],
extra_env_vars: Optional[dict[str, str]] = None,
extra_env_vars: dict[str, str] | None = None,
check_return_code=True,
timeout=None,
) -> subprocess.CompletedProcess[str]:
@@ -173,7 +169,7 @@ class NeonLocalCli(AbstractNeonCli):
def __init__(
self,
extra_env: Optional[dict[str, str]],
extra_env: dict[str, str] | None,
binpath: Path,
repo_dir: Path,
pg_distrib_dir: Path,
@@ -195,10 +191,10 @@ class NeonLocalCli(AbstractNeonCli):
tenant_id: TenantId,
timeline_id: TimelineId,
pg_version: PgVersion,
conf: Optional[dict[str, Any]] = None,
shard_count: Optional[int] = None,
shard_stripe_size: Optional[int] = None,
placement_policy: Optional[str] = None,
conf: dict[str, Any] | None = None,
shard_count: int | None = None,
shard_stripe_size: int | None = None,
placement_policy: str | None = None,
set_default: bool = False,
):
"""
@@ -302,8 +298,8 @@ class NeonLocalCli(AbstractNeonCli):
tenant_id: TenantId,
timeline_id: TimelineId,
new_branch_name,
ancestor_branch_name: Optional[str] = None,
ancestor_start_lsn: Optional[Lsn] = None,
ancestor_branch_name: str | None = None,
ancestor_start_lsn: Lsn | None = None,
):
cmd = [
"timeline",
@@ -331,8 +327,8 @@ class NeonLocalCli(AbstractNeonCli):
base_lsn: Lsn,
base_tarfile: Path,
pg_version: PgVersion,
end_lsn: Optional[Lsn] = None,
wal_tarfile: Optional[Path] = None,
end_lsn: Lsn | None = None,
wal_tarfile: Path | None = None,
):
cmd = [
"timeline",
@@ -380,7 +376,7 @@ class NeonLocalCli(AbstractNeonCli):
def init(
self,
init_config: dict[str, Any],
force: Optional[str] = None,
force: str | None = None,
) -> subprocess.CompletedProcess[str]:
with tempfile.NamedTemporaryFile(mode="w+") as init_config_tmpfile:
init_config_tmpfile.write(toml.dumps(init_config))
@@ -400,9 +396,9 @@ class NeonLocalCli(AbstractNeonCli):
def storage_controller_start(
self,
timeout_in_seconds: Optional[int] = None,
instance_id: Optional[int] = None,
base_port: Optional[int] = None,
timeout_in_seconds: int | None = None,
instance_id: int | None = None,
base_port: int | None = None,
):
cmd = ["storage_controller", "start"]
if timeout_in_seconds is not None:
@@ -413,7 +409,7 @@ class NeonLocalCli(AbstractNeonCli):
cmd.append(f"--base-port={base_port}")
return self.raw_cli(cmd)
def storage_controller_stop(self, immediate: bool, instance_id: Optional[int] = None):
def storage_controller_stop(self, immediate: bool, instance_id: int | None = None):
cmd = ["storage_controller", "stop"]
if immediate:
cmd.extend(["-m", "immediate"])
@@ -424,8 +420,8 @@ class NeonLocalCli(AbstractNeonCli):
def pageserver_start(
self,
id: int,
extra_env_vars: Optional[dict[str, str]] = None,
timeout_in_seconds: Optional[int] = None,
extra_env_vars: dict[str, str] | None = None,
timeout_in_seconds: int | None = None,
) -> subprocess.CompletedProcess[str]:
start_args = ["pageserver", "start", f"--id={id}"]
if timeout_in_seconds is not None:
@@ -442,9 +438,9 @@ class NeonLocalCli(AbstractNeonCli):
def safekeeper_start(
self,
id: int,
extra_opts: Optional[list[str]] = None,
extra_env_vars: Optional[dict[str, str]] = None,
timeout_in_seconds: Optional[int] = None,
extra_opts: list[str] | None = None,
extra_env_vars: dict[str, str] | None = None,
timeout_in_seconds: int | None = None,
) -> subprocess.CompletedProcess[str]:
if extra_opts is not None:
extra_opts = [f"-e={opt}" for opt in extra_opts]
@@ -457,7 +453,7 @@ class NeonLocalCli(AbstractNeonCli):
)
def safekeeper_stop(
self, id: Optional[int] = None, immediate=False
self, id: int | None = None, immediate=False
) -> subprocess.CompletedProcess[str]:
args = ["safekeeper", "stop"]
if id is not None:
@@ -467,7 +463,7 @@ class NeonLocalCli(AbstractNeonCli):
return self.raw_cli(args)
def storage_broker_start(
self, timeout_in_seconds: Optional[int] = None
self, timeout_in_seconds: int | None = None
) -> subprocess.CompletedProcess[str]:
cmd = ["storage_broker", "start"]
if timeout_in_seconds is not None:
@@ -485,10 +481,10 @@ class NeonLocalCli(AbstractNeonCli):
http_port: int,
tenant_id: TenantId,
pg_version: PgVersion,
endpoint_id: Optional[str] = None,
endpoint_id: str | None = None,
hot_standby: bool = False,
lsn: Optional[Lsn] = None,
pageserver_id: Optional[int] = None,
lsn: Lsn | None = None,
pageserver_id: int | None = None,
allow_multiple=False,
) -> subprocess.CompletedProcess[str]:
args = [
@@ -523,11 +519,11 @@ class NeonLocalCli(AbstractNeonCli):
def endpoint_start(
self,
endpoint_id: str,
safekeepers: Optional[list[int]] = None,
remote_ext_config: Optional[str] = None,
pageserver_id: Optional[int] = None,
safekeepers: list[int] | None = None,
remote_ext_config: str | None = None,
pageserver_id: int | None = None,
allow_multiple=False,
basebackup_request_tries: Optional[int] = None,
basebackup_request_tries: int | None = None,
) -> subprocess.CompletedProcess[str]:
args = [
"endpoint",
@@ -555,9 +551,9 @@ class NeonLocalCli(AbstractNeonCli):
def endpoint_reconfigure(
self,
endpoint_id: str,
tenant_id: Optional[TenantId] = None,
pageserver_id: Optional[int] = None,
safekeepers: Optional[list[int]] = None,
tenant_id: TenantId | None = None,
pageserver_id: int | None = None,
safekeepers: list[int] | None = None,
check_return_code=True,
) -> subprocess.CompletedProcess[str]:
args = ["endpoint", "reconfigure", endpoint_id]
@@ -574,7 +570,7 @@ class NeonLocalCli(AbstractNeonCli):
endpoint_id: str,
destroy=False,
check_return_code=True,
mode: Optional[str] = None,
mode: str | None = None,
) -> subprocess.CompletedProcess[str]:
args = [
"endpoint",

File diff suppressed because it is too large.


@@ -2,7 +2,7 @@ from __future__ import annotations
import re
from dataclasses import dataclass
from typing import TYPE_CHECKING, Union
from typing import TYPE_CHECKING
from fixtures.common_types import KEY_MAX, KEY_MIN, Key, Lsn
@@ -46,7 +46,7 @@ class DeltaLayerName:
return ret
LayerName = Union[ImageLayerName, DeltaLayerName]
LayerName = ImageLayerName | DeltaLayerName
class InvalidFileName(Exception):
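LayerName is now a PEP 604 union alias rather than typing.Union; on Python 3.10+ such an alias still works both as an annotation and with isinstance(). A tiny self-contained analogue (the two classes here are stand-ins, not the real layer-name fixtures):

from dataclasses import dataclass

@dataclass
class FakeImageLayerName:
    lsn: int

@dataclass
class FakeDeltaLayerName:
    lsn_start: int
    lsn_end: int

FakeLayerName = FakeImageLayerName | FakeDeltaLayerName  # types.UnionType alias

def describe(name: FakeLayerName) -> str:
    return "image" if isinstance(name, FakeImageLayerName) else "delta"

assert describe(FakeImageLayerName(lsn=10)) == "image"
assert describe(FakeDeltaLayerName(lsn_start=1, lsn_end=2)) == "delta"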


@@ -1,24 +1,32 @@
from __future__ import annotations
import dataclasses
import json
import random
import string
import time
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import TYPE_CHECKING, Any
from typing import Any
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from fixtures.common_types import Lsn, TenantId, TenantShardId, TimelineArchivalState, TimelineId
from fixtures.common_types import (
Id,
Lsn,
TenantId,
TenantShardId,
TimelineArchivalState,
TimelineId,
)
from fixtures.log_helper import log
from fixtures.metrics import Metrics, MetricsGetter, parse_metrics
from fixtures.pg_version import PgVersion
from fixtures.utils import Fn
if TYPE_CHECKING:
from typing import Optional, Union
class PageserverApiException(Exception):
def __init__(self, message, status_code: int):
@@ -27,6 +35,69 @@ class PageserverApiException(Exception):
self.status_code = status_code
@dataclass
class ImportPgdataIdemptencyKey:
key: str
@staticmethod
def random() -> ImportPgdataIdemptencyKey:
return ImportPgdataIdemptencyKey(
"".join(random.choices(string.ascii_letters + string.digits, k=20))
)
@dataclass
class LocalFs:
path: str
@dataclass
class AwsS3:
region: str
bucket: str
key: str
@dataclass
class ImportPgdataLocation:
LocalFs: None | LocalFs = None
AwsS3: None | AwsS3 = None
@dataclass
class TimelineCreateRequestModeImportPgdata:
location: ImportPgdataLocation
idempotency_key: ImportPgdataIdemptencyKey
@dataclass
class TimelineCreateRequestMode:
Branch: None | dict[str, Any] = None
Bootstrap: None | dict[str, Any] = None
ImportPgdata: None | TimelineCreateRequestModeImportPgdata = None
@dataclass
class TimelineCreateRequest:
new_timeline_id: TimelineId
mode: TimelineCreateRequestMode
def to_json(self) -> str:
class EnhancedJSONEncoder(json.JSONEncoder):
def default(self, o):
if dataclasses.is_dataclass(o) and not isinstance(o, type):
return dataclasses.asdict(o)
elif isinstance(o, Id):
return o.id.hex()
return super().default(o)
# mode is flattened
this = dataclasses.asdict(self)
mode = this.pop("mode")
this.update(mode)
return json.dumps(this, cls=EnhancedJSONEncoder)
class TimelineCreate406(PageserverApiException):
def __init__(self, res: requests.Response):
assert res.status_code == 406
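A rough sketch of how the new import-pgdata request types compose, assuming the dataclasses above live in fixtures/pageserver/http.py, that TimelineId.generate() mints a random id as elsewhere in the fixtures, and that to_json emits the flattened shape its comment describes; the local path is made up:

from fixtures.common_types import TimelineId
from fixtures.pageserver.http import (
    ImportPgdataIdemptencyKey,
    ImportPgdataLocation,
    LocalFs,
    TimelineCreateRequest,
    TimelineCreateRequestMode,
    TimelineCreateRequestModeImportPgdata,
)

req = TimelineCreateRequest(
    new_timeline_id=TimelineId.generate(),
    mode=TimelineCreateRequestMode(
        ImportPgdata=TimelineCreateRequestModeImportPgdata(
            location=ImportPgdataLocation(LocalFs=LocalFs(path="/tmp/pgdata")),
            idempotency_key=ImportPgdataIdemptencyKey.random(),
        )
    ),
)
# "mode" is flattened, so the JSON body carries new_timeline_id alongside the
# Branch/Bootstrap/ImportPgdata keys rather than under a nested "mode" object.
body = req.to_json()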
@@ -43,7 +114,7 @@ class TimelineCreate409(PageserverApiException):
class InMemoryLayerInfo:
kind: str
lsn_start: str
lsn_end: Optional[str]
lsn_end: str | None
@classmethod
def from_json(cls, d: dict[str, Any]) -> InMemoryLayerInfo:
@@ -60,10 +131,10 @@ class HistoricLayerInfo:
layer_file_name: str
layer_file_size: int
lsn_start: str
lsn_end: Optional[str]
lsn_end: str | None
remote: bool
# None for image layers, true if pageserver thinks this is an L0 delta layer
l0: Optional[bool]
l0: bool | None
visible: bool
@classmethod
@@ -180,8 +251,8 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
self,
port: int,
is_testing_enabled_or_skip: Fn,
auth_token: Optional[str] = None,
retries: Optional[Retry] = None,
auth_token: str | None = None,
retries: Retry | None = None,
):
super().__init__()
self.port = port
@@ -278,7 +349,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def tenant_attach(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
generation: int,
config: None | dict[str, Any] = None,
):
@@ -305,7 +376,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
},
)
def tenant_reset(self, tenant_id: Union[TenantId, TenantShardId], drop_cache: bool):
def tenant_reset(self, tenant_id: TenantId | TenantShardId, drop_cache: bool):
params = {}
if drop_cache:
params["drop_cache"] = "true"
@@ -315,10 +386,10 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def tenant_location_conf(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
location_conf: dict[str, Any],
flush_ms=None,
lazy: Optional[bool] = None,
lazy: bool | None = None,
):
body = location_conf.copy()
@@ -346,20 +417,20 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
assert isinstance(res_json["tenant_shards"], list)
return res_json
def tenant_get_location(self, tenant_id: TenantShardId):
def tenant_get_location(self, tenant_id: TenantId | TenantShardId):
res = self.get(
f"http://localhost:{self.port}/v1/location_config/{tenant_id}",
)
self.verbose_error(res)
return res.json()
def tenant_delete(self, tenant_id: Union[TenantId, TenantShardId]):
def tenant_delete(self, tenant_id: TenantId | TenantShardId):
res = self.delete(f"http://localhost:{self.port}/v1/tenant/{tenant_id}")
self.verbose_error(res)
return res
def tenant_status(
self, tenant_id: Union[TenantId, TenantShardId], activate: bool = False
self, tenant_id: TenantId | TenantShardId, activate: bool = False
) -> dict[Any, Any]:
"""
:activate: hint the server not to accelerate activation of this tenant in response
@@ -378,17 +449,17 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
assert isinstance(res_json, dict)
return res_json
def tenant_config(self, tenant_id: Union[TenantId, TenantShardId]) -> TenantConfig:
def tenant_config(self, tenant_id: TenantId | TenantShardId) -> TenantConfig:
res = self.get(f"http://localhost:{self.port}/v1/tenant/{tenant_id}/config")
self.verbose_error(res)
return TenantConfig.from_json(res.json())
def tenant_heatmap_upload(self, tenant_id: Union[TenantId, TenantShardId]):
def tenant_heatmap_upload(self, tenant_id: TenantId | TenantShardId):
res = self.post(f"http://localhost:{self.port}/v1/tenant/{tenant_id}/heatmap_upload")
self.verbose_error(res)
def tenant_secondary_download(
self, tenant_id: Union[TenantId, TenantShardId], wait_ms: Optional[int] = None
self, tenant_id: TenantId | TenantShardId, wait_ms: int | None = None
) -> tuple[int, dict[Any, Any]]:
url = f"http://localhost:{self.port}/v1/tenant/{tenant_id}/secondary/download"
if wait_ms is not None:
@@ -397,13 +468,13 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
self.verbose_error(res)
return (res.status_code, res.json())
def tenant_secondary_status(self, tenant_id: Union[TenantId, TenantShardId]):
def tenant_secondary_status(self, tenant_id: TenantId | TenantShardId):
url = f"http://localhost:{self.port}/v1/tenant/{tenant_id}/secondary/status"
res = self.get(url)
self.verbose_error(res)
return res.json()
def set_tenant_config(self, tenant_id: Union[TenantId, TenantShardId], config: dict[str, Any]):
def set_tenant_config(self, tenant_id: TenantId | TenantShardId, config: dict[str, Any]):
"""
Only use this via storage_controller.pageserver_api().
@@ -420,8 +491,8 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def patch_tenant_config_client_side(
self,
tenant_id: TenantId,
inserts: Optional[dict[str, Any]] = None,
removes: Optional[list[str]] = None,
inserts: dict[str, Any] | None = None,
removes: list[str] | None = None,
):
"""
Only use this via storage_controller.pageserver_api().
@@ -436,11 +507,11 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
del current[key]
self.set_tenant_config(tenant_id, current)
def tenant_size(self, tenant_id: Union[TenantId, TenantShardId]) -> int:
def tenant_size(self, tenant_id: TenantId | TenantShardId) -> int:
return self.tenant_size_and_modelinputs(tenant_id)[0]
def tenant_size_and_modelinputs(
self, tenant_id: Union[TenantId, TenantShardId]
self, tenant_id: TenantId | TenantShardId
) -> tuple[int, dict[str, Any]]:
"""
Returns the tenant size, together with the model inputs as the second tuple item.
@@ -456,7 +527,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
assert isinstance(inputs, dict)
return (size, inputs)
def tenant_size_debug(self, tenant_id: Union[TenantId, TenantShardId]) -> str:
def tenant_size_debug(self, tenant_id: TenantId | TenantShardId) -> str:
"""
Returns the tenant size debug info, as an HTML string
"""
@@ -468,10 +539,10 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def tenant_time_travel_remote_storage(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timestamp: datetime,
done_if_after: datetime,
shard_counts: Optional[list[int]] = None,
shard_counts: list[int] | None = None,
):
"""
Issues a request to perform time travel operations on the remote storage
@@ -490,7 +561,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def timeline_list(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
include_non_incremental_logical_size: bool = False,
include_timeline_dir_layer_file_size_sum: bool = False,
) -> list[dict[str, Any]]:
@@ -510,7 +581,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def timeline_and_offloaded_list(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
) -> TimelinesInfoAndOffloaded:
res = self.get(
f"http://localhost:{self.port}/v1/tenant/{tenant_id}/timeline_and_offloaded",
@@ -523,11 +594,11 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def timeline_create(
self,
pg_version: PgVersion,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
new_timeline_id: TimelineId,
ancestor_timeline_id: Optional[TimelineId] = None,
ancestor_start_lsn: Optional[Lsn] = None,
existing_initdb_timeline_id: Optional[TimelineId] = None,
ancestor_timeline_id: TimelineId | None = None,
ancestor_start_lsn: Lsn | None = None,
existing_initdb_timeline_id: TimelineId | None = None,
**kwargs,
) -> dict[Any, Any]:
body: dict[str, Any] = {
@@ -558,7 +629,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def timeline_detail(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
include_non_incremental_logical_size: bool = False,
include_timeline_dir_layer_file_size_sum: bool = False,
@@ -584,7 +655,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
return res_json
def timeline_delete(
self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId, **kwargs
self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId, **kwargs
):
"""
Note that deletion is not instant, it is scheduled and performed mostly in the background.
@@ -600,9 +671,9 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def timeline_gc(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
gc_horizon: Optional[int],
gc_horizon: int | None,
) -> dict[str, Any]:
"""
Unlike most handlers, this will wait for the layers to be actually
@@ -624,16 +695,14 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
assert isinstance(res_json, dict)
return res_json
def timeline_block_gc(self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId):
def timeline_block_gc(self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId):
res = self.post(
f"http://localhost:{self.port}/v1/tenant/{tenant_id}/timeline/{timeline_id}/block_gc",
)
log.info(f"Got GC request response code: {res.status_code}")
self.verbose_error(res)
def timeline_unblock_gc(
self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId
):
def timeline_unblock_gc(self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId):
res = self.post(
f"http://localhost:{self.port}/v1/tenant/{tenant_id}/timeline/{timeline_id}/unblock_gc",
)
@@ -642,7 +711,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def timeline_offload(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
):
self.is_testing_enabled_or_skip()
@@ -658,14 +727,14 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def timeline_compact(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
force_repartition=False,
force_image_layer_creation=False,
force_l0_compaction=False,
wait_until_uploaded=False,
enhanced_gc_bottom_most_compaction=False,
body: Optional[dict[str, Any]] = None,
body: dict[str, Any] | None = None,
):
self.is_testing_enabled_or_skip()
query = {}
@@ -692,7 +761,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
assert res_json is None
def timeline_preserve_initdb_archive(
self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId
self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId
):
log.info(
f"Requesting initdb archive preservation for tenant {tenant_id} and timeline {timeline_id}"
@@ -704,7 +773,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def timeline_archival_config(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
state: TimelineArchivalState,
):
@@ -720,7 +789,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def timeline_get_lsn_by_timestamp(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
timestamp: datetime,
with_lease: bool = False,
@@ -739,7 +808,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
return res_json
def timeline_lsn_lease(
self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId, lsn: Lsn
self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId, lsn: Lsn
):
data = {
"lsn": str(lsn),
@@ -755,7 +824,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
return res_json
def timeline_get_timestamp_of_lsn(
self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId, lsn: Lsn
self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId, lsn: Lsn
):
log.info(f"Requesting time range of lsn {lsn}, tenant {tenant_id}, timeline {timeline_id}")
res = self.get(
@@ -765,9 +834,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
res_json = res.json()
return res_json
def timeline_layer_map_info(
self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId
):
def timeline_layer_map_info(self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId):
log.info(f"Requesting layer map info of tenant {tenant_id}, timeline {timeline_id}")
res = self.get(
f"http://localhost:{self.port}/v1/tenant/{tenant_id}/timeline/{timeline_id}/layer",
@@ -778,13 +845,13 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def timeline_checkpoint(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
force_repartition=False,
force_image_layer_creation=False,
force_l0_compaction=False,
wait_until_uploaded=False,
compact: Optional[bool] = None,
compact: bool | None = None,
**kwargs,
):
self.is_testing_enabled_or_skip()
@@ -801,7 +868,9 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
if compact is not None:
query["compact"] = "true" if compact else "false"
log.info(f"Requesting checkpoint: tenant {tenant_id}, timeline {timeline_id}")
log.info(
f"Requesting checkpoint: tenant {tenant_id}, timeline {timeline_id}, wait_until_uploaded={wait_until_uploaded}"
)
res = self.put(
f"http://localhost:{self.port}/v1/tenant/{tenant_id}/timeline/{timeline_id}/checkpoint",
params=query,
@@ -814,7 +883,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def timeline_spawn_download_remote_layers(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
max_concurrent_downloads: int,
) -> dict[str, Any]:
@@ -833,7 +902,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def timeline_poll_download_remote_layers_status(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
spawn_response: dict[str, Any],
poll_state=None,
@@ -855,7 +924,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def timeline_download_remote_layers(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
max_concurrent_downloads: int,
errors_ok=False,
@@ -905,7 +974,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
timeline_id: TimelineId,
file_kind: str,
op_kind: str,
) -> Optional[int]:
) -> int | None:
metrics = [
"pageserver_remote_timeline_client_calls_started_total",
"pageserver_remote_timeline_client_calls_finished_total",
@@ -929,7 +998,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def layer_map_info(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
) -> LayerMapInfo:
res = self.get(
@@ -939,7 +1008,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
return LayerMapInfo.from_json(res.json())
def timeline_layer_scan_disposable_keys(
self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId, layer_name: str
self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId, layer_name: str
) -> ScanDisposableKeysResponse:
res = self.post(
f"http://localhost:{self.port}/v1/tenant/{tenant_id}/timeline/{timeline_id}/layer/{layer_name}/scan_disposable_keys",
@@ -949,7 +1018,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
return ScanDisposableKeysResponse.from_json(res.json())
def download_layer(
self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId, layer_name: str
self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId, layer_name: str
):
res = self.get(
f"http://localhost:{self.port}/v1/tenant/{tenant_id}/timeline/{timeline_id}/layer/{layer_name}",
@@ -958,9 +1027,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
assert res.status_code == 200
def download_all_layers(
self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId
):
def download_all_layers(self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId):
info = self.layer_map_info(tenant_id, timeline_id)
for layer in info.historic_layers:
if not layer.remote:
@@ -969,9 +1036,9 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def detach_ancestor(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
batch_size: Optional[int] = None,
batch_size: int | None = None,
**kwargs,
) -> set[TimelineId]:
params = {}
@@ -987,7 +1054,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
return set(map(TimelineId, json["reparented_timelines"]))
def evict_layer(
self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId, layer_name: str
self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId, layer_name: str
):
res = self.delete(
f"http://localhost:{self.port}/v1/tenant/{tenant_id}/timeline/{timeline_id}/layer/{layer_name}",
@@ -996,7 +1063,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
assert res.status_code in (200, 304)
def evict_all_layers(self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId):
def evict_all_layers(self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId):
info = self.layer_map_info(tenant_id, timeline_id)
for layer in info.historic_layers:
self.evict_layer(tenant_id, timeline_id, layer.layer_file_name)
@@ -1009,7 +1076,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
self.verbose_error(res)
return res.json()
def tenant_break(self, tenant_id: Union[TenantId, TenantShardId]):
def tenant_break(self, tenant_id: TenantId | TenantShardId):
res = self.put(f"http://localhost:{self.port}/v1/tenant/{tenant_id}/break")
self.verbose_error(res)
@@ -1058,7 +1125,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
def perf_info(
self,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
):
self.is_testing_enabled_or_skip()
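With the union signatures above, each client method accepts either an unsharded TenantId or a concrete TenantShardId. A rough usage sketch, assuming a NeonEnv from the usual fixtures and a tenant that really was created with four shards:

from fixtures.common_types import TenantShardId

http = env.pageserver.http_client()

# Unsharded: pass the plain TenantId.
http.tenant_status(env.initial_tenant)

# Sharded: address shard 0 of the 4-shard tenant explicitly.
shard0 = TenantShardId(tenant_id=env.initial_tenant, shard_number=0, shard_count=4)
http.timeline_detail(shard0, env.initial_timeline)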


@@ -13,7 +13,8 @@ from fixtures.neon_fixtures import (
from fixtures.remote_storage import LocalFsStorage, RemoteStorageKind
if TYPE_CHECKING:
from typing import Any, Callable
from collections.abc import Callable
from typing import Any
def single_timeline(


@@ -17,14 +17,14 @@ from fixtures.remote_storage import RemoteStorage, RemoteStorageKind, S3Storage
from fixtures.utils import wait_until
if TYPE_CHECKING:
from typing import Any, Optional, Union
from typing import Any
def assert_tenant_state(
pageserver_http: PageserverHttpClient,
tenant: TenantId,
expected_state: str,
message: Optional[str] = None,
message: str | None = None,
) -> None:
tenant_status = pageserver_http.tenant_status(tenant)
log.info(f"tenant_status: {tenant_status}")
@@ -33,7 +33,7 @@ def assert_tenant_state(
def remote_consistent_lsn(
pageserver_http: PageserverHttpClient,
tenant: Union[TenantId, TenantShardId],
tenant: TenantId | TenantShardId,
timeline: TimelineId,
) -> Lsn:
detail = pageserver_http.timeline_detail(tenant, timeline)
@@ -51,7 +51,7 @@ def remote_consistent_lsn(
def wait_for_upload(
pageserver_http: PageserverHttpClient,
tenant: Union[TenantId, TenantShardId],
tenant: TenantId | TenantShardId,
timeline: TimelineId,
lsn: Lsn,
):
@@ -138,7 +138,7 @@ def wait_until_all_tenants_state(
def wait_until_timeline_state(
pageserver_http: PageserverHttpClient,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
expected_state: str,
iterations: int,
@@ -188,7 +188,7 @@ def wait_until_tenant_active(
def last_record_lsn(
pageserver_http_client: PageserverHttpClient,
tenant: Union[TenantId, TenantShardId],
tenant: TenantId | TenantShardId,
timeline: TimelineId,
) -> Lsn:
detail = pageserver_http_client.timeline_detail(tenant, timeline)
@@ -200,7 +200,7 @@ def last_record_lsn(
def wait_for_last_record_lsn(
pageserver_http: PageserverHttpClient,
tenant: Union[TenantId, TenantShardId],
tenant: TenantId | TenantShardId,
timeline: TimelineId,
lsn: Lsn,
) -> Lsn:
@@ -267,10 +267,10 @@ def wait_for_upload_queue_empty(
def wait_timeline_detail_404(
pageserver_http: PageserverHttpClient,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
iterations: int,
interval: Optional[float] = None,
interval: float | None = None,
):
if interval is None:
interval = 0.25
@@ -292,10 +292,10 @@ def wait_timeline_detail_404(
def timeline_delete_wait_completed(
pageserver_http: PageserverHttpClient,
tenant_id: Union[TenantId, TenantShardId],
tenant_id: TenantId | TenantShardId,
timeline_id: TimelineId,
iterations: int = 20,
interval: Optional[float] = None,
interval: float | None = None,
**delete_args,
) -> None:
pageserver_http.timeline_delete(tenant_id=tenant_id, timeline_id=timeline_id, **delete_args)
@@ -304,9 +304,9 @@ def timeline_delete_wait_completed(
# remote_storage must not be None, but that's easier for callers to make mypy happy
def assert_prefix_empty(
remote_storage: Optional[RemoteStorage],
prefix: Optional[str] = None,
allowed_postfix: Optional[str] = None,
remote_storage: RemoteStorage | None,
prefix: str | None = None,
allowed_postfix: str | None = None,
delimiter: str = "/",
) -> None:
assert remote_storage is not None
@@ -348,8 +348,8 @@ def assert_prefix_empty(
# remote_storage must not be None, but that's easier for callers to make mypy happy
def assert_prefix_not_empty(
remote_storage: Optional[RemoteStorage],
prefix: Optional[str] = None,
remote_storage: RemoteStorage | None,
prefix: str | None = None,
delimiter: str = "/",
):
assert remote_storage is not None
@@ -358,7 +358,7 @@ def assert_prefix_not_empty(
def list_prefix(
remote: RemoteStorage, prefix: Optional[str] = None, delimiter: str = "/"
remote: RemoteStorage, prefix: str | None = None, delimiter: str = "/"
) -> ListObjectsV2OutputTypeDef:
"""
Note that this function takes into account prefix_in_bucket.


@@ -11,7 +11,7 @@ from _pytest.python import Metafunc
from fixtures.pg_version import PgVersion
if TYPE_CHECKING:
from typing import Any, Optional
from typing import Any
"""
@@ -20,31 +20,31 @@ Dynamically parametrize tests by different parameters
@pytest.fixture(scope="function", autouse=True)
def pg_version() -> Optional[PgVersion]:
def pg_version() -> PgVersion | None:
return None
@pytest.fixture(scope="function", autouse=True)
def build_type() -> Optional[str]:
def build_type() -> str | None:
return None
@pytest.fixture(scope="session", autouse=True)
def platform() -> Optional[str]:
def platform() -> str | None:
return None
@pytest.fixture(scope="function", autouse=True)
def pageserver_virtual_file_io_engine() -> Optional[str]:
def pageserver_virtual_file_io_engine() -> str | None:
return os.getenv("PAGESERVER_VIRTUAL_FILE_IO_ENGINE")
@pytest.fixture(scope="function", autouse=True)
def pageserver_virtual_file_io_mode() -> Optional[str]:
def pageserver_virtual_file_io_mode() -> str | None:
return os.getenv("PAGESERVER_VIRTUAL_FILE_IO_MODE")
def get_pageserver_default_tenant_config_compaction_algorithm() -> Optional[dict[str, Any]]:
def get_pageserver_default_tenant_config_compaction_algorithm() -> dict[str, Any] | None:
toml_table = os.getenv("PAGESERVER_DEFAULT_TENANT_CONFIG_COMPACTION_ALGORITHM")
if toml_table is None:
return None
@@ -54,7 +54,7 @@ def get_pageserver_default_tenant_config_compaction_algorithm() -> Optional[dict
@pytest.fixture(scope="function", autouse=True)
def pageserver_default_tenant_config_compaction_algorithm() -> Optional[dict[str, Any]]:
def pageserver_default_tenant_config_compaction_algorithm() -> dict[str, Any] | None:
return get_pageserver_default_tenant_config_compaction_algorithm()
@@ -66,6 +66,7 @@ def pytest_generate_tests(metafunc: Metafunc):
metafunc.parametrize("build_type", build_types)
pg_versions: list[PgVersion]
if (v := os.getenv("DEFAULT_PG_VERSION")) is None:
pg_versions = [version for version in PgVersion if version != PgVersion.NOT_SET]
else:


@@ -18,7 +18,6 @@ from fixtures.utils import allure_attach_from_dir
if TYPE_CHECKING:
from collections.abc import Iterator
from typing import Optional
BASE_DIR = Path(__file__).parents[2]
@@ -26,9 +25,7 @@ COMPUTE_CONFIG_DIR = BASE_DIR / "compute" / "etc"
DEFAULT_OUTPUT_DIR: str = "test_output"
def get_test_dir(
request: FixtureRequest, top_output_dir: Path, prefix: Optional[str] = None
) -> Path:
def get_test_dir(request: FixtureRequest, top_output_dir: Path, prefix: str | None = None) -> Path:
"""Compute the path to a working directory for an individual test."""
test_name = request.node.name
test_dir = top_output_dir / f"{prefix or ''}{test_name.replace('/', '-')}"
@@ -112,7 +109,7 @@ def compatibility_snapshot_dir() -> Iterator[Path]:
@pytest.fixture(scope="session")
def compatibility_neon_binpath() -> Iterator[Optional[Path]]:
def compatibility_neon_binpath() -> Iterator[Path | None]:
if os.getenv("REMOTE_ENV"):
return
comp_binpath = None
@@ -133,7 +130,7 @@ def pg_distrib_dir(base_dir: Path) -> Iterator[Path]:
@pytest.fixture(scope="session")
def compatibility_pg_distrib_dir() -> Iterator[Optional[Path]]:
def compatibility_pg_distrib_dir() -> Iterator[Path | None]:
compat_distrib_dir = None
if env_compat_postgres_bin := os.environ.get("COMPATIBILITY_POSTGRES_DISTRIB_DIR"):
compat_distrib_dir = Path(env_compat_postgres_bin).resolve()
@@ -197,7 +194,7 @@ class FileAndThreadLock:
def __init__(self, path: Path):
self.path = path
self.thread_lock = threading.Lock()
self.fd: Optional[int] = None
self.fd: int | None = None
def __enter__(self):
self.fd = os.open(self.path, os.O_CREAT | os.O_WRONLY)
@@ -208,9 +205,9 @@ class FileAndThreadLock:
def __exit__(
self,
exc_type: Optional[type[BaseException]],
exc_value: Optional[BaseException],
exc_traceback: Optional[TracebackType],
exc_type: type[BaseException] | None,
exc_value: BaseException | None,
exc_traceback: TracebackType | None,
):
assert self.fd is not None
assert self.thread_lock.locked() # ... by us
@@ -263,9 +260,9 @@ class SnapshotDir:
def __exit__(
self,
exc_type: Optional[type[BaseException]],
exc_value: Optional[BaseException],
exc_traceback: Optional[TracebackType],
exc_type: type[BaseException] | None,
exc_value: BaseException | None,
exc_traceback: TracebackType | None,
):
self._lock.__exit__(exc_type, exc_value, exc_traceback)
@@ -277,7 +274,7 @@ def shared_snapshot_dir(top_output_dir: Path, ident: str) -> SnapshotDir:
@pytest.fixture(scope="function")
def test_overlay_dir(request: FixtureRequest, top_output_dir: Path) -> Optional[Path]:
def test_overlay_dir(request: FixtureRequest, top_output_dir: Path) -> Path | None:
"""
Idempotently create a test's overlayfs mount state directory.
If the functionality isn't enabled via env var, returns None.


@@ -1,22 +1,16 @@
from __future__ import annotations
import enum
from typing import TYPE_CHECKING
from enum import StrEnum
from typing_extensions import override
if TYPE_CHECKING:
from typing import Optional
"""
This fixture is used to determine which version of Postgres to use for tests.
"""
# Inherit PgVersion from str rather than int to make it easier to pass as a command-line argument
# TODO: use enum.StrEnum for Python >= 3.11
class PgVersion(str, enum.Enum):
class PgVersion(StrEnum):
V14 = "14"
V15 = "15"
V16 = "16"
@@ -34,7 +28,6 @@ class PgVersion(str, enum.Enum):
def __repr__(self) -> str:
return f"'{self.value}'"
# Make this explicit for Python 3.11 compatibility, which changes the behavior of enums
@override
def __str__(self) -> str:
return self.value
@@ -47,16 +40,18 @@ class PgVersion(str, enum.Enum):
@classmethod
@override
def _missing_(cls, value: object) -> Optional[PgVersion]:
known_values = {v.value for _, v in cls.__members__.items()}
def _missing_(cls, value: object) -> PgVersion | None:
if not isinstance(value, str):
return None
# Allow passing version as a string with "v" prefix (e.g. "v14")
if isinstance(value, str) and value.lower().startswith("v") and value[1:] in known_values:
return cls(value[1:])
# Allow passing version as an int (e.g. 15 or 150002, both will be converted to PgVersion.V15)
elif isinstance(value, int) and str(value)[:2] in known_values:
return cls(str(value)[:2])
known_values = set(cls.__members__.values())
# Allow passing version as v-prefixed string (e.g. "v14")
if value.lower().startswith("v") and (v := value[1:]) in known_values:
return cls(v)
# Allow passing version as an int (i.e. both "15" and "150002" matches PgVersion.V15)
if value.isdigit() and (v := value[:2]) in known_values:
return cls(v)
# Make mypy happy
# See https://github.com/python/mypy/issues/3974
return None
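A quick sanity sketch of what the reworked _missing_ accepts, following the branches above; note that non-string inputs now fall through to a ValueError rather than being coerced:

from fixtures.pg_version import PgVersion

assert PgVersion("16") is PgVersion.V16      # exact value, _missing_ not involved
assert PgVersion("v16") is PgVersion.V16     # "v"-prefixed string
assert PgVersion("160002") is PgVersion.V16  # server_version_num-style string

try:
    PgVersion(16)                            # plain ints are no longer accepted
except ValueError:
    pass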


@@ -3,13 +3,9 @@ from __future__ import annotations
import re
import socket
from contextlib import closing
from typing import TYPE_CHECKING
from fixtures.log_helper import log
if TYPE_CHECKING:
from typing import Union
def can_bind(host: str, port: int) -> bool:
"""
@@ -49,17 +45,19 @@ class PortDistributor:
"port range configured for test is exhausted, consider enlarging the range"
)
def replace_with_new_port(self, value: Union[int, str]) -> Union[int, str]:
def replace_with_new_port(self, value: int | str) -> int | str:
"""
Returns a new port for a port number in a string (like "localhost:1234") or int.
Replacements are memorised, so a substitution for the same port is always the same.
"""
# TODO: replace with structural pattern matching for Python >= 3.10
if isinstance(value, int):
return self._replace_port_int(value)
return self._replace_port_str(value)
match value:
case int():
return self._replace_port_int(value)
case str():
return self._replace_port_str(value)
case _:
raise TypeError(f"Unsupported type {type(value)}, should be int | str")
def _replace_port_int(self, value: int) -> int:
known_port = self.port_map.get(value)
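The same int-or-str dispatch, reduced to a standalone snippet to show the structural pattern matching now used (Python 3.10+); the port map and string handling here are simplified stand-ins for the class's internals:

port_map = {5432: 15432}

def replace_with_new_port(value: int | str) -> int | str:
    match value:
        case int():
            return port_map.get(value, value)
        case str():
            host, _, port = value.rpartition(":")
            return f"{host}:{port_map.get(int(port), int(port))}"
        case _:
            raise TypeError(f"Unsupported type {type(value)}, should be int | str")

assert replace_with_new_port(5432) == 15432
assert replace_with_new_port("localhost:5432") == "localhost:15432"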


@@ -6,8 +6,9 @@ import json
import os
import re
from dataclasses import dataclass
from enum import StrEnum
from pathlib import Path
from typing import TYPE_CHECKING, Union
from typing import TYPE_CHECKING
import boto3
import toml
@@ -20,7 +21,7 @@ from fixtures.log_helper import log
from fixtures.pageserver.common_types import IndexPartDump
if TYPE_CHECKING:
from typing import Any, Optional
from typing import Any
TIMELINE_INDEX_PART_FILE_NAME = "index_part.json"
@@ -28,7 +29,7 @@ TENANT_HEATMAP_FILE_NAME = "heatmap-v1.json"
@enum.unique
class RemoteStorageUser(str, enum.Enum):
class RemoteStorageUser(StrEnum):
"""
Instead of using strings for the users, use a more strict enum.
"""
@@ -77,21 +78,19 @@ class MockS3Server:
class LocalFsStorage:
root: Path
def tenant_path(self, tenant_id: Union[TenantId, TenantShardId]) -> Path:
def tenant_path(self, tenant_id: TenantId | TenantShardId) -> Path:
return self.root / "tenants" / str(tenant_id)
def timeline_path(
self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId
) -> Path:
def timeline_path(self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId) -> Path:
return self.tenant_path(tenant_id) / "timelines" / str(timeline_id)
def timeline_latest_generation(
self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId
) -> Optional[int]:
self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId
) -> int | None:
timeline_files = os.listdir(self.timeline_path(tenant_id, timeline_id))
index_parts = [f for f in timeline_files if f.startswith("index_part")]
def parse_gen(filename: str) -> Optional[int]:
def parse_gen(filename: str) -> int | None:
log.info(f"parsing index_part '{filename}'")
parts = filename.split("-")
if len(parts) == 2:
@@ -104,9 +103,7 @@ class LocalFsStorage:
raise RuntimeError(f"No index_part found for {tenant_id}/{timeline_id}")
return generations[-1]
def index_path(
self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId
) -> Path:
def index_path(self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId) -> Path:
latest_gen = self.timeline_latest_generation(tenant_id, timeline_id)
if latest_gen is None:
filename = TIMELINE_INDEX_PART_FILE_NAME
@@ -120,7 +117,7 @@ class LocalFsStorage:
tenant_id: TenantId,
timeline_id: TimelineId,
local_name: str,
generation: Optional[int] = None,
generation: int | None = None,
):
if generation is None:
generation = self.timeline_latest_generation(tenant_id, timeline_id)
@@ -130,9 +127,7 @@ class LocalFsStorage:
filename = f"{local_name}-{generation:08x}"
return self.timeline_path(tenant_id, timeline_id) / filename
def index_content(
self, tenant_id: Union[TenantId, TenantShardId], timeline_id: TimelineId
) -> Any:
def index_content(self, tenant_id: TenantId | TenantShardId, timeline_id: TimelineId) -> Any:
with self.index_path(tenant_id, timeline_id).open("r") as f:
return json.load(f)
@@ -164,17 +159,17 @@ class LocalFsStorage:
class S3Storage:
bucket_name: str
bucket_region: str
access_key: Optional[str]
secret_key: Optional[str]
aws_profile: Optional[str]
access_key: str | None
secret_key: str | None
aws_profile: str | None
prefix_in_bucket: str
client: S3Client
cleanup: bool
"""Is this MOCK_S3 (false) or REAL_S3 (true)"""
real: bool
endpoint: Optional[str] = None
endpoint: str | None = None
"""formatting deserialized with humantime crate, for example "1s"."""
custom_timeout: Optional[str] = None
custom_timeout: str | None = None
def access_env_vars(self) -> dict[str, str]:
if self.aws_profile is not None:
@@ -272,12 +267,10 @@ class S3Storage:
def tenants_path(self) -> str:
return f"{self.prefix_in_bucket}/tenants"
def tenant_path(self, tenant_id: Union[TenantShardId, TenantId]) -> str:
def tenant_path(self, tenant_id: TenantShardId | TenantId) -> str:
return f"{self.tenants_path()}/{tenant_id}"
def timeline_path(
self, tenant_id: Union[TenantShardId, TenantId], timeline_id: TimelineId
) -> str:
def timeline_path(self, tenant_id: TenantShardId | TenantId, timeline_id: TimelineId) -> str:
return f"{self.tenant_path(tenant_id)}/timelines/{timeline_id}"
def get_latest_index_key(self, index_keys: list[str]) -> str:
@@ -315,11 +308,11 @@ class S3Storage:
assert self.real is False
RemoteStorage = Union[LocalFsStorage, S3Storage]
RemoteStorage = LocalFsStorage | S3Storage
@enum.unique
class RemoteStorageKind(str, enum.Enum):
class RemoteStorageKind(StrEnum):
LOCAL_FS = "local_fs"
MOCK_S3 = "mock_s3"
REAL_S3 = "real_s3"
@@ -331,8 +324,8 @@ class RemoteStorageKind(str, enum.Enum):
run_id: str,
test_name: str,
user: RemoteStorageUser,
bucket_name: Optional[str] = None,
bucket_region: Optional[str] = None,
bucket_name: str | None = None,
bucket_region: str | None = None,
) -> RemoteStorage:
if self == RemoteStorageKind.LOCAL_FS:
return LocalFsStorage(LocalFsStorage.component_path(repo_dir, user))
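Since RemoteStorageKind and RemoteStorageUser now derive from StrEnum (Python 3.11+), members behave as their plain string values, which is what the env-var driven configuration relies on. A standalone analogue with the same member values:

from enum import StrEnum

class Kind(StrEnum):
    LOCAL_FS = "local_fs"
    MOCK_S3 = "mock_s3"
    REAL_S3 = "real_s3"

assert Kind("mock_s3") is Kind.MOCK_S3        # value lookup returns the member
assert f"{Kind.LOCAL_FS}" == "local_fs"       # formats as the bare string
assert Kind.REAL_S3.upper() == "REAL_S3"      # inherits str methods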


@@ -13,7 +13,7 @@ from fixtures.metrics import Metrics, MetricsGetter, parse_metrics
from fixtures.utils import wait_until
if TYPE_CHECKING:
from typing import Any, Optional, Union
from typing import Any
# Walreceiver as returned by sk's timeline status endpoint.
@@ -72,7 +72,7 @@ class TermBumpResponse:
class SafekeeperHttpClient(requests.Session, MetricsGetter):
HTTPError = requests.HTTPError
def __init__(self, port: int, auth_token: Optional[str] = None, is_testing_enabled=False):
def __init__(self, port: int, auth_token: str | None = None, is_testing_enabled=False):
super().__init__()
self.port = port
self.auth_token = auth_token
@@ -98,7 +98,7 @@ class SafekeeperHttpClient(requests.Session, MetricsGetter):
if not self.is_testing_enabled:
pytest.skip("safekeeper was built without 'testing' feature")
def configure_failpoints(self, config_strings: Union[tuple[str, str], list[tuple[str, str]]]):
def configure_failpoints(self, config_strings: tuple[str, str] | list[tuple[str, str]]):
self.is_testing_enabled_or_skip()
if isinstance(config_strings, tuple):
@@ -195,7 +195,7 @@ class SafekeeperHttpClient(requests.Session, MetricsGetter):
assert isinstance(res_json, dict)
return res_json
def debug_dump(self, params: Optional[dict[str, str]] = None) -> dict[str, Any]:
def debug_dump(self, params: dict[str, str] | None = None) -> dict[str, Any]:
params = params or {}
res = self.get(f"http://localhost:{self.port}/v1/debug_dump", params=params)
res.raise_for_status()
@@ -204,7 +204,7 @@ class SafekeeperHttpClient(requests.Session, MetricsGetter):
return res_json
def debug_dump_timeline(
self, timeline_id: TimelineId, params: Optional[dict[str, str]] = None
self, timeline_id: TimelineId, params: dict[str, str] | None = None
) -> Any:
params = params or {}
params["timeline_id"] = str(timeline_id)
@@ -285,7 +285,7 @@ class SafekeeperHttpClient(requests.Session, MetricsGetter):
self,
tenant_id: TenantId,
timeline_id: TimelineId,
term: Optional[int],
term: int | None,
) -> TermBumpResponse:
body = {}
if term is not None:


@@ -13,14 +13,14 @@ from werkzeug.wrappers.response import Response
from fixtures.log_helper import log
if TYPE_CHECKING:
from typing import Any, Optional
from typing import Any
class StorageControllerProxy:
def __init__(self, server: HTTPServer):
self.server: HTTPServer = server
self.listen: str = f"http://{server.host}:{server.port}"
self.routing_to: Optional[str] = None
self.routing_to: str | None = None
def route_to(self, storage_controller_api: str):
self.routing_to = storage_controller_api


@@ -8,10 +8,10 @@ import subprocess
import tarfile
import threading
import time
from collections.abc import Iterable
from collections.abc import Callable, Iterable
from hashlib import sha256
from pathlib import Path
from typing import TYPE_CHECKING, Any, Callable, TypeVar
from typing import TYPE_CHECKING, Any, TypeVar
from urllib.parse import urlencode
import allure
@@ -29,7 +29,7 @@ from fixtures.pg_version import PgVersion
if TYPE_CHECKING:
from collections.abc import Iterable
from typing import IO, Optional
from typing import IO
from fixtures.common_types import TimelineId
from fixtures.neon_fixtures import PgBin
@@ -66,10 +66,10 @@ def subprocess_capture(
echo_stderr: bool = False,
echo_stdout: bool = False,
capture_stdout: bool = False,
timeout: Optional[float] = None,
timeout: float | None = None,
with_command_header: bool = True,
**popen_kwargs: Any,
) -> tuple[str, Optional[str], int]:
) -> tuple[str, str | None, int]:
"""Run a process and bifurcate its output to files and the `log` logger
stderr and stdout are always captured in files. They are also optionally
@@ -536,7 +536,7 @@ def assert_pageserver_backups_equal(left: Path, right: Path, skip_files: set[str
"""
started_at = time.time()
def hash_extracted(reader: Optional[IO[bytes]]) -> bytes:
def hash_extracted(reader: IO[bytes] | None) -> bytes:
assert reader is not None
digest = sha256(usedforsecurity=False)
while True:
@@ -563,7 +563,7 @@ def assert_pageserver_backups_equal(left: Path, right: Path, skip_files: set[str
mismatching: set[str] = set()
for left_tuple, right_tuple in zip(left_list, right_list):
for left_tuple, right_tuple in zip(left_list, right_list, strict=False):
left_path, left_hash = left_tuple
right_path, right_hash = right_tuple
assert (
@@ -595,7 +595,7 @@ class PropagatingThread(threading.Thread):
self.exc = e
@override
def join(self, timeout: Optional[float] = None) -> Any:
def join(self, timeout: float | None = None) -> Any:
super().join(timeout)
if self.exc:
raise self.exc
@@ -674,6 +674,13 @@ def run_only_on_default_postgres(reason: str):
)
def run_only_on_postgres(versions: Iterable[PgVersion], reason: str):
return pytest.mark.skipif(
PgVersion(os.getenv("DEFAULT_PG_VERSION", PgVersion.DEFAULT)) not in versions,
reason=reason,
)
def skip_in_debug_build(reason: str):
return pytest.mark.skipif(
os.getenv("BUILD_TYPE", "debug") == "debug",


@@ -15,7 +15,7 @@ from fixtures.neon_fixtures import (
from fixtures.pageserver.utils import wait_for_last_record_lsn
if TYPE_CHECKING:
from typing import Any, Optional
from typing import Any
# neon_local doesn't handle creating/modifying endpoints concurrently, so we use a mutex
# to ensure we don't do that: this enables running lots of Workloads in parallel safely.
@@ -36,8 +36,8 @@ class Workload:
env: NeonEnv,
tenant_id: TenantId,
timeline_id: TimelineId,
branch_name: Optional[str] = None,
endpoint_opts: Optional[dict[str, Any]] = None,
branch_name: str | None = None,
endpoint_opts: dict[str, Any] | None = None,
):
self.env = env
self.tenant_id = tenant_id
@@ -50,10 +50,10 @@ class Workload:
self.expect_rows = 0
self.churn_cursor = 0
self._endpoint: Optional[Endpoint] = None
self._endpoint: Endpoint | None = None
self._endpoint_opts = endpoint_opts or {}
def reconfigure(self):
def reconfigure(self) -> None:
"""
Request the endpoint to reconfigure based on location reported by storage controller
"""
@@ -61,7 +61,7 @@ class Workload:
with ENDPOINT_LOCK:
self._endpoint.reconfigure()
def endpoint(self, pageserver_id: Optional[int] = None) -> Endpoint:
def endpoint(self, pageserver_id: int | None = None) -> Endpoint:
# We may be running alongside other Workloads for different tenants. Full TTID is
# obnoxiously long for use here, but a cut-down version is still unique enough for tests.
endpoint_id = f"ep-workload-{str(self.tenant_id)[0:4]}-{str(self.timeline_id)[0:4]}"
@@ -94,16 +94,17 @@ class Workload:
def __del__(self):
self.stop()
def init(self, pageserver_id: Optional[int] = None):
def init(self, pageserver_id: int | None = None, allow_recreate=False):
endpoint = self.endpoint(pageserver_id)
if allow_recreate:
endpoint.safe_psql(f"DROP TABLE IF EXISTS {self.table};")
endpoint.safe_psql(f"CREATE TABLE {self.table} (id INTEGER PRIMARY KEY, val text);")
endpoint.safe_psql("CREATE EXTENSION IF NOT EXISTS neon_test_utils;")
last_flush_lsn_upload(
self.env, endpoint, self.tenant_id, self.timeline_id, pageserver_id=pageserver_id
)
def write_rows(self, n: int, pageserver_id: Optional[int] = None, upload: bool = True):
def write_rows(self, n: int, pageserver_id: int | None = None, upload: bool = True):
endpoint = self.endpoint(pageserver_id)
start = self.expect_rows
end = start + n - 1
@@ -125,7 +126,7 @@ class Workload:
return False
def churn_rows(
self, n: int, pageserver_id: Optional[int] = None, upload: bool = True, ingest: bool = True
self, n: int, pageserver_id: int | None = None, upload: bool = True, ingest: bool = True
):
assert self.expect_rows >= n
@@ -190,7 +191,7 @@ class Workload:
else:
log.info(f"Churn: not waiting for upload, disk LSN {last_flush_lsn}")
def validate(self, pageserver_id: Optional[int] = None):
def validate(self, pageserver_id: int | None = None):
endpoint = self.endpoint(pageserver_id)
endpoint.clear_shared_buffers()
result = endpoint.safe_psql(f"SELECT COUNT(*) FROM {self.table}")
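A rough end-to-end sketch of the Workload helper as extended above (init with allow_recreate, write_rows, churn_rows, validate), assuming a NeonEnvBuilder-provisioned environment as in the other tests and that the class lives in fixtures.workload:

from fixtures.workload import Workload

def test_workload_smoke(neon_env_builder):
    env = neon_env_builder.init_start()
    workload = Workload(env, env.initial_tenant, env.initial_timeline)

    workload.init()           # CREATE TABLE plus neon_test_utils, then flush/upload
    workload.write_rows(128)  # insert rows and wait for the pageserver to ingest
    workload.churn_rows(64)   # rewrite a subset to generate layer churn
    workload.validate()       # re-read through the pageserver and compare counts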

Some files were not shown because too many files have changed in this diff.