Compare commits

...

78 Commits

Author SHA1 Message Date
Anastasia Lubennikova
4139270f9c Fix rebase conflicts 2022-09-22 12:24:09 +03:00
Anastasia Lubennikova
4cff9cce67 bump vendor/postgres-v14 2022-09-22 10:02:43 +03:00
Anastasia Lubennikova
3e268b1dc1 Use DEFAULT_PG_VERSION env in CI pytest 2022-09-22 09:55:14 +03:00
Anastasia Lubennikova
fc5738d21f Make test_timeline_size_metrics more stable:
Compare the size with vanilla Postgres instead of a hardcoded value
2022-09-22 09:55:14 +03:00
Anastasia Lubennikova
ff7556b796 Clean up the pg_version choice code 2022-09-22 09:55:14 +03:00
Anastasia Lubennikova
0034d7dff0 Update libs/postgres_ffi/wal_craft/src/bin/wal_craft.rs
Co-authored-by: MMeent <matthias@neon.tech>
2022-09-22 09:55:14 +03:00
Anastasia Lubennikova
9491388956 use version-specific find_end_of_wal function 2022-09-22 09:55:00 +03:00
Anastasia Lubennikova
c3d90e1b36 show pg_version in create_timeline info span 2022-09-22 09:53:26 +03:00
Anastasia Lubennikova
faeda27150 Update readme and openapi spec 2022-09-22 09:52:52 +03:00
Anastasia Lubennikova
3a28270157 fix clippy warnings 2022-09-22 09:52:52 +03:00
Anastasia Lubennikova
cbf655a346 use pg_version in python tests 2022-09-22 09:52:52 +03:00
Anastasia Lubennikova
fc08062a13 pass version to wal_craft.rs 2022-09-22 09:52:52 +03:00
Anastasia Lubennikova
39ad16eaa9 Use non-versioned pg_distrib dir 2022-09-22 09:52:49 +03:00
Anastasia Lubennikova
532ae81c33 update build scripts to match pg_distrib_dir versioning schema 2022-09-22 09:52:31 +03:00
Anastasia Lubennikova
51e0cd01b0 fix clippy warning 2022-09-22 09:52:31 +03:00
Anastasia Lubennikova
935e84f579 Rename waldecoder -> waldecoder_handler.rs. Add comments 2022-09-22 09:52:31 +03:00
Anastasia Lubennikova
321376fdba Pass pg_version parameter to timeline import command.
Add pg_version field to LocalTimelineInfo.
Use pg_version in the export_import_between_pageservers script
2022-09-22 09:52:31 +03:00
Anastasia Lubennikova
a2df9d650e Handle backwards-compatibility of TimelineMetadata.
This commit bumps the TimelineMetadata format version and makes it independent of STORAGE_FORMAT_VERSION.
2022-09-22 09:52:31 +03:00
Anastasia Lubennikova
e2dec100ff code cleanup after review 2022-09-22 09:52:31 +03:00
Anastasia Lubennikova
a1ce155264 Support pg 15
- Split postgres_ffi into two version-specific files.
- Preserve pg_version in timeline metadata.
- Use pg_version in safekeeper code. Check for postgres major version mismatch.
- Clean up the code to use the DEFAULT_PG_VERSION constant everywhere, instead of hardcoding it.

- Parameterize Python tests: use the DEFAULT_PG_VERSION env and the pg_version fixture.
  To run tests using a specific PostgreSQL version, pass the DEFAULT_PG_VERSION environment variable:
  'DEFAULT_PG_VERSION='15' ./scripts/pytest test_runner/regress'
 Currently not all tests pass, because the Rust code relies on the default version of PostgreSQL in a few places.
2022-09-22 09:52:27 +03:00
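The commit above routes all version-dependent logic through an explicit pg_version value. Below is a minimal sketch of that dispatch pattern in Rust; the v14/v15 module split and the find_end_of_wal signature are assumptions based on the commit messages, not the actual postgres_ffi layout.

```
// Illustrative sketch only: version-specific modules behind one dispatching
// function, as described in the commit messages above.
use anyhow::{bail, Result};

pub const DEFAULT_PG_VERSION: u32 = 14;

mod v14 {
    pub fn find_end_of_wal() -> u64 {
        // ... version-specific WAL parsing for PostgreSQL 14 ...
        0
    }
}

mod v15 {
    pub fn find_end_of_wal() -> u64 {
        // ... version-specific WAL parsing for PostgreSQL 15 ...
        0
    }
}

pub fn find_end_of_wal(pg_version: u32) -> Result<u64> {
    // Callers pass the timeline's pg_version instead of assuming a default.
    match pg_version {
        14 => Ok(v14::find_end_of_wal()),
        15 => Ok(v15::find_end_of_wal()),
        other => bail!("unsupported PostgreSQL major version: {other}"),
    }
}

fn main() -> Result<()> {
    println!("end of WAL: {}", find_end_of_wal(DEFAULT_PG_VERSION)?);
    Ok(())
}
```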
Konstantin Knizhnik
f3073a4db9 R-Tree layer map (#2317)
Replace the layer array and linear search with R-tree

So far, the in-memory layer map that holds information about layer
files that exist, has used a simple Vec, in no particular order, to
hold information about all the layers. That obviously doesn't scale
very well; with thousands of layer files the linear search was
consuming a lot of CPU. Replace it with a two-dimensional R-tree, with
Key and LSN ranges as the dimensions.

For the R-tree, use the 'rstar' crate. To be able to use that, we
convert the Keys and LSNs into 256-bit integers. 64 bits would be
enough to represent LSNs, and 128 bits would be enough to represent
Keys. However, we use 256 bits, because rstar internally performs
multiplication to calculate the area of rectangles, and the result of
multiplying two 128 bit integers doesn't necessarily fit in 128 bits,
causing integer overflow and, if overflow-checks are enabled,
panic. To avoid that, we use 256 bit integers.

Add a performance test that creates a lot of layer files, to
demonstrate the benefit.
2022-09-22 08:35:06 +03:00
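To make the two-dimensional index concrete, here is a minimal sketch using the rstar crate, with Key and LSN ranges as the rectangle dimensions. The LayerEntry type is hypothetical, and i128 stands in for the 256-bit integers the commit describes (the real map widens the coordinates precisely to avoid the overflow issue mentioned above).

```
use rstar::{RTree, RTreeObject, AABB};

// Hypothetical description of one layer file by its Key and LSN ranges.
struct LayerEntry {
    key_range: (i128, i128),
    lsn_range: (i128, i128),
    name: String,
}

impl RTreeObject for LayerEntry {
    type Envelope = AABB<[i128; 2]>;

    fn envelope(&self) -> Self::Envelope {
        AABB::from_corners(
            [self.key_range.0, self.lsn_range.0],
            [self.key_range.1, self.lsn_range.1],
        )
    }
}

fn main() {
    let layers = vec![
        LayerEntry { key_range: (0, 100), lsn_range: (0, 50), name: "delta-1".into() },
        LayerEntry { key_range: (50, 200), lsn_range: (50, 120), name: "delta-2".into() },
    ];
    let tree = RTree::bulk_load(layers);

    // Find every layer whose Key/LSN rectangle intersects the query rectangle,
    // replacing the old linear scan over a Vec of layers.
    let query = AABB::from_corners([60, 40], [80, 60]);
    for layer in tree.locate_in_envelope_intersecting(&query) {
        println!("candidate layer: {}", layer.name);
    }
}
```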
Dmitry Ivanov
e9a103c09f [proxy] Pass extra parameters to the console (#2467)
With this change we now pass additional params
to the console's auth methods.
2022-09-21 21:42:47 +03:00
Arthur Petukhovsky
7eebb45ea6 Reduce metrics footprint in safekeeper (#2491)
Fixes bugs with metrics in control_file and wal_storage, where we didn't delete metrics for inactive timelines.
2022-09-21 19:13:30 +03:00
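The fix boils down to dropping per-timeline label sets once a timeline goes away. A hedged sketch with the prometheus crate is below; the metric name and labels are made up for illustration and are not the safekeeper's actual metrics.

```
use once_cell::sync::Lazy;
use prometheus::{register_int_gauge_vec, IntGaugeVec};

// Hypothetical per-timeline gauge, labelled by tenant and timeline id.
static FLUSH_LSN: Lazy<IntGaugeVec> = Lazy::new(|| {
    register_int_gauge_vec!(
        "safekeeper_flush_lsn_sketch",
        "Flush LSN per timeline (illustrative only)",
        &["tenant_id", "timeline_id"]
    )
    .expect("failed to register metric")
});

fn on_timeline_delete(tenant_id: &str, timeline_id: &str) {
    // Without this call the label set (and its stale value) lingers forever,
    // inflating the metrics footprint for timelines that no longer exist.
    let _ = FLUSH_LSN.remove_label_values(&[tenant_id, timeline_id]);
}

fn main() {
    FLUSH_LSN.with_label_values(&["tenant-a", "timeline-1"]).set(42);
    on_timeline_delete("tenant-a", "timeline-1");
}
```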
Alexander Bayandin
19fa410ff8 NeonCompare: switch to new pageserver HTTP API 2022-09-21 15:17:55 +01:00
Heikki Linnakangas
b82e2e3f18 Bump postgres submodules and update docs/core_changes.md.
The old change to downgrade a WARNING in postgres vacuumlazy.c was
reverted.
2022-09-21 17:14:29 +03:00
Kirill Bulatov
71c92e0db1 Use prebuilt image with Hakari for CI style checks (#2488) 2022-09-21 10:13:11 +00:00
sharnoff
6f949e1556 Improve pageserver/safekeepeer HTTP API errors (#2461)
Part of the general work on improving pageserver logs.

Brief summary of changes:

* Remove `ApiError::from_err`
* Remove `impl From<anyhow::Error> for ApiError`
* Convert `ApiError::{BadRequest, NotFound}` to use `anyhow::Error`
  * Note: `NotFound` has more verbose formatting because it's more
    likely to have useful information for the receiving "user"
* Explicitly convert from `tokio::task::JoinError`s into
  `InternalServerError`s where appropriate

Also note: many of the places where errors were implicitly converted to
500s have now been updated to return a more appropriate error. Some
places where it's not yet possible to distinguish the error types have
been left as 500s.
2022-09-20 17:02:10 -07:00
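Read literally, the summary suggests an error type roughly shaped like the sketch below. It is an assumption drawn from the bullet points above, not the actual pageserver/safekeeper HTTP code.

```
use tokio::task::JoinError;

// Hypothetical shape after the change: BadRequest and NotFound carry an
// anyhow::Error, and there is no blanket `From<anyhow::Error>` any more.
#[derive(Debug)]
enum ApiError {
    BadRequest(anyhow::Error),
    NotFound(anyhow::Error),
    InternalServerError(anyhow::Error),
}

// JoinErrors are converted explicitly at the call site instead of silently
// becoming 500s through a blanket conversion.
async fn run_blocking_step() -> Result<String, ApiError> {
    tokio::task::spawn_blocking(|| "result".to_string())
        .await
        .map_err(|e: JoinError| ApiError::InternalServerError(anyhow::Error::new(e)))
}

#[tokio::main]
async fn main() {
    match run_blocking_step().await {
        Ok(v) => println!("ok: {v}"),
        Err(e) => println!("api error: {e:?}"),
    }
}
```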
Kirill Bulatov
8d7024a8c2 Move path manipulation function to utils 2022-09-20 23:43:52 +03:00
Kirill Bulatov
6b8dcad1bb Unify timeline creation steps 2022-09-20 23:43:52 +03:00
Kirill Bulatov
310c507303 Merge path retrieval methods in config.rs 2022-09-20 23:43:52 +03:00
Kirill Bulatov
6fc719db13 Merge timelines.rs with tenant.rs 2022-09-20 23:43:52 +03:00
sharnoff
4a3b3ff11d Move testing pageserver libpq cmds to HTTP api (#2429)
Closes #2422.

The APIs have been feature gated with the `testing_api!` macro so that
they return 400s when support hasn't been compiled in.
2022-09-20 11:28:12 -07:00
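As a simplified illustration of the gating idea (the real code uses a `testing_api!` macro and compile-time feature checks), a handler might refuse the request when the `testing` cargo feature is absent; the handler name and messages below are hypothetical.

```
// Sketch only: reject testing-only endpoints with a client error (mapped to
// HTTP 400 by the server layer) when the `testing` feature is not enabled.
fn failpoints_handler() -> Result<String, String> {
    if cfg!(feature = "testing") {
        // In a testing build, the endpoint would actually do its work here.
        Ok("failpoints updated".to_string())
    } else {
        Err("this endpoint requires a binary built with --features testing".to_string())
    }
}

fn main() {
    println!("{:?}", failpoints_handler());
}
```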
sharnoff
4b25b9652a Rename more zid-like idents (#2480)
Follow-up to PR #2433 (b8eb908a). There are still a few more unresolved
locations that have been left as-is for the same compatibility reasons
as in the original PR.
2022-09-20 11:06:31 -07:00
Heikki Linnakangas
a5019bf771 Use a simpler way to set extra options for benchmark test.
Commit 43a4f7173e fixed the case where there are extra options in the
connection string, but broke the case when there are not. Fix
that. On second thought, it's more straightforward to set the
options with ALTER DATABASE, so change the workflow yaml file to do
that instead.
2022-09-20 13:48:50 +03:00
Kirill Bulatov
7863c4a702 Regenerate Hakari files, add a CI check for that 2022-09-20 11:39:10 +03:00
Arthur Petukhovsky
566e816298 Refactor safekeeper timelines handling (#2329)
See https://github.com/neondatabase/neon/pull/2329 for details
2022-09-20 07:42:39 +00:00
Heikki Linnakangas
e4f775436f Don't override other options than statement_timeout in test conn string.
In commit 6985f6cd6c, I tried passing extra GUCs in the 'options' part
of the connection string, but it didn't work because the pgbench test
overrode it with the statement_timeout. Change it so that it adds the
statement_timeout to any other options, instead of replacing them.
2022-09-20 09:46:15 +03:00
Alexander Bayandin
bb3c66d86f github/workflows: Make publishing perf reports more configurable (#2440) 2022-09-19 22:28:51 +00:00
Heikki Linnakangas
6985f6cd6c Add a new benchmark data series for prefetching.
Also run benchmarks with the seqscan prefetching (commit f44afbaf62)
enabled.

Renames the 'neon-captest' test to 'neon-captest-reuse', for clarity
2022-09-19 20:56:11 +03:00
Dmitry Rodionov
fcb4a61a12 Adjust spans around gc and compaction
So that compaction and GC loops have their own span and always show the
tenant id in log messages.
2022-09-19 20:08:38 +03:00
Dmitry Rodionov
4b5e7f2f82 Temporarily disable storage deployments
Do not update configs
Do not restart services
Still update binaries
2022-09-19 17:03:20 +03:00
Anastasia Lubennikova
d11cb4b2f1 Bump vendor/postgres-v15 to the latest state of REL_15_STABLE_neon branch 2022-09-19 15:12:05 +03:00
Sergey Melnikov
90ed12630e Add zenith-us-stage-ps-4 and undo changes in prefix_in_bucket in pageserver config (#2473)
* Add zenith-us-stage-ps-4

* Undo changes in prefix_in_bucket in pageserver config (Rollback #2449)
2022-09-19 12:57:44 +02:00
Konstantin Knizhnik
846d126579 Set last written lsn for created relation (#2398)
* Set last written lsn for created relation

* use current LSN for updating last written LSN of relation metadata

* Update LSN for the extended blocks even for pages without LSN (zeroed)

* Update pgxn/neon/pagestore_smgr.c

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2022-09-19 12:56:08 +03:00
Egor Suvorov
c9c3c77c31 Fix Docker image builds (follow-up for #2458) (#2469)
Put ninstall.sh inside Docker images for building
2022-09-16 20:51:35 +03:00
Kirill Bulatov
b46c8b4ae0 Add an alias to build test images simply 2022-09-16 18:58:41 +03:00
Egor Suvorov
65a5010e25 Use custom install command in Makefile to speed up incremental builds (#2458)
Fixes #1873: previously any run of `make` caused the `postgres-v15-headers`
target to build. It copied a bunch of headers via `install -C`. Unfortunately,
some origins were symlinks in the `./pg_install/build` directory pointing
inside `./vendor/postgres-v15` (e.g. `pg_config_os.h` pointing to `linux.h`).

GNU coreutils' `install` ignores the `-C` key for non-regular files and
always overwrites the destination if the origin is a symlink. That in turn
made Cargo rebuild the `postgres_ffi` crate and all its dependencies because
it thought that Postgres headers had changed, even if they did not. That was slow.

Now we use a custom script that wraps the `install` program. It handles one
specific case and makes sure individual headers are never copied if their
content did not change. Hence, `postgres_ffi` is not rebuilt unless there were
some changes to the C code.

One may still have slow incremental single-threaded builds because Postgres
Makefiles spawn about 2800 sub-makes even if no files have been changed.
A no-op build takes "only" 3-4 seconds on my machine now when run with `-j30`,
and 20 seconds when run with `-j1`.
2022-09-16 15:44:02 +00:00
sharnoff
9c35a09452 Improve build errors when postgres_ffi fails (#2460)
This commit does two things of note:

 1. Bumps the bindgen dependency from `0.59.1` to `0.60.1`. This gets us
    an actual error type from bindgen, so we can display what's wrong.
 2. Adds `anyhow` as a build dependency, so our error message can be
    prettier. It's already used heavily elsewhere in the crates in this
    repo, so I figured the fact it's a build dependency doesn't matter
    much.

I ran into this from running `cargo <cmd>` without running `make` first.
Here's a comparison of the compiler output in those two cases.

Before this commit:

```
error: failed to run custom build command for `postgres_ffi v0.1.0 ($repo_path/libs/postgres_ffi)`

Caused by:
  process didn't exit successfully: `$repo_path/target/debug/build/postgres_ffi-2f7253b3ad3ca840/build-script-build` (exit status: 101)
  --- stdout
  cargo:rerun-if-changed=bindgen_deps.h

  --- stderr
  bindgen_deps.h:7:10: fatal error: 'c.h' file not found
  bindgen_deps.h:7:10: fatal error: 'c.h' file not found, err: true
  thread 'main' panicked at 'Unable to generate bindings: ()', libs/postgres_ffi/build.rs:135:14
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

After this commit:

```
error: failed to run custom build command for `postgres_ffi v0.1.0 ($repo_path/libs/postgres_ffi)`

Caused by:
  process didn't exit successfully: `$repo_path/target/debug/build/postgres_ffi-e01fb59602596748/build-script-build` (exit status: 1)
  --- stdout
  cargo:rerun-if-changed=bindgen_deps.h

  --- stderr
  bindgen_deps.h:7:10: fatal error: 'c.h' file not found
  Error: Unable to generate bindings

  Caused by:
      clang diagnosed error: bindgen_deps.h:7:10: fatal error: 'c.h' file not found
```
2022-09-16 08:37:44 -07:00
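The shape of the change is roughly: keep bindgen's real error (new in 0.60) and wrap it with anyhow context instead of unwrapping. The build.rs fragment below is an illustrative sketch under that assumption, not the repository's actual build script.

```
// build.rs sketch. Assumes bindgen >= 0.60 (generate() returns a real error
// type) and anyhow available as a build-dependency.
use anyhow::Context;
use std::{env, path::PathBuf};

fn main() -> anyhow::Result<()> {
    println!("cargo:rerun-if-changed=bindgen_deps.h");

    let bindings = bindgen::Builder::default()
        .header("bindgen_deps.h")
        .generate()
        // With bindgen 0.59 this error was just `()`; now the clang
        // diagnostics survive into the build output shown above.
        .context("Unable to generate bindings")?;

    let out_dir = PathBuf::from(env::var("OUT_DIR").context("OUT_DIR not set")?);
    bindings
        .write_to_file(out_dir.join("bindings.rs"))
        .context("Couldn't write bindings")?;
    Ok(())
}
```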
Dmitry Rodionov
44fd4e3c9f add more logs 2022-09-16 18:14:05 +03:00
Dmitry Rodionov
4db15d3c7c change prefix_in_bucket in pageserver config 2022-09-16 18:14:05 +03:00
Alexander Bayandin
72b33997c7 Nightly Benchmarks: trigger tests earlier (#2463) 2022-09-16 09:09:54 +00:00
Kirill Bulatov
74312e268f Tidy up storage artifact build flags
* Simplify test build features handling
* Build only necessary binaries during the release build
2022-09-16 11:17:41 +03:00
sharnoff
db5ec0dae7 Cleanup/simplify logical size calculation (#2459)
Should produce identical results; replaces an error case that shouldn't
be possible with `expect`.
2022-09-15 23:50:46 -07:00
Kirill Bulatov
031e57a973 Disable failpoints by default 2022-09-16 09:26:29 +03:00
bojanserafimov
96e867642f Validate tenant create options (#2450)
Co-authored-by: Kirill Bulatov <kirill@neon.tech>
2022-09-15 18:20:23 -04:00
Egor Suvorov
e968b5e502 tests: do not set num_safekeepers = 1, it's the default (#2457)
Also get rid of the `with_safekeepers` parameter in tests.
Its meaning has changed: `False` meant "no safekeepers", which is not
supported anymore, so we assume it's always `True`.

See #1648
2022-09-15 21:43:51 +03:00
Egor Suvorov
9d9d8e9519 docs/sourcetree: update CLion set up instructions (#2454)
After #2325 the old method no longer works, as our Makefile does not print compilation commands when run with --dry-run; see https://github.com/neondatabase/neon/issues/2378#issuecomment-1241421325

This method is much slower but is hopefully robust.

Add some more notes while we're here.
2022-09-15 17:16:07 +00:00
Heikki Linnakangas
1062e57fee Don't run codestyle checks separately for Postgres v14 and v15.
Previously, we compiled neon separately for Postgres v14 and v15, for
the codestyle checks. But that was bogus; we actually just ran "make
postgres", which always compiled both versions. The version really only
affected the caching.

Fix that, by copying the build steps from the main build_and_test.yml
workflow.
2022-09-15 16:33:42 +03:00
dependabot[bot]
a8d9732529 Bump axum-core from 0.2.7 to 0.2.8
Bumps [axum-core](https://github.com/tokio-rs/axum) from 0.2.7 to 0.2.8.
- [Release notes](https://github.com/tokio-rs/axum/releases)
- [Changelog](https://github.com/tokio-rs/axum/blob/main/CHANGELOG.md)
- [Commits](https://github.com/tokio-rs/axum/compare/axum-core-v0.2.7...axum-core-v0.2.8)

---
updated-dependencies:
- dependency-name: axum-core
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-09-15 14:07:00 +01:00
Alexey Kondratov
757e2147c1 Follow-up for neondatabase/neon#2448 (#2452)
* remove `legacy` mode from the proxy readme
* explicitly specify `authBackend` in the link auth proxy helm-values
  for all envs
2022-09-15 15:21:22 +03:00
Dmitry Ivanov
87bf7be537 [proxy] Drop support for legacy cloud API (#2448)
Apparently, it no longer exists in the cloud.
2022-09-14 21:27:47 +03:00
Heikki Linnakangas
f86ea09323 Avoid recompiling postgres_ffi every time you run "make".
Running "make" at the top level calls "make install" to install the
PostgreSQL headers into the pg_install/ directory. That always updated
the modification time of the headers even if there were no changes,
triggering recompilation of the postgres_ffi bindings. To avoid that,
use 'install -C' to install the PostgreSQL headers.

However, there was an upstream PostgreSQL issue that the
src/include/Makefile didn't respect the INSTALL configure option. That
was just fixed in upstream PostgreSQL, so cherry-pick that fix to our
vendor/postgres repositories.

Fixes https://github.com/neondatabase/neon/issues/1873.
2022-09-14 15:18:18 +03:00
Alexander Bayandin
d87c9e62d6 Nightly Benchmarks: perform tests on both pre-created and fresh projects (#2443) 2022-09-14 10:53:34 +00:00
Heikki Linnakangas
c3096532f9 Fix vendor/postgres-v15 to point to correct v15 branch.
Commit f44afbaf62 updated vendor/postgres-v15 to point to a commit that
was built on top of PostgreSQL 14 rather than 15. So we accidentally had
two copies of PostgreSQL v14 in the repository. Oops. This updates
it to point to the correct version.
2022-09-14 09:23:51 +03:00
Kirill Bulatov
6db6e7ddda Use backward-compatible safekeeper code 2022-09-14 08:14:05 +03:00
Kirill Bulatov
b8eb908a3d Rename old project name references 2022-09-14 08:14:05 +03:00
Kirill Bulatov
260ec20a02 Reformat pgxn code, add the typedefs.list that was used 2022-09-14 08:14:05 +03:00
Dmitry Rodionov
ba8698bbcb update neon_local output in readme 2022-09-14 01:41:27 +03:00
Egor Suvorov
35761ac6b6 docs/sourcetree: add info about IDE config (#2332) 2022-09-13 21:55:18 +00:00
Kirill Bulatov
32b7259d5e Timeline data management RFC (#2152) 2022-09-13 22:37:20 +03:00
Dmitry Rodionov
1d53173e62 update openapi spec (tenant state has changed) 2022-09-13 21:53:59 +03:00
Alexander Bayandin
d4d57ea2dd github/workflows: fix project creation via API (#2437) 2022-09-13 20:26:26 +02:00
Dmitry Rodionov
db0c49148d clean up metrics in handle_pagerequests 2022-09-13 21:21:45 +03:00
Alexander Bayandin
59d04ab66a test_runner: redact passwords from log messages (#2434) 2022-09-13 17:24:11 +00:00
Kirill Bulatov
1a8c8b04d7 Merge Repository and Tenant entities, rework tenant background jobs 2022-09-13 15:39:39 +03:00
Konstantin Knizhnik
f44afbaf62 Changes of neon extension to support local prefetch (#2369)
* Changes of neon extension to support local prefetch

* Catch exceptions in pageserver_receive

* Bump postgres version

* Bump postgres version

* Bump postgres version

* Bump postgres version
2022-09-13 12:26:20 +03:00
Alexander Bayandin
4f7557fb58 github/workflows: Create projects using API (#2403)
* github/actions: add neon projects related actions
* workflows/benchmarking: create projects using API
* workflows/pg_clients: create projects using API
2022-09-13 09:45:45 +01:00
Kirill Bulatov
2a837d7de7 Create tenants in temporary directory first (#2426) 2022-09-12 21:04:33 +00:00
220 changed files with 12088 additions and 5482 deletions

View File

@@ -11,3 +11,6 @@ opt-level = 3
[profile.dev]
# Turn on a small amount of optimization in Development mode.
opt-level = 1
[alias]
build_testing = ["build", "--features", "testing"]

View File

@@ -18,3 +18,4 @@
!vendor/postgres-v15/
!workspace_hack/
!neon_local/
!scripts/ninstall.sh

View File

@@ -0,0 +1,82 @@
name: 'Create Neon Project'
description: 'Create Neon Project using API'
inputs:
api_key:
description: 'Neon API key'
required: true
environment:
description: 'dev (aka captest) or stage'
required: true
region_id:
description: 'Region ID, if not set the project will be created in the default region'
required: false
outputs:
dsn:
description: 'Created Project DSN (for main database)'
value: ${{ steps.create-neon-project.outputs.dsn }}
project_id:
description: 'Created Project ID'
value: ${{ steps.create-neon-project.outputs.project_id }}
runs:
using: "composite"
steps:
- name: Parse Input
id: parse-input
shell: bash -euxo pipefail {0}
run: |
case "${ENVIRONMENT}" in
dev)
API_HOST=console.dev.neon.tech
REGION_ID=${REGION_ID:-eu-west-1}
;;
staging)
API_HOST=console.stage.neon.tech
REGION_ID=${REGION_ID:-us-east-1}
;;
*)
echo 2>&1 "Unknown environment=${ENVIRONMENT}. Allowed 'dev' or 'staging' only"
exit 1
;;
esac
echo "::set-output name=api_host::${API_HOST}"
echo "::set-output name=region_id::${REGION_ID}"
env:
ENVIRONMENT: ${{ inputs.environment }}
REGION_ID: ${{ inputs.region_id }}
- name: Create Neon Project
id: create-neon-project
# A shell without `set -x` to not to expose password/dsn in logs
shell: bash -euo pipefail {0}
run: |
project=$(curl \
"https://${API_HOST}/api/v1/projects" \
--fail \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${API_KEY}" \
--data "{
\"project\": {
\"name\": \"Created by actions/neon-project-create; GITHUB_RUN_ID=${GITHUB_RUN_ID}\",
\"platform_id\": \"aws\",
\"region_id\": \"${REGION_ID}\",
\"settings\": { }
}
}")
# Mask password
echo "::add-mask::$(echo $project | jq --raw-output '.roles[] | select(.name != "web_access") | .password')"
dsn=$(echo $project | jq --raw-output '.roles[] | select(.name != "web_access") | .dsn')/main
echo "::add-mask::${dsn}"
echo "::set-output name=dsn::${dsn}"
project_id=$(echo $project | jq --raw-output '.id')
echo "::set-output name=project_id::${project_id}"
env:
API_KEY: ${{ inputs.api_key }}
API_HOST: ${{ steps.parse-input.outputs.api_host }}
REGION_ID: ${{ steps.parse-input.outputs.region_id }}

View File

@@ -0,0 +1,54 @@
name: 'Delete Neon Project'
description: 'Delete Neon Project using API'
inputs:
api_key:
description: 'Neon API key'
required: true
environment:
description: 'dev (aka captest) or stage'
required: true
project_id:
description: 'ID of the Project to delete'
required: true
runs:
using: "composite"
steps:
- name: Parse Input
id: parse-input
shell: bash -euxo pipefail {0}
run: |
case "${ENVIRONMENT}" in
dev)
API_HOST=console.dev.neon.tech
;;
staging)
API_HOST=console.stage.neon.tech
;;
*)
echo 2>&1 "Unknown environment=${ENVIRONMENT}. Allowed 'dev' or 'staging' only"
exit 1
;;
esac
echo "::set-output name=api_host::${API_HOST}"
env:
ENVIRONMENT: ${{ inputs.environment }}
- name: Delete Neon Project
shell: bash -euxo pipefail {0}
run: |
# Allow PROJECT_ID to be empty/null for cases when .github/actions/neon-project-create failed
if [ -n "${PROJECT_ID}" ]; then
curl -X "POST" \
"https://${API_HOST}/api/v1/projects/${PROJECT_ID}/delete" \
--fail \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${API_KEY}"
fi
env:
API_KEY: ${{ inputs.api_key }}
PROJECT_ID: ${{ inputs.project_id }}
API_HOST: ${{ steps.parse-input.outputs.api_host }}

View File

@@ -85,7 +85,8 @@ runs:
# PLATFORM will be embedded in the perf test report
# and it is needed to distinguish different environments
export PLATFORM=${PLATFORM:-github-actions-selfhosted}
export POSTGRES_DISTRIB_DIR=${POSTGRES_DISTRIB_DIR:-/tmp/neon/pg_install/v14}
export POSTGRES_DISTRIB_DIR=${POSTGRES_DISTRIB_DIR:-/tmp/neon/pg_install}
export DEFAULT_PG_VERSION=${DEFAULT_PG_VERSION:-14}
if [ "${BUILD_TYPE}" = "remote" ]; then
export REMOTE_ENV=1
@@ -112,10 +113,8 @@ runs:
fi
if [[ "${{ inputs.save_perf_report }}" == "true" ]]; then
if [[ "$GITHUB_REF" == "refs/heads/main" ]]; then
mkdir -p "$PERF_REPORT_DIR"
EXTRA_PARAMS="--out-dir $PERF_REPORT_DIR $EXTRA_PARAMS"
fi
mkdir -p "$PERF_REPORT_DIR"
EXTRA_PARAMS="--out-dir $PERF_REPORT_DIR $EXTRA_PARAMS"
fi
if [[ "${{ inputs.build_type }}" == "debug" ]]; then
@@ -128,7 +127,7 @@ runs:
# Wake up the cluster if we use remote neon instance
if [ "${{ inputs.build_type }}" = "remote" ] && [ -n "${BENCHMARK_CONNSTR}" ]; then
${POSTGRES_DISTRIB_DIR}/bin/psql ${BENCHMARK_CONNSTR} -c "SELECT version();"
${POSTGRES_DISTRIB_DIR}/v14/bin/psql ${BENCHMARK_CONNSTR} -c "SELECT version();"
fi
# Run the tests.
@@ -150,11 +149,9 @@ runs:
-rA $TEST_SELECTION $EXTRA_PARAMS
if [[ "${{ inputs.save_perf_report }}" == "true" ]]; then
if [[ "$GITHUB_REF" == "refs/heads/main" ]]; then
export REPORT_FROM="$PERF_REPORT_DIR"
export REPORT_TO="$PLATFORM"
scripts/generate_and_push_perf_report.sh
fi
export REPORT_FROM="$PERF_REPORT_DIR"
export REPORT_TO="$PLATFORM"
scripts/generate_and_push_perf_report.sh
fi
- name: Create Allure report

View File

@@ -63,18 +63,18 @@
tags:
- pageserver
- name: update remote storage (s3) config
lineinfile:
path: /storage/pageserver/data/pageserver.toml
line: "{{ item }}"
loop:
- "[remote_storage]"
- "bucket_name = '{{ bucket_name }}'"
- "bucket_region = '{{ bucket_region }}'"
- "prefix_in_bucket = '{{ inventory_hostname }}'"
become: true
tags:
- pageserver
# - name: update remote storage (s3) config
# lineinfile:
# path: /storage/pageserver/data/pageserver.toml
# line: "{{ item }}"
# loop:
# - "[remote_storage]"
# - "bucket_name = '{{ bucket_name }}'"
# - "bucket_region = '{{ bucket_region }}'"
# - "prefix_in_bucket = '{{ inventory_hostname }}'"
# become: true
# tags:
# - pageserver
- name: upload systemd service definition
ansible.builtin.template:
@@ -87,15 +87,15 @@
tags:
- pageserver
- name: start systemd service
ansible.builtin.systemd:
daemon_reload: yes
name: pageserver
enabled: yes
state: restarted
become: true
tags:
- pageserver
# - name: start systemd service
# ansible.builtin.systemd:
# daemon_reload: yes
# name: pageserver
# enabled: yes
# state: restarted
# become: true
# tags:
# - pageserver
- name: post version to console
when: console_mgmt_base_url is defined

View File

@@ -2,6 +2,7 @@
#zenith-us-stage-ps-1 console_region_id=27
zenith-us-stage-ps-2 console_region_id=27
zenith-us-stage-ps-3 console_region_id=27
zenith-us-stage-ps-4 console_region_id=27
[safekeepers]
zenith-us-stage-sk-4 console_region_id=27

View File

@@ -1,6 +1,7 @@
fullnameOverride: "neon-stress-proxy"
settings:
authBackend: "link"
authEndpoint: "https://console.dev.neon.tech/authenticate_proxy_request/"
uri: "https://console.dev.neon.tech/psql_session/"

View File

@@ -1,4 +1,5 @@
settings:
authBackend: "link"
authEndpoint: "https://console.neon.tech/authenticate_proxy_request/"
uri: "https://console.neon.tech/psql_session/"

View File

@@ -5,6 +5,7 @@ image:
repository: neondatabase/neon
settings:
authBackend: "link"
authEndpoint: "https://console.stage.neon.tech/authenticate_proxy_request/"
uri: "https://console.stage.neon.tech/psql_session/"

View File

@@ -11,9 +11,20 @@ on:
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
# │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
- cron: '36 4 * * *' # run once a day, timezone is utc
- cron: '0 3 * * *' # run once a day, timezone is utc
workflow_dispatch: # adds ability to run this manually
inputs:
environment:
description: 'Environment to run remote tests on (dev or staging)'
required: false
region_id:
description: 'Use a particular region. If not set the default region will be used'
required: false
save_perf_report:
type: boolean
description: 'Publish perf report or not. If not set, the report is published only for the main branch'
required: false
defaults:
run:
@@ -62,19 +73,12 @@ jobs:
echo Pgbench
$POSTGRES_DISTRIB_DIR/bin/pgbench --version
# FIXME cluster setup is skipped due to various changes in console API
# for now pre created cluster is used. When API gain some stability
# after massive changes dynamic cluster setup will be revived.
# So use pre created cluster. It needs to be started manually, but stop is automatic after 5 minutes of inactivity
- name: Setup cluster
env:
BENCHMARK_CONNSTR: "${{ secrets.BENCHMARK_STAGING_CONNSTR }}"
run: |
set -e
echo "Starting cluster"
# wake up the cluster
$POSTGRES_DISTRIB_DIR/bin/psql $BENCHMARK_CONNSTR -c "SELECT 1"
- name: Create Neon Project
id: create-neon-project
uses: ./.github/actions/neon-project-create
with:
environment: ${{ github.event.inputs.environment || 'staging' }}
api_key: ${{ ( github.event.inputs.environment || 'staging' ) == 'staging' && secrets.NEON_STAGING_API_KEY || secrets.NEON_CAPTEST_API_KEY }}
- name: Run benchmark
# pgbench is installed system wide from official repo
@@ -97,7 +101,7 @@ jobs:
TEST_PG_BENCH_DURATIONS_MATRIX: "300"
TEST_PG_BENCH_SCALES_MATRIX: "10,100"
PLATFORM: "neon-staging"
BENCHMARK_CONNSTR: "${{ secrets.BENCHMARK_STAGING_CONNSTR }}"
BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}
REMOTE_ENV: "1" # indicate to test harness that we do not have zenith binaries locally
run: |
# just to be sure that no data was cached on self hosted runner
@@ -115,6 +119,14 @@ jobs:
run: |
REPORT_FROM=$(realpath perf-report-staging) REPORT_TO=staging scripts/generate_and_push_perf_report.sh
- name: Delete Neon Project
if: ${{ always() }}
uses: ./.github/actions/neon-project-delete
with:
environment: staging
project_id: ${{ steps.create-neon-project.outputs.project_id }}
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
uses: slackapi/slack-github-action@v1
@@ -131,11 +143,15 @@ jobs:
POSTGRES_DISTRIB_DIR: /usr
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}
strategy:
fail-fast: false
matrix:
connstr: [ BENCHMARK_CAPTEST_CONNSTR, BENCHMARK_RDS_CONNSTR ]
# neon-captest-new: Run pgbench in a freshly created project
# neon-captest-reuse: Same, but reusing existing project
# neon-captest-prefetch: Same, with prefetching enabled (new project)
platform: [ neon-captest-new, neon-captest-reuse, neon-captest-prefetch, rds-aurora ]
runs-on: dev
container:
@@ -147,38 +163,63 @@ jobs:
steps:
- uses: actions/checkout@v3
- name: Calculate platform
id: calculate-platform
env:
CONNSTR: ${{ matrix.connstr }}
run: |
if [ "${CONNSTR}" = "BENCHMARK_CAPTEST_CONNSTR" ]; then
PLATFORM=neon-captest
elif [ "${CONNSTR}" = "BENCHMARK_RDS_CONNSTR" ]; then
PLATFORM=rds-aurora
else
echo 2>&1 "Unknown CONNSTR=${CONNSTR}. Allowed are BENCHMARK_CAPTEST_CONNSTR, and BENCHMARK_RDS_CONNSTR only"
exit 1
fi
echo "::set-output name=PLATFORM::${PLATFORM}"
- name: Install Deps
run: |
sudo apt -y update
sudo apt install -y postgresql-14
- name: Create Neon Project
if: matrix.platform != 'neon-captest-reuse'
id: create-neon-project
uses: ./.github/actions/neon-project-create
with:
environment: ${{ github.event.inputs.environment || 'dev' }}
api_key: ${{ ( github.event.inputs.environment || 'dev' ) == 'staging' && secrets.NEON_STAGING_API_KEY || secrets.NEON_CAPTEST_API_KEY }}
- name: Set up Connection String
id: set-up-connstr
run: |
case "${PLATFORM}" in
neon-captest-reuse)
CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CONNSTR }}
;;
neon-captest-new | neon-captest-prefetch)
CONNSTR=${{ steps.create-neon-project.outputs.dsn }}
;;
rds-aurora)
CONNSTR=${{ secrets.BENCHMARK_RDS_CONNSTR }}
;;
*)
echo 2>&1 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-reuse', 'neon-captest-new', 'neon-captest-prefetch' or 'rds-aurora'"
exit 1
;;
esac
echo "::set-output name=connstr::${CONNSTR}"
psql ${CONNSTR} -c "SELECT version();"
env:
PLATFORM: ${{ matrix.platform }}
- name: Set database options
if: matrix.platform == 'neon-captest-prefetch'
run: |
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE main SET enable_seqscan_prefetch=on"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE main SET seqscan_prefetch_buffers=10"
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
- name: Benchmark init
uses: ./.github/actions/run-python-test-set
with:
build_type: ${{ env.BUILD_TYPE }}
test_selection: performance
run_in_parallel: false
save_perf_report: true
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_pgbench_remote_init
env:
PLATFORM: ${{ steps.calculate-platform.outputs.PLATFORM }}
BENCHMARK_CONNSTR: ${{ secrets[matrix.connstr] }}
PLATFORM: ${{ matrix.platform }}
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -188,39 +229,48 @@ jobs:
build_type: ${{ env.BUILD_TYPE }}
test_selection: performance
run_in_parallel: false
save_perf_report: true
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_pgbench_remote_simple_update
env:
PLATFORM: ${{ steps.calculate-platform.outputs.PLATFORM }}
BENCHMARK_CONNSTR: ${{ secrets[matrix.connstr] }}
PLATFORM: ${{ matrix.platform }}
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
- name: Benchmark simple-update
- name: Benchmark select-only
uses: ./.github/actions/run-python-test-set
with:
build_type: ${{ env.BUILD_TYPE }}
test_selection: performance
run_in_parallel: false
save_perf_report: true
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_pgbench_remote_select_only
env:
PLATFORM: ${{ steps.calculate-platform.outputs.PLATFORM }}
BENCHMARK_CONNSTR: ${{ secrets[matrix.connstr] }}
PLATFORM: ${{ matrix.platform }}
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
- name: Create Allure report
if: always()
uses: ./.github/actions/allure-report
with:
action: generate
build_type: ${{ env.BUILD_TYPE }}
- name: Delete Neon Project
if: ${{ matrix.platform != 'neon-captest-reuse' && always() }}
uses: ./.github/actions/neon-project-delete
with:
environment: dev
project_id: ${{ steps.create-neon-project.outputs.project_id }}
api_key: ${{ secrets.NEON_CAPTEST_API_KEY }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
uses: slackapi/slack-github-action@v1
with:
channel-id: "C033QLM5P7D" # dev-staging-stream
slack-message: "Periodic perf testing: ${{ job.status }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
slack-message: "Periodic perf testing ${{ matrix.platform }}: ${{ job.status }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

View File

@@ -94,15 +94,17 @@ jobs:
# CARGO_FEATURES is passed to "cargo metadata". It is separate from CARGO_FLAGS,
# because "cargo metadata" doesn't accept --release or --debug options
#
# We run tests with additional features that are turned off by default (e.g. in release builds), see
# corresponding Cargo.toml files for their descriptions.
- name: Set env variables
run: |
if [[ $BUILD_TYPE == "debug" ]]; then
cov_prefix="scripts/coverage --profraw-prefix=$GITHUB_JOB --dir=/tmp/coverage run"
CARGO_FEATURES=""
CARGO_FLAGS="--locked --timings"
CARGO_FEATURES="--features testing"
CARGO_FLAGS="--locked --timings $CARGO_FEATURES"
elif [[ $BUILD_TYPE == "release" ]]; then
cov_prefix=""
CARGO_FEATURES="--features profiling"
CARGO_FEATURES="--features testing,profiling"
CARGO_FLAGS="--locked --timings --release $CARGO_FEATURES"
fi
echo "cov_prefix=${cov_prefix}" >> $GITHUB_ENV
@@ -158,7 +160,7 @@ jobs:
- name: Run cargo build
run: |
${cov_prefix} mold -run cargo build $CARGO_FLAGS --features failpoints --bins --tests
${cov_prefix} mold -run cargo build $CARGO_FLAGS --bins --tests
shell: bash -euxo pipefail {0}
- name: Run cargo test
@@ -290,7 +292,7 @@ jobs:
build_type: ${{ matrix.build_type }}
test_selection: performance
run_in_parallel: false
save_perf_report: true
save_perf_report: ${{ github.ref == 'refs/heads/main' }}
env:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -322,6 +324,7 @@ jobs:
build_type: ${{ matrix.build_type }}
- name: Store Allure test stat in the DB
if: ${{ steps.create-allure-report.outputs.report-url }}
env:
BUILD_TYPE: ${{ matrix.build_type }}
SHA: ${{ github.event.pull_request.head.sha || github.sha }}

View File

@@ -30,9 +30,7 @@ jobs:
# this is all we need to install our toolchain later via rust-toolchain.toml
# so don't install any toolchain explicitly.
os: [ubuntu-latest, macos-latest]
# To support several Postgres versions, add them here.
postgres_version: [v14, v15]
timeout-minutes: 60
timeout-minutes: 90
name: check codestyle rust and postgres
runs-on: ${{ matrix.os }}
@@ -56,17 +54,29 @@ jobs:
if: matrix.os == 'macos-latest'
run: brew install flex bison openssl
- name: Set pg revision for caching
id: pg_ver
run: echo ::set-output name=pg_rev::$(git rev-parse HEAD:vendor/postgres-${{matrix.postgres_version}})
- name: Set pg 14 revision for caching
id: pg_v14_rev
run: echo ::set-output name=pg_rev::$(git rev-parse HEAD:vendor/postgres-v14)
shell: bash -euxo pipefail {0}
- name: Cache postgres ${{matrix.postgres_version}} build
id: cache_pg
- name: Set pg 15 revision for caching
id: pg_v15_rev
run: echo ::set-output name=pg_rev::$(git rev-parse HEAD:vendor/postgres-v15)
shell: bash -euxo pipefail {0}
- name: Cache postgres v14 build
id: cache_pg_14
uses: actions/cache@v3
with:
path: |
pg_install/${{matrix.postgres_version}}
key: ${{ runner.os }}-pg-${{ steps.pg_ver.outputs.pg_rev }}
path: pg_install/v14
key: v1-${{ runner.os }}-${{ matrix.build_type }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache postgres v15 build
id: cache_pg_15
uses: actions/cache@v3
with:
path: pg_install/v15
key: v1-${{ runner.os }}-${{ matrix.build_type }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Set extra env for macOS
if: matrix.os == 'macos-latest'
@@ -74,24 +84,19 @@ jobs:
echo 'LDFLAGS=-L/usr/local/opt/openssl@3/lib' >> $GITHUB_ENV
echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV
- name: Build postgres
if: steps.cache_pg.outputs.cache-hit != 'true'
run: make postgres
- name: Build postgres v14
if: steps.cache_pg_14.outputs.cache-hit != 'true'
run: make postgres-v14
shell: bash -euxo pipefail {0}
- name: Build postgres v15
if: steps.cache_pg_15.outputs.cache-hit != 'true'
run: make postgres-v15
shell: bash -euxo pipefail {0}
- name: Build neon extensions
run: make neon-pg-ext
# Plain configure output can contain weird errors like 'error: C compiler cannot create executables'
# and the real cause will be inside config.log
- name: Print configure logs in case of failure
if: failure()
continue-on-error: true
run: |
echo '' && echo '=== Postgres ${{matrix.postgres_version}} config.log ===' && echo ''
cat pg_install/build/${{matrix.postgres_version}}/config.log
echo '' && echo '=== Postgres ${{matrix.postgres_version}} configure.log ===' && echo ''
cat pg_install/build/${{matrix.postgres_version}}/configure.log
- name: Cache cargo deps
id: cache_cargo
uses: actions/cache@v3
@@ -109,6 +114,26 @@ jobs:
- name: Ensure all project builds
run: cargo build --locked --all --all-targets
check-rust-dependencies:
runs-on: dev
container:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
options: --init
steps:
- name: Checkout
uses: actions/checkout@v3
with:
submodules: false
fetch-depth: 1
# https://github.com/facebookincubator/cargo-guppy/tree/bec4e0eb29dcd1faac70b1b5360267fc02bf830e/tools/cargo-hakari#2-keep-the-workspace-hack-up-to-date-in-ci
- name: Check every project module is covered by Hakari
run: |
cargo hakari generate --diff # workspace-hack Cargo.toml is up-to-date
cargo hakari manage-deps --dry-run # all workspace crates depend on workspace-hack
shell: bash -euxo pipefail {0}
check-codestyle-python:
runs-on: [ self-hosted, Linux, k8s-runner ]
steps:

View File

@@ -47,17 +47,23 @@ jobs:
shell: bash -euxo pipefail {0}
run: ./scripts/pysync
- name: Create Neon Project
id: create-neon-project
uses: ./.github/actions/neon-project-create
with:
environment: staging
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
- name: Run pytest
env:
REMOTE_ENV: 1
BENCHMARK_CONNSTR: "${{ secrets.BENCHMARK_STAGING_CONNSTR }}"
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install/v14
BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
shell: bash -euxo pipefail {0}
run: |
# Test framework expects we have psql binary;
# but since we don't really need it in this test, let's mock it
mkdir -p "$POSTGRES_DISTRIB_DIR/bin" && touch "$POSTGRES_DISTRIB_DIR/bin/psql";
mkdir -p "$POSTGRES_DISTRIB_DIR/v14/bin" && touch "$POSTGRES_DISTRIB_DIR/v14/bin/psql";
./scripts/pytest \
--junitxml=$TEST_OUTPUT/junit.xml \
--tb=short \
@@ -65,6 +71,14 @@ jobs:
-m "remote_cluster" \
-rA "test_runner/pg_clients"
- name: Delete Neon Project
if: ${{ always() }}
uses: ./.github/actions/neon-project-delete
with:
environment: staging
project_id: ${{ steps.create-neon-project.outputs.project_id }}
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
# We use GitHub's action upload-artifact because `ubuntu-latest` doesn't have configured AWS CLI.
# It will be fixed after switching to gen2 runner
- name: Upload python test logs

Cargo.lock (generated): 279 changed lines
View File

@@ -37,6 +37,12 @@ dependencies = [
"memchr",
]
[[package]]
name = "amplify_num"
version = "0.4.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f27d3d00d3d115395a7a8a4dc045feb7aa82b641e485f7e15f4e67ac16f4f56d"
[[package]]
name = "ansi_term"
version = "0.12.1"
@@ -135,6 +141,15 @@ dependencies = [
"syn",
]
[[package]]
name = "atomic-polyfill"
version = "0.1.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9c041a8d9751a520ee19656232a18971f18946a7900f1520ee4400002244dd89"
dependencies = [
"critical-section",
]
[[package]]
name = "atty"
version = "0.2.14"
@@ -183,9 +198,9 @@ dependencies = [
[[package]]
name = "axum-core"
version = "0.2.7"
version = "0.2.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e4f44a0e6200e9d11a1cdc989e4b358f6e3d354fbf48478f345a17f4e43f8635"
checksum = "d9f0c0a60006f2a293d82d571f635042a72edf927539b7685bd62d361963839b"
dependencies = [
"async-trait",
"bytes",
@@ -193,6 +208,8 @@ dependencies = [
"http",
"http-body",
"mime",
"tower-layer",
"tower-service",
]
[[package]]
@@ -210,6 +227,21 @@ dependencies = [
"rustc-demangle",
]
[[package]]
name = "bare-metal"
version = "0.2.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5deb64efa5bd81e31fcd1938615a6d98c82eafcbcd787162b6f63b91d6bac5b3"
dependencies = [
"rustc_version 0.2.3",
]
[[package]]
name = "bare-metal"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f8fe8f5a8a398345e52358e18ff07cc17a568fbca5c6f73873d3a62056309603"
[[package]]
name = "base64"
version = "0.13.0"
@@ -227,14 +259,14 @@ dependencies = [
[[package]]
name = "bindgen"
version = "0.59.2"
version = "0.60.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2bd2a9a458e8f4304c52c43ebb0cfbd520289f8379a52e329a38afda99bf8eb8"
checksum = "062dddbc1ba4aca46de6338e2bf87771414c335f7b2f2036e8f3e9befebf88e6"
dependencies = [
"bitflags",
"cexpr",
"clang-sys",
"clap 2.34.0",
"clap 3.2.16",
"env_logger",
"lazy_static",
"lazycell",
@@ -248,6 +280,18 @@ dependencies = [
"which",
]
[[package]]
name = "bit_field"
version = "0.10.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dcb6dd1c2376d2e096796e234a70e17e94cc2d5d54ff8ce42b28cef1d0d359a4"
[[package]]
name = "bitfield"
version = "0.13.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "46afbd2983a5d5a7bd740ccb198caf5b82f45c40c09c0eed36052d91cb92e719"
[[package]]
name = "bitflags"
version = "1.3.2"
@@ -375,13 +419,9 @@ version = "2.34.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a0610544180c38b88101fecf2dd634b174a62eef6946f84dfc6a7127512b381c"
dependencies = [
"ansi_term",
"atty",
"bitflags",
"strsim 0.8.0",
"textwrap 0.11.0",
"unicode-width",
"vec_map",
]
[[package]]
@@ -394,7 +434,7 @@ dependencies = [
"bitflags",
"clap_lex",
"indexmap",
"strsim 0.10.0",
"strsim",
"termcolor",
"textwrap 0.15.0",
]
@@ -530,6 +570,18 @@ version = "0.8.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5827cebf4670468b8772dd191856768aedcb1b0278a04f989f7766351917b9dc"
[[package]]
name = "cortex-m"
version = "0.7.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "70858629a458fdfd39f9675c4dc309411f2a3f83bede76988d81bf1a0ecee9e0"
dependencies = [
"bare-metal 0.2.5",
"bitfield",
"embedded-hal",
"volatile-register",
]
[[package]]
name = "cpp_demangle"
version = "0.3.5"
@@ -554,7 +606,7 @@ version = "0.6.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3dfea2db42e9927a3845fb268a10a72faed6d416065f77873f05e411457c363e"
dependencies = [
"rustc_version",
"rustc_version 0.4.0",
]
[[package]]
@@ -602,6 +654,18 @@ dependencies = [
"itertools",
]
[[package]]
name = "critical-section"
version = "0.2.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "95da181745b56d4bd339530ec393508910c909c784e8962d15d722bacf0bcbcd"
dependencies = [
"bare-metal 1.0.0",
"cfg-if",
"cortex-m",
"riscv",
]
[[package]]
name = "crossbeam-channel"
version = "0.5.6"
@@ -744,7 +808,7 @@ dependencies = [
"ident_case",
"proc-macro2",
"quote",
"strsim 0.10.0",
"strsim",
"syn",
]
@@ -846,6 +910,16 @@ version = "1.7.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3f107b87b6afc2a64fd13cac55fe06d6c8859f12d4b14cbcdd2c67d0976781be"
[[package]]
name = "embedded-hal"
version = "0.2.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "35949884794ad573cf46071e41c9b60efb0cb311e3ca01f7af807af1debc66ff"
dependencies = [
"nb 0.1.3",
"void",
]
[[package]]
name = "encoding_rs"
version = "0.8.31"
@@ -1167,6 +1241,15 @@ version = "1.8.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "eabb4a44450da02c90444cf74558da904edde8fb4e9035a9a6a4e15445af0bd7"
[[package]]
name = "hash32"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b0c35f58762feb77d74ebe43bdbc3210f09be9fe6742234d573bacc26ed92b67"
dependencies = [
"byteorder",
]
[[package]]
name = "hashbrown"
version = "0.12.3"
@@ -1176,6 +1259,19 @@ dependencies = [
"ahash",
]
[[package]]
name = "heapless"
version = "0.7.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "db04bc24a18b9ea980628ecf00e6c0264f3c1426dac36c00cb49b6fbad8b0743"
dependencies = [
"atomic-polyfill",
"hash32",
"rustc_version 0.4.0",
"spin 0.9.4",
"stable_deref_trait",
]
[[package]]
name = "heck"
version = "0.3.3"
@@ -1493,6 +1589,12 @@ dependencies = [
"winapi",
]
[[package]]
name = "libm"
version = "0.2.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "292a948cd991e376cf75541fe5b97a1081d713c618b4f1b9500f8844e49eb565"
[[package]]
name = "lock_api"
version = "0.4.7"
@@ -1651,6 +1753,21 @@ dependencies = [
"tempfile",
]
[[package]]
name = "nb"
version = "0.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "801d31da0513b6ec5214e9bf433a77966320625a37860f910be265be6e18d06f"
dependencies = [
"nb 1.0.0",
]
[[package]]
name = "nb"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "546c37ac5d9e56f55e73b677106873d9d9f5190605e41a856503623648488cae"
[[package]]
name = "nix"
version = "0.23.1"
@@ -1718,6 +1835,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "578ede34cf02f8924ab9447f50c28075b4d3e5b269972345e7e0372b38c6cdcd"
dependencies = [
"autocfg",
"libm",
]
[[package]]
@@ -1830,6 +1948,7 @@ checksum = "648001efe5d5c0102d8cea768e348da85d90af8ba91f0bea908f157951493cd4"
name = "pageserver"
version = "0.1.0"
dependencies = [
"amplify_num",
"anyhow",
"async-stream",
"async-trait",
@@ -1854,6 +1973,7 @@ dependencies = [
"itertools",
"metrics",
"nix",
"num-traits",
"once_cell",
"postgres",
"postgres-protocol",
@@ -1863,6 +1983,7 @@ dependencies = [
"rand",
"regex",
"remote_storage",
"rstar",
"scopeguard",
"serde",
"serde_json",
@@ -2048,7 +2169,7 @@ dependencies = [
[[package]]
name = "postgres"
version = "0.19.2"
source = "git+https://github.com/zenithdb/rust-postgres.git?rev=d052ee8b86fff9897c77b0fe89ea9daba0e1fa38#d052ee8b86fff9897c77b0fe89ea9daba0e1fa38"
source = "git+https://github.com/neondatabase/rust-postgres.git?rev=d052ee8b86fff9897c77b0fe89ea9daba0e1fa38#d052ee8b86fff9897c77b0fe89ea9daba0e1fa38"
dependencies = [
"bytes",
"fallible-iterator",
@@ -2061,7 +2182,7 @@ dependencies = [
[[package]]
name = "postgres-protocol"
version = "0.6.4"
source = "git+https://github.com/zenithdb/rust-postgres.git?rev=d052ee8b86fff9897c77b0fe89ea9daba0e1fa38#d052ee8b86fff9897c77b0fe89ea9daba0e1fa38"
source = "git+https://github.com/neondatabase/rust-postgres.git?rev=d052ee8b86fff9897c77b0fe89ea9daba0e1fa38#d052ee8b86fff9897c77b0fe89ea9daba0e1fa38"
dependencies = [
"base64",
"byteorder",
@@ -2079,7 +2200,7 @@ dependencies = [
[[package]]
name = "postgres-types"
version = "0.2.3"
source = "git+https://github.com/zenithdb/rust-postgres.git?rev=d052ee8b86fff9897c77b0fe89ea9daba0e1fa38#d052ee8b86fff9897c77b0fe89ea9daba0e1fa38"
source = "git+https://github.com/neondatabase/rust-postgres.git?rev=d052ee8b86fff9897c77b0fe89ea9daba0e1fa38#d052ee8b86fff9897c77b0fe89ea9daba0e1fa38"
dependencies = [
"bytes",
"fallible-iterator",
@@ -2285,6 +2406,7 @@ dependencies = [
"tokio-rustls",
"url",
"utils",
"uuid",
"workspace_hack",
"x509-parser",
]
@@ -2446,6 +2568,7 @@ dependencies = [
"tokio-util",
"toml_edit",
"tracing",
"utils",
"workspace_hack",
]
@@ -2515,12 +2638,33 @@ dependencies = [
"cc",
"libc",
"once_cell",
"spin",
"spin 0.5.2",
"untrusted",
"web-sys",
"winapi",
]
[[package]]
name = "riscv"
version = "0.7.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6907ccdd7a31012b70faf2af85cd9e5ba97657cc3987c4f13f8e4d2c2a088aba"
dependencies = [
"bare-metal 1.0.0",
"bit_field",
"riscv-target",
]
[[package]]
name = "riscv-target"
version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "88aa938cda42a0cf62a20cfe8d139ff1af20c2e681212b5b34adb5a58333f222"
dependencies = [
"lazy_static",
"regex",
]
[[package]]
name = "routerify"
version = "3.0.0"
@@ -2534,6 +2678,17 @@ dependencies = [
"regex",
]
[[package]]
name = "rstar"
version = "0.9.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b40f1bfe5acdab44bc63e6699c28b74f75ec43afb59f3eda01e145aff86a25fa"
dependencies = [
"heapless",
"num-traits",
"smallvec",
]
[[package]]
name = "rstest"
version = "0.12.0"
@@ -2543,7 +2698,7 @@ dependencies = [
"cfg-if",
"proc-macro2",
"quote",
"rustc_version",
"rustc_version 0.4.0",
"syn",
]
@@ -2565,7 +2720,7 @@ dependencies = [
"log",
"rusoto_credential",
"rusoto_signature",
"rustc_version",
"rustc_version 0.4.0",
"serde",
"serde_json",
"tokio",
@@ -2623,7 +2778,7 @@ dependencies = [
"percent-encoding",
"pin-project-lite",
"rusoto_credential",
"rustc_version",
"rustc_version 0.4.0",
"serde",
"sha2 0.9.9",
"tokio",
@@ -2641,13 +2796,22 @@ version = "1.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "08d43f7aa6b08d49f382cde6a7982047c3426db949b1424bc4b7ec9ae12c6ce2"
[[package]]
name = "rustc_version"
version = "0.2.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "138e3e0acb6c9fb258b19b67cb8abd63c00679d2851805ea151465464fe9030a"
dependencies = [
"semver 0.9.0",
]
[[package]]
name = "rustc_version"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bfa0f585226d2e68097d4f95d113b15b83a82e819ab25717ec0590d9584ef366"
dependencies = [
"semver",
"semver 1.0.13",
]
[[package]]
@@ -2721,6 +2885,7 @@ dependencies = [
"hyper",
"metrics",
"once_cell",
"parking_lot 0.12.1",
"postgres",
"postgres-protocol",
"postgres_ffi",
@@ -2731,6 +2896,7 @@ dependencies = [
"serde_with",
"signal-hook",
"tempfile",
"thiserror",
"tokio",
"tokio-postgres",
"toml_edit",
@@ -2798,12 +2964,27 @@ dependencies = [
"libc",
]
[[package]]
name = "semver"
version = "0.9.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1d7eb9ef2c18661902cc47e535f9bc51b78acd254da71d375c2f6720d9a40403"
dependencies = [
"semver-parser",
]
[[package]]
name = "semver"
version = "1.0.13"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "93f6841e709003d68bb2deee8c343572bf446003ec20a583e76f7b15cebf3711"
[[package]]
name = "semver-parser"
version = "0.7.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "388a1df253eca08550bef6c72392cfe7c30914bf41df5269b68cbd6ff8f570a3"
[[package]]
name = "serde"
version = "1.0.142"
@@ -2997,6 +3178,15 @@ version = "0.5.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6e63cff320ae2c57904679ba7cb63280a3dc4613885beafb148ee7bf9aa9042d"
[[package]]
name = "spin"
version = "0.9.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7f6002a767bff9e83f8eeecf883ecb8011875a21ae8da43bffb817a57e78cc09"
dependencies = [
"lock_api",
]
[[package]]
name = "stable_deref_trait"
version = "1.2.0"
@@ -3019,12 +3209,6 @@ dependencies = [
"unicode-normalization",
]
[[package]]
name = "strsim"
version = "0.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8ea5119cdb4c55b55d432abb513a0429384878c15dde60cc77b1c99de1a95a6a"
[[package]]
name = "strsim"
version = "0.10.0"
@@ -3295,7 +3479,7 @@ dependencies = [
[[package]]
name = "tokio-postgres"
version = "0.7.6"
source = "git+https://github.com/zenithdb/rust-postgres.git?rev=d052ee8b86fff9897c77b0fe89ea9daba0e1fa38#d052ee8b86fff9897c77b0fe89ea9daba0e1fa38"
source = "git+https://github.com/neondatabase/rust-postgres.git?rev=d052ee8b86fff9897c77b0fe89ea9daba0e1fa38#d052ee8b86fff9897c77b0fe89ea9daba0e1fa38"
dependencies = [
"async-trait",
"byteorder",
@@ -3668,6 +3852,10 @@ name = "uuid"
version = "0.8.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bc5cf98d8186244414c848017f0e2676b3fcb46807f6668a97dfe67359a3c4b7"
dependencies = [
"getrandom",
"serde",
]
[[package]]
name = "valuable"
@@ -3675,24 +3863,39 @@ version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "830b7e5d4d90034032940e4ace0d9a9a057e7a45cd94e6c007832e39edb82f6d"
[[package]]
name = "vcell"
version = "0.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "77439c1b53d2303b20d9459b1ade71a83c716e3f9c34f3228c00e6f185d6c002"
[[package]]
name = "vcpkg"
version = "0.2.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "accd4ea62f7bb7a82fe23066fb0957d48ef677f6eeb8215f372f52e48bb32426"
[[package]]
name = "vec_map"
version = "0.8.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f1bddf1187be692e79c5ffeab891132dfb0f236ed36a43c7ed39f1165ee20191"
[[package]]
name = "version_check"
version = "0.9.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "49874b5167b65d7193b8aba1567f5c7d93d001cafc34600cee003eda787e483f"
[[package]]
name = "void"
version = "1.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6a02e4885ed3bc0f2de90ea6dd45ebcbb66dacffe03547fadbb0eeae2770887d"
[[package]]
name = "volatile-register"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9ee8f19f9d74293faf70901bc20ad067dc1ad390d2cbf1e3f75f721ffee908b6"
dependencies = [
"vcell",
]
[[package]]
name = "wal_craft"
version = "0.1.0"
@@ -3705,6 +3908,7 @@ dependencies = [
"postgres",
"postgres_ffi",
"tempfile",
"workspace_hack",
]
[[package]]
@@ -3938,16 +4142,9 @@ dependencies = [
"bstr",
"bytes",
"chrono",
"clap 2.34.0",
"either",
"fail",
"futures-channel",
"futures-task",
"futures-util",
"generic-array",
"hashbrown",
"hex",
"hyper",
"indexmap",
"itoa 0.4.8",
"libc",
@@ -3964,12 +4161,14 @@ dependencies = [
"regex-syntax",
"scopeguard",
"serde",
"stable_deref_trait",
"syn",
"time 0.3.12",
"tokio",
"tokio-util",
"tracing",
"tracing-core",
"uuid",
]
[[package]]

View File

@@ -70,4 +70,4 @@ lto = true
# This is only needed for proxy's tests.
# TODO: we should probably fork `tokio-postgres-rustls` instead.
[patch.crates-io]
tokio-postgres = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }

View File

@@ -14,6 +14,7 @@ COPY --chown=nonroot vendor/postgres-v14 vendor/postgres-v14
COPY --chown=nonroot vendor/postgres-v15 vendor/postgres-v15
COPY --chown=nonroot pgxn pgxn
COPY --chown=nonroot Makefile Makefile
COPY --chown=nonroot scripts/ninstall.sh scripts/ninstall.sh
ENV BUILD_TYPE release
RUN set -e \
@@ -22,7 +23,7 @@ RUN set -e \
&& rm -rf pg_install/v15/build \
&& tar -C pg_install/v14 -czf /home/nonroot/postgres_install.tar.gz .
# Build zenith binaries
# Build neon binaries
FROM $REPOSITORY/$IMAGE:$TAG AS build
WORKDIR /home/nonroot
ARG GIT_VERSION=local
@@ -44,7 +45,7 @@ COPY . .
# Show build caching stats to check if it was used in the end.
# Has to be the part of the same RUN since cachepot daemon is killed in the end of this RUN, losing the compilation stats.
RUN set -e \
&& mold -run cargo build --locked --release \
&& mold -run cargo build --bin pageserver --bin safekeeper --bin proxy --locked --release \
&& cachepot -s
# Build final image
@@ -60,29 +61,29 @@ RUN set -e \
openssl \
ca-certificates \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* \
&& useradd -d /data zenith \
&& chown -R zenith:zenith /data
&& useradd -d /data neon \
&& chown -R neon:neon /data
COPY --from=build --chown=zenith:zenith /home/nonroot/target/release/pageserver /usr/local/bin
COPY --from=build --chown=zenith:zenith /home/nonroot/target/release/safekeeper /usr/local/bin
COPY --from=build --chown=zenith:zenith /home/nonroot/target/release/proxy /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/pageserver /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/safekeeper /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/proxy /usr/local/bin
# v14 is default for now
COPY --from=pg-build /home/nonroot/pg_install/v14 /usr/local/
COPY --from=pg-build /home/nonroot/pg_install/v14 /usr/local/v14/
COPY --from=pg-build /home/nonroot/pg_install/v15 /usr/local/v15/
COPY --from=pg-build /home/nonroot/postgres_install.tar.gz /data/
# By default, pageserver uses `.neon/` working directory in WORKDIR, so create one and fill it with the dummy config.
# Now, when `docker run ... pageserver` is run, it can start without errors, yet will have some default dummy values.
RUN mkdir -p /data/.neon/ && chown -R zenith:zenith /data/.neon/ \
RUN mkdir -p /data/.neon/ && chown -R neon:neon /data/.neon/ \
&& /usr/local/bin/pageserver -D /data/.neon/ --init \
-c "id=1234" \
-c "broker_endpoints=['http://etcd:2379']" \
-c "pg_distrib_dir='/usr/local'" \
-c "pg_distrib_dir='/usr/local/'" \
-c "listen_pg_addr='0.0.0.0:6400'" \
-c "listen_http_addr='0.0.0.0:9898'"
VOLUME ["/data"]
USER zenith
USER neon
EXPOSE 6400
EXPOSE 9898
CMD ["/bin/bash"]

View File

@@ -34,6 +34,12 @@ ifeq ($(UNAME_S),Darwin)
PG_CONFIGURE_OPTS += --with-includes=$(OPENSSL_PREFIX)/include --with-libraries=$(OPENSSL_PREFIX)/lib
endif
# Use -C option so that when PostgreSQL "make install" installs the
# headers, the mtime of the headers are not changed when there have
# been no changes to the files. Changing the mtime triggers an
# unnecessary rebuild of 'postgres_ffi'.
PG_CONFIGURE_OPTS += INSTALL='$(ROOT_PROJECT_DIR)/scripts/ninstall.sh -C'
# Choose whether we should be silent or verbose
CARGO_BUILD_FLAGS += --$(if $(filter s,$(MAKEFLAGS)),quiet,verbose)
# Fix for a corner case when make doesn't pass a jobserver

View File

@@ -125,16 +125,18 @@ Python (3.9 or higher), and install python3 packages using `./scripts/pysync` (r
# Create repository in .neon with proper paths to binaries and data
# Later that would be responsibility of a package install script
> ./target/debug/neon_local init
initializing tenantid 9ef87a5bf0d92544f6fafeeb3239695c
created initial timeline de200bd42b49cc1814412c7e592dd6e9 timeline.lsn 0/16B5A50
initial timeline de200bd42b49cc1814412c7e592dd6e9 created
pageserver init succeeded
Starting pageserver at '127.0.0.1:64000' in '.neon'
Pageserver started
Successfully initialized timeline 7dd0907914ac399ff3be45fb252bfdb7
Stopping pageserver gracefully...done!
# start pageserver and safekeeper
> ./target/debug/neon_local start
Starting etcd broker using /usr/bin/etcd
Starting pageserver at '127.0.0.1:64000' in '.neon'
Pageserver started
initializing for sk 1 for 7676
Starting safekeeper at '127.0.0.1:5454' in '.neon/safekeepers/sk1'
Safekeeper started
@@ -220,7 +222,12 @@ Ensure your dependencies are installed as described [here](https://github.com/ne
```sh
git clone --recursive https://github.com/neondatabase/neon.git
make # builds also postgres and installs it to ./pg_install
# either:
CARGO_BUILD_FLAGS="--features=testing" make
# or:
make debug
./scripts/pytest
```

View File

@@ -10,12 +10,12 @@ clap = "3.0"
env_logger = "0.9"
hyper = { version = "0.14", features = ["full"] }
log = { version = "0.4", features = ["std", "serde"] }
postgres = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
regex = "1"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1"
tar = "0.4"
tokio = { version = "1.17", features = ["macros", "rt", "rt-multi-thread"] }
tokio-postgres = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
url = "2.2.2"
workspace_hack = { version = "0.1", path = "../workspace_hack" }

View File

@@ -8,7 +8,7 @@ clap = "3.0"
comfy-table = "5.0.1"
git-version = "0.3.5"
tar = "0.4.38"
postgres = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
serde = { version = "1.0", features = ["derive"] }
serde_with = "1.12.0"
toml = "0.5"

View File

@@ -1,4 +1,4 @@
# Minimal zenith environment with one safekeeper. This is equivalent to the built-in
# Minimal neon environment with one safekeeper. This is equivalent to the built-in
# defaults that you get with no --config
[pageserver]
listen_pg_addr = '127.0.0.1:64000'

View File

@@ -27,10 +27,10 @@ use std::process::exit;
use std::str::FromStr;
use utils::{
auth::{Claims, Scope},
id::{NodeId, TenantId, TenantTimelineId, TimelineId},
lsn::Lsn,
postgres_backend::AuthType,
project_git_version,
zid::{NodeId, ZTenantId, ZTenantTimelineId, ZTimelineId},
};
// Default id of a safekeeper node, if not specified on the command line.
@@ -39,6 +39,8 @@ const DEFAULT_PAGESERVER_ID: NodeId = NodeId(1);
const DEFAULT_BRANCH_NAME: &str = "main";
project_git_version!(GIT_VERSION);
const DEFAULT_PG_VERSION: &str = "14";
fn default_conf(etcd_binary_path: &Path) -> String {
format!(
r#"
@@ -72,7 +74,7 @@ struct TimelineTreeEl {
/// Name, recovered from neon config mappings
pub name: Option<String>,
/// Holds all direct children of this timeline referenced using `timeline_id`.
pub children: BTreeSet<ZTimelineId>,
pub children: BTreeSet<TimelineId>,
}
// Main entry point for the 'neon_local' CLI utility
@@ -105,6 +107,13 @@ fn main() -> Result<()> {
.takes_value(true)
.required(false);
let pg_version_arg = Arg::new("pg-version")
.long("pg-version")
.help("Postgres version to use for the initial tenant")
.required(false)
.takes_value(true)
.default_value(DEFAULT_PG_VERSION);
let port_arg = Arg::new("port")
.long("port")
.required(false)
@@ -146,6 +155,7 @@ fn main() -> Result<()> {
.required(false)
.value_name("config"),
)
.arg(pg_version_arg.clone())
)
.subcommand(
App::new("timeline")
@@ -164,7 +174,9 @@ fn main() -> Result<()> {
.subcommand(App::new("create")
.about("Create a new blank timeline")
.arg(tenant_id_arg.clone())
.arg(branch_name_arg.clone()))
.arg(branch_name_arg.clone())
.arg(pg_version_arg.clone())
)
.subcommand(App::new("import")
.about("Import timeline from basebackup directory")
.arg(tenant_id_arg.clone())
@@ -178,7 +190,9 @@ fn main() -> Result<()> {
.arg(Arg::new("wal-tarfile").long("wal-tarfile").takes_value(true)
.help("Wal to add after base"))
.arg(Arg::new("end-lsn").long("end-lsn").takes_value(true)
.help("Lsn the basebackup ends at")))
.help("Lsn the basebackup ends at"))
.arg(pg_version_arg.clone())
)
).subcommand(
App::new("tenant")
.setting(AppSettings::ArgRequiredElseHelp)
@@ -188,6 +202,7 @@ fn main() -> Result<()> {
.arg(tenant_id_arg.clone())
.arg(timeline_id_arg.clone().help("Use a specific timeline id when creating a tenant and its initial timeline"))
.arg(Arg::new("config").short('c').takes_value(true).multiple_occurrences(true).required(false))
.arg(pg_version_arg.clone())
)
.subcommand(App::new("config")
.arg(tenant_id_arg.clone())
@@ -239,8 +254,9 @@ fn main() -> Result<()> {
Arg::new("config-only")
.help("Don't do basebackup, create compute node with only config files")
.long("config-only")
.required(false)
))
.required(false))
.arg(pg_version_arg.clone())
)
.subcommand(App::new("start")
.about("Start a postgres compute node.\n This command actually creates new node from scratch, but preserves existing config files")
.arg(pg_node_arg.clone())
@@ -248,7 +264,9 @@ fn main() -> Result<()> {
.arg(branch_name_arg.clone())
.arg(timeline_id_arg.clone())
.arg(lsn_arg.clone())
.arg(port_arg.clone()))
.arg(port_arg.clone())
.arg(pg_version_arg.clone())
)
.subcommand(
App::new("stop")
.arg(pg_node_arg.clone())
@@ -321,7 +339,7 @@ fn main() -> Result<()> {
///
fn print_timelines_tree(
timelines: Vec<TimelineInfo>,
mut timeline_name_mappings: HashMap<ZTenantTimelineId, String>,
mut timeline_name_mappings: HashMap<TenantTimelineId, String>,
) -> Result<()> {
let mut timelines_hash = timelines
.iter()
@@ -332,7 +350,7 @@ fn print_timelines_tree(
info: t.clone(),
children: BTreeSet::new(),
name: timeline_name_mappings
.remove(&ZTenantTimelineId::new(t.tenant_id, t.timeline_id)),
.remove(&TenantTimelineId::new(t.tenant_id, t.timeline_id)),
},
)
})
@@ -374,7 +392,7 @@ fn print_timeline(
nesting_level: usize,
is_last: &[bool],
timeline: &TimelineTreeEl,
timelines: &HashMap<ZTimelineId, TimelineTreeEl>,
timelines: &HashMap<TimelineId, TimelineTreeEl>,
) -> Result<()> {
let local_remote = match (timeline.info.local.as_ref(), timeline.info.remote.as_ref()) {
(None, None) => unreachable!("in this case no info for a timeline is found"),
@@ -452,8 +470,8 @@ fn print_timeline(
/// Connects to the pageserver to query this information.
fn get_timeline_infos(
env: &local_env::LocalEnv,
tenant_id: &ZTenantId,
) -> Result<HashMap<ZTimelineId, TimelineInfo>> {
tenant_id: &TenantId,
) -> Result<HashMap<TimelineId, TimelineInfo>> {
Ok(PageServerNode::from_env(env)
.timeline_list(tenant_id)?
.into_iter()
@@ -462,7 +480,7 @@ fn get_timeline_infos(
}
// Helper function to parse --tenant_id option, or get the default from config file
fn get_tenant_id(sub_match: &ArgMatches, env: &local_env::LocalEnv) -> anyhow::Result<ZTenantId> {
fn get_tenant_id(sub_match: &ArgMatches, env: &local_env::LocalEnv) -> anyhow::Result<TenantId> {
if let Some(tenant_id_from_arguments) = parse_tenant_id(sub_match).transpose() {
tenant_id_from_arguments
} else if let Some(default_id) = env.default_tenant_id {
@@ -472,18 +490,18 @@ fn get_tenant_id(sub_match: &ArgMatches, env: &local_env::LocalEnv) -> anyhow::R
}
}
fn parse_tenant_id(sub_match: &ArgMatches) -> anyhow::Result<Option<ZTenantId>> {
fn parse_tenant_id(sub_match: &ArgMatches) -> anyhow::Result<Option<TenantId>> {
sub_match
.value_of("tenant-id")
.map(ZTenantId::from_str)
.map(TenantId::from_str)
.transpose()
.context("Failed to parse tenant id from the argument string")
}
fn parse_timeline_id(sub_match: &ArgMatches) -> anyhow::Result<Option<ZTimelineId>> {
fn parse_timeline_id(sub_match: &ArgMatches) -> anyhow::Result<Option<TimelineId>> {
sub_match
.value_of("timeline-id")
.map(ZTimelineId::from_str)
.map(TimelineId::from_str)
.transpose()
.context("Failed to parse timeline id from the argument string")
}
@@ -501,12 +519,19 @@ fn handle_init(init_match: &ArgMatches) -> anyhow::Result<LocalEnv> {
default_conf(&EtcdBroker::locate_etcd()?)
};
let pg_version = init_match
.value_of("pg-version")
.unwrap()
.parse::<u32>()
.context("Failed to parse postgres version from the argument string")?;
let mut env =
LocalEnv::parse_config(&toml_file).context("Failed to create neon configuration")?;
env.init().context("Failed to initialize neon repository")?;
// default_tenantid was generated by the `env.init()` call above
let initial_tenant_id = env.default_tenant_id.unwrap();
env.init(pg_version)
.context("Failed to initialize neon repository")?;
let initial_tenant_id = env
.default_tenant_id
.expect("default_tenant_id should be generated by the `env.init()` call above");
// Initialize pageserver, create initial tenant and timeline.
let pageserver = PageServerNode::from_env(&env);
@@ -515,6 +540,7 @@ fn handle_init(init_match: &ArgMatches) -> anyhow::Result<LocalEnv> {
Some(initial_tenant_id),
initial_timeline_id_arg,
&pageserver_config_overrides(init_match),
pg_version,
)
.unwrap_or_else(|e| {
eprintln!("pageserver init failed: {e}");
@@ -543,13 +569,7 @@ fn handle_tenant(tenant_match: &ArgMatches, env: &mut local_env::LocalEnv) -> an
match tenant_match.subcommand() {
Some(("list", _)) => {
for t in pageserver.tenant_list()? {
println!(
"{} {}",
t.id,
t.state
.map(|s| s.to_string())
.unwrap_or_else(|| String::from(""))
);
println!("{} {:?}", t.id, t.state);
}
}
Some(("create", create_match)) => {
@@ -563,8 +583,19 @@ fn handle_tenant(tenant_match: &ArgMatches, env: &mut local_env::LocalEnv) -> an
// Create an initial timeline for the new tenant
let new_timeline_id = parse_timeline_id(create_match)?;
let timeline_info =
pageserver.timeline_create(new_tenant_id, new_timeline_id, None, None)?;
let pg_version = create_match
.value_of("pg-version")
.unwrap()
.parse::<u32>()
.context("Failed to parse postgres version from the argument string")?;
let timeline_info = pageserver.timeline_create(
new_tenant_id,
new_timeline_id,
None,
None,
Some(pg_version),
)?;
let new_timeline_id = timeline_info.timeline_id;
let last_record_lsn = timeline_info
.local
@@ -613,7 +644,15 @@ fn handle_timeline(timeline_match: &ArgMatches, env: &mut local_env::LocalEnv) -
let new_branch_name = create_match
.value_of("branch-name")
.ok_or_else(|| anyhow!("No branch name provided"))?;
let timeline_info = pageserver.timeline_create(tenant_id, None, None, None)?;
let pg_version = create_match
.value_of("pg-version")
.unwrap()
.parse::<u32>()
.context("Failed to parse postgres version from the argument string")?;
let timeline_info =
pageserver.timeline_create(tenant_id, None, None, None, Some(pg_version))?;
let new_timeline_id = timeline_info.timeline_id;
let last_record_lsn = timeline_info
@@ -656,12 +695,19 @@ fn handle_timeline(timeline_match: &ArgMatches, env: &mut local_env::LocalEnv) -
// TODO validate both or none are provided
let pg_wal = end_lsn.zip(wal_tarfile);
let pg_version = import_match
.value_of("pg-version")
.unwrap()
.parse::<u32>()
.context("Failed to parse postgres version from the argument string")?;
let mut cplane = ComputeControlPlane::load(env.clone())?;
println!("Importing timeline into pageserver ...");
pageserver.timeline_import(tenant_id, timeline_id, base, pg_wal)?;
pageserver.timeline_import(tenant_id, timeline_id, base, pg_wal, pg_version)?;
println!("Creating node for imported timeline ...");
env.register_branch_mapping(name.to_string(), tenant_id, timeline_id)?;
cplane.new_node(tenant_id, name, timeline_id, None, None)?;
cplane.new_node(tenant_id, name, timeline_id, None, None, pg_version)?;
println!("Done");
}
Some(("branch", branch_match)) => {
@@ -688,6 +734,7 @@ fn handle_timeline(timeline_match: &ArgMatches, env: &mut local_env::LocalEnv) -
None,
start_lsn,
Some(ancestor_timeline_id),
None,
)?;
let new_timeline_id = timeline_info.timeline_id;
@@ -765,7 +812,7 @@ fn handle_pg(pg_match: &ArgMatches, env: &local_env::LocalEnv) -> Result<()> {
};
let branch_name = timeline_name_mappings
.get(&ZTenantTimelineId::new(tenant_id, node.timeline_id))
.get(&TenantTimelineId::new(tenant_id, node.timeline_id))
.map(|name| name.as_str())
.unwrap_or("?");
@@ -803,7 +850,14 @@ fn handle_pg(pg_match: &ArgMatches, env: &local_env::LocalEnv) -> Result<()> {
Some(p) => Some(p.parse()?),
None => None,
};
cplane.new_node(tenant_id, &node_name, timeline_id, lsn, port)?;
let pg_version = sub_args
.value_of("pg-version")
.unwrap()
.parse::<u32>()
.context("Failed to parse postgres version from the argument string")?;
cplane.new_node(tenant_id, &node_name, timeline_id, lsn, port, pg_version)?;
}
"start" => {
let port: Option<u16> = match sub_args.value_of("port") {
@@ -816,7 +870,7 @@ fn handle_pg(pg_match: &ArgMatches, env: &local_env::LocalEnv) -> Result<()> {
let node = cplane.nodes.get(&(tenant_id, node_name.to_owned()));
let auth_token = if matches!(env.pageserver.auth_type, AuthType::ZenithJWT) {
let auth_token = if matches!(env.pageserver.auth_type, AuthType::NeonJWT) {
let claims = Claims::new(Some(tenant_id), Scope::Tenant);
Some(env.generate_auth_token(&claims)?)
@@ -841,16 +895,23 @@ fn handle_pg(pg_match: &ArgMatches, env: &local_env::LocalEnv) -> Result<()> {
.map(Lsn::from_str)
.transpose()
.context("Failed to parse Lsn from the request")?;
let pg_version = sub_args
.value_of("pg-version")
.unwrap()
.parse::<u32>()
.context("Failed to parse postgres version from the argument string")?;
// when used with custom port this results in non obvious behaviour
// port is remembered from first start command, i e
// start --port X
// stop
// start <-- will also use port X even without explicit port argument
println!(
"Starting new postgres {} on timeline {} ...",
node_name, timeline_id
"Starting new postgres (v{}) {} on timeline {} ...",
pg_version, node_name, timeline_id
);
let node = cplane.new_node(tenant_id, node_name, timeline_id, lsn, port)?;
let node =
cplane.new_node(tenant_id, node_name, timeline_id, lsn, port, pg_version)?;
node.start(&auth_token)?;
}
}

View File

@@ -13,12 +13,12 @@ use std::time::Duration;
use anyhow::{Context, Result};
use utils::{
connstring::connection_host_port,
id::{TenantId, TimelineId},
lsn::Lsn,
postgres_backend::AuthType,
zid::{ZTenantId, ZTimelineId},
};
use crate::local_env::LocalEnv;
use crate::local_env::{LocalEnv, DEFAULT_PG_VERSION};
use crate::postgresql_conf::PostgresConf;
use crate::storage::PageServerNode;
@@ -28,7 +28,7 @@ use crate::storage::PageServerNode;
pub struct ComputeControlPlane {
base_port: u16,
pageserver: Arc<PageServerNode>,
pub nodes: BTreeMap<(ZTenantId, String), Arc<PostgresNode>>,
pub nodes: BTreeMap<(TenantId, String), Arc<PostgresNode>>,
env: LocalEnv,
}
@@ -76,11 +76,12 @@ impl ComputeControlPlane {
pub fn new_node(
&mut self,
tenant_id: ZTenantId,
tenant_id: TenantId,
name: &str,
timeline_id: ZTimelineId,
timeline_id: TimelineId,
lsn: Option<Lsn>,
port: Option<u16>,
pg_version: u32,
) -> Result<Arc<PostgresNode>> {
let port = port.unwrap_or_else(|| self.get_port());
let node = Arc::new(PostgresNode {
@@ -93,6 +94,7 @@ impl ComputeControlPlane {
lsn,
tenant_id,
uses_wal_proposer: false,
pg_version,
});
node.create_pgdata()?;
@@ -114,10 +116,11 @@ pub struct PostgresNode {
pub env: LocalEnv,
pageserver: Arc<PageServerNode>,
is_test: bool,
pub timeline_id: ZTimelineId,
pub timeline_id: TimelineId,
pub lsn: Option<Lsn>, // if it's a read-only node. None for primary
pub tenant_id: ZTenantId,
pub tenant_id: TenantId,
uses_wal_proposer: bool,
pg_version: u32,
}
impl PostgresNode {
@@ -148,10 +151,18 @@ impl PostgresNode {
// Read a few options from the config file
let context = format!("in config file {}", cfg_path_str);
let port: u16 = conf.parse_field("port", &context)?;
let timeline_id: ZTimelineId = conf.parse_field("neon.timeline_id", &context)?;
let tenant_id: ZTenantId = conf.parse_field("neon.tenant_id", &context)?;
let timeline_id: TimelineId = conf.parse_field("neon.timeline_id", &context)?;
let tenant_id: TenantId = conf.parse_field("neon.tenant_id", &context)?;
let uses_wal_proposer = conf.get("neon.safekeepers").is_some();
// Read postgres version from PG_VERSION file to determine which postgres version binary to use.
// If it doesn't exist, assume broken data directory and use default pg version.
let pg_version_path = entry.path().join("PG_VERSION");
let pg_version_str =
fs::read_to_string(pg_version_path).unwrap_or_else(|_| DEFAULT_PG_VERSION.to_string());
let pg_version = u32::from_str(&pg_version_str)?;
// parse recovery_target_lsn, if any
let recovery_target_lsn: Option<Lsn> =
conf.parse_field_optional("recovery_target_lsn", &context)?;
@@ -167,17 +178,24 @@ impl PostgresNode {
lsn: recovery_target_lsn,
tenant_id,
uses_wal_proposer,
pg_version,
})
}
fn sync_safekeepers(&self, auth_token: &Option<String>) -> Result<Lsn> {
let pg_path = self.env.pg_bin_dir().join("postgres");
fn sync_safekeepers(&self, auth_token: &Option<String>, pg_version: u32) -> Result<Lsn> {
let pg_path = self.env.pg_bin_dir(pg_version).join("postgres");
let mut cmd = Command::new(&pg_path);
cmd.arg("--sync-safekeepers")
.env_clear()
.env("LD_LIBRARY_PATH", self.env.pg_lib_dir().to_str().unwrap())
.env("DYLD_LIBRARY_PATH", self.env.pg_lib_dir().to_str().unwrap())
.env(
"LD_LIBRARY_PATH",
self.env.pg_lib_dir(pg_version).to_str().unwrap(),
)
.env(
"DYLD_LIBRARY_PATH",
self.env.pg_lib_dir(pg_version).to_str().unwrap(),
)
.env("PGDATA", self.pgdata().to_str().unwrap())
.stdout(Stdio::piped())
// Comment this to avoid capturing stderr (useful if command hangs)
@@ -259,8 +277,8 @@ impl PostgresNode {
})
}
// Connect to a page server, get base backup, and untar it to initialize a
// new data directory
// Write postgresql.conf with default configuration
// and PG_VERSION file to the data directory of a new node.
fn setup_pg_conf(&self, auth_type: AuthType) -> Result<()> {
let mut conf = PostgresConf::new();
conf.append("max_wal_senders", "10");
@@ -292,7 +310,7 @@ impl PostgresNode {
// variable during compute pg startup. It is done this way because
// otherwise user will be able to retrieve the value using SHOW
// command or pg_settings
let password = if let AuthType::ZenithJWT = auth_type {
let password = if let AuthType::NeonJWT = auth_type {
"$ZENITH_AUTH_TOKEN"
} else {
""
@@ -301,7 +319,7 @@ impl PostgresNode {
// Also note that not all parameters are supported here. Because in compute we substitute $ZENITH_AUTH_TOKEN
// We parse this string and build it back with token from env var, and for simplicity rebuild
// uses only needed variables namely host, port, user, password.
format!("postgresql://no_user:{}@{}:{}", password, host, port)
format!("postgresql://no_user:{password}@{host}:{port}")
};
conf.append("shared_preload_libraries", "neon");
conf.append_line("");
@@ -357,6 +375,9 @@ impl PostgresNode {
let mut file = File::create(self.pgdata().join("postgresql.conf"))?;
file.write_all(conf.to_string().as_bytes())?;
let mut file = File::create(self.pgdata().join("PG_VERSION"))?;
file.write_all(self.pg_version.to_string().as_bytes())?;
Ok(())
}
@@ -368,7 +389,7 @@ impl PostgresNode {
// latest data from the pageserver. That is a bit clumsy but whole bootstrap
// procedure evolves quite actively right now, so let's think about it again
// when things would be more stable (TODO).
let lsn = self.sync_safekeepers(auth_token)?;
let lsn = self.sync_safekeepers(auth_token, self.pg_version)?;
if lsn == Lsn(0) {
None
} else {
@@ -401,7 +422,7 @@ impl PostgresNode {
}
fn pg_ctl(&self, args: &[&str], auth_token: &Option<String>) -> Result<()> {
let pg_ctl_path = self.env.pg_bin_dir().join("pg_ctl");
let pg_ctl_path = self.env.pg_bin_dir(self.pg_version).join("pg_ctl");
let mut cmd = Command::new(pg_ctl_path);
cmd.args(
[
@@ -417,8 +438,14 @@ impl PostgresNode {
.concat(),
)
.env_clear()
.env("LD_LIBRARY_PATH", self.env.pg_lib_dir().to_str().unwrap())
.env("DYLD_LIBRARY_PATH", self.env.pg_lib_dir().to_str().unwrap());
.env(
"LD_LIBRARY_PATH",
self.env.pg_lib_dir(self.pg_version).to_str().unwrap(),
)
.env(
"DYLD_LIBRARY_PATH",
self.env.pg_lib_dir(self.pg_version).to_str().unwrap(),
);
if let Some(token) = auth_token {
cmd.env("ZENITH_AUTH_TOKEN", token);
}

View File

@@ -14,12 +14,14 @@ use std::path::{Path, PathBuf};
use std::process::{Command, Stdio};
use utils::{
auth::{encode_from_key_file, Claims, Scope},
id::{NodeId, TenantId, TenantTimelineId, TimelineId},
postgres_backend::AuthType,
zid::{NodeId, ZTenantId, ZTenantTimelineId, ZTimelineId},
};
use crate::safekeeper::SafekeeperNode;
pub const DEFAULT_PG_VERSION: u32 = 14;
//
// This data structures represents neon_local CLI config
//
@@ -48,13 +50,13 @@ pub struct LocalEnv {
// Path to pageserver binary.
#[serde(default)]
pub zenith_distrib_dir: PathBuf,
pub neon_distrib_dir: PathBuf,
// Default tenant ID to use with the 'zenith' command line utility, when
// --tenantid is not explicitly specified.
// Default tenant ID to use with the 'neon_local' command line utility, when
// --tenant_id is not explicitly specified.
#[serde(default)]
#[serde_as(as = "Option<DisplayFromStr>")]
pub default_tenant_id: Option<ZTenantId>,
pub default_tenant_id: Option<TenantId>,
// used to issue tokens during e.g pg start
#[serde(default)]
@@ -69,11 +71,11 @@ pub struct LocalEnv {
/// Keep human-readable aliases in memory (and persist them to config), to hide ZId hex strings from the user.
#[serde(default)]
// A `HashMap<String, HashMap<ZTenantId, ZTimelineId>>` would be more appropriate here,
// A `HashMap<String, HashMap<TenantId, TimelineId>>` would be more appropriate here,
// but deserialization into a generic toml object as `toml::Value::try_from` fails with an error.
// https://toml.io/en/v1.0.0 does not contain a concept of "a table inside another table".
#[serde_as(as = "HashMap<_, Vec<(DisplayFromStr, DisplayFromStr)>>")]
branch_name_mappings: HashMap<String, Vec<(ZTenantId, ZTimelineId)>>,
branch_name_mappings: HashMap<String, Vec<(TenantId, TimelineId)>>,
}
/// Etcd broker config for cluster internal communication.
@@ -195,29 +197,50 @@ impl Default for SafekeeperConf {
}
impl LocalEnv {
// postgres installation paths
pub fn pg_bin_dir(&self) -> PathBuf {
self.pg_distrib_dir.join("bin")
pub fn pg_distrib_dir_raw(&self) -> PathBuf {
self.pg_distrib_dir.clone()
}
pub fn pg_lib_dir(&self) -> PathBuf {
self.pg_distrib_dir.join("lib")
pub fn pg_distrib_dir(&self, pg_version: u32) -> PathBuf {
let path = self.pg_distrib_dir.clone();
match pg_version {
14 => path.join(format!("v{pg_version}")),
15 => path.join(format!("v{pg_version}")),
_ => panic!("Unsupported postgres version: {}", pg_version),
}
}
pub fn pg_bin_dir(&self, pg_version: u32) -> PathBuf {
match pg_version {
14 => self.pg_distrib_dir(pg_version).join("bin"),
15 => self.pg_distrib_dir(pg_version).join("bin"),
_ => panic!("Unsupported postgres version: {}", pg_version),
}
}
pub fn pg_lib_dir(&self, pg_version: u32) -> PathBuf {
match pg_version {
14 => self.pg_distrib_dir(pg_version).join("lib"),
15 => self.pg_distrib_dir(pg_version).join("lib"),
_ => panic!("Unsupported postgres version: {}", pg_version),
}
}
pub fn pageserver_bin(&self) -> anyhow::Result<PathBuf> {
Ok(self.zenith_distrib_dir.join("pageserver"))
Ok(self.neon_distrib_dir.join("pageserver"))
}
pub fn safekeeper_bin(&self) -> anyhow::Result<PathBuf> {
Ok(self.zenith_distrib_dir.join("safekeeper"))
Ok(self.neon_distrib_dir.join("safekeeper"))
}
pub fn pg_data_dirs_path(&self) -> PathBuf {
self.base_data_dir.join("pgdatadirs").join("tenants")
}
pub fn pg_data_dir(&self, tenantid: &ZTenantId, branch_name: &str) -> PathBuf {
pub fn pg_data_dir(&self, tenant_id: &TenantId, branch_name: &str) -> PathBuf {
self.pg_data_dirs_path()
.join(tenantid.to_string())
.join(tenant_id.to_string())
.join(branch_name)
}
@@ -233,8 +256,8 @@ impl LocalEnv {
pub fn register_branch_mapping(
&mut self,
branch_name: String,
tenant_id: ZTenantId,
timeline_id: ZTimelineId,
tenant_id: TenantId,
timeline_id: TimelineId,
) -> anyhow::Result<()> {
let existing_values = self
.branch_name_mappings
@@ -260,22 +283,22 @@ impl LocalEnv {
pub fn get_branch_timeline_id(
&self,
branch_name: &str,
tenant_id: ZTenantId,
) -> Option<ZTimelineId> {
tenant_id: TenantId,
) -> Option<TimelineId> {
self.branch_name_mappings
.get(branch_name)?
.iter()
.find(|(mapped_tenant_id, _)| mapped_tenant_id == &tenant_id)
.map(|&(_, timeline_id)| timeline_id)
.map(ZTimelineId::from)
.map(TimelineId::from)
}
pub fn timeline_name_mappings(&self) -> HashMap<ZTenantTimelineId, String> {
pub fn timeline_name_mappings(&self) -> HashMap<TenantTimelineId, String> {
self.branch_name_mappings
.iter()
.flat_map(|(name, tenant_timelines)| {
tenant_timelines.iter().map(|&(tenant_id, timeline_id)| {
(ZTenantTimelineId::new(tenant_id, timeline_id), name.clone())
(TenantTimelineId::new(tenant_id, timeline_id), name.clone())
})
})
.collect()
@@ -289,24 +312,26 @@ impl LocalEnv {
let mut env: LocalEnv = toml::from_str(toml)?;
// Find postgres binaries.
// Follow POSTGRES_DISTRIB_DIR if set, otherwise look in "pg_install/v14".
// Follow POSTGRES_DISTRIB_DIR if set, otherwise look in "pg_install".
// Note that later in the code we assume, that distrib dirs follow the same pattern
// for all postgres versions.
if env.pg_distrib_dir == Path::new("") {
if let Some(postgres_bin) = env::var_os("POSTGRES_DISTRIB_DIR") {
env.pg_distrib_dir = postgres_bin.into();
} else {
let cwd = env::current_dir()?;
env.pg_distrib_dir = cwd.join("pg_install/v14")
env.pg_distrib_dir = cwd.join("pg_install")
}
}
// Find zenith binaries.
if env.zenith_distrib_dir == Path::new("") {
env.zenith_distrib_dir = env::current_exe()?.parent().unwrap().to_owned();
// Find neon binaries.
if env.neon_distrib_dir == Path::new("") {
env.neon_distrib_dir = env::current_exe()?.parent().unwrap().to_owned();
}
// If no initial tenant ID was given, generate it.
if env.default_tenant_id.is_none() {
env.default_tenant_id = Some(ZTenantId::generate());
env.default_tenant_id = Some(TenantId::generate());
}
env.base_data_dir = base_path();
@@ -320,12 +345,12 @@ impl LocalEnv {
if !repopath.exists() {
bail!(
"Zenith config is not found in {}. You need to run 'neon_local init' first",
"Neon config is not found in {}. You need to run 'neon_local init' first",
repopath.to_str().unwrap()
);
}
// TODO: check that it looks like a zenith repository
// TODO: check that it looks like a neon repository
// load and parse file
let config = fs::read_to_string(repopath.join("config"))?;
@@ -384,7 +409,7 @@ impl LocalEnv {
//
// Initialize a new Neon repository
//
pub fn init(&mut self) -> anyhow::Result<()> {
pub fn init(&mut self, pg_version: u32) -> anyhow::Result<()> {
// check if config already exists
let base_path = &self.base_data_dir;
ensure!(
@@ -397,17 +422,17 @@ impl LocalEnv {
"directory '{}' already exists. Perhaps already initialized?",
base_path.display()
);
if !self.pg_distrib_dir.join("bin/postgres").exists() {
if !self.pg_bin_dir(pg_version).join("postgres").exists() {
bail!(
"Can't find postgres binary at {}",
self.pg_distrib_dir.display()
self.pg_bin_dir(pg_version).display()
);
}
for binary in ["pageserver", "safekeeper"] {
if !self.zenith_distrib_dir.join(binary).exists() {
if !self.neon_distrib_dir.join(binary).exists() {
bail!(
"Can't find binary '{binary}' in zenith distrib dir '{}'",
self.zenith_distrib_dir.display()
"Can't find binary '{binary}' in neon distrib dir '{}'",
self.neon_distrib_dir.display()
);
}
}

View File

@@ -2,7 +2,7 @@
/// Module for parsing postgresql.conf file.
///
/// NOTE: This doesn't implement the full, correct postgresql.conf syntax. Just
/// enough to extract a few settings we need in Zenith, assuming you don't do
/// enough to extract a few settings we need in Neon, assuming you don't do
/// funny stuff like include-directives or funny escaping.
use anyhow::{bail, Context, Result};
use once_cell::sync::Lazy;

View File

@@ -17,7 +17,7 @@ use thiserror::Error;
use utils::{
connstring::connection_address,
http::error::HttpErrorBody,
zid::{NodeId, ZTenantId, ZTimelineId},
id::{NodeId, TenantId, TimelineId},
};
use crate::local_env::{LocalEnv, SafekeeperConf};
@@ -269,7 +269,7 @@ impl SafekeeperNode {
fn http_request<U: IntoUrl>(&self, method: Method, url: U) -> RequestBuilder {
// TODO: authentication
//if self.env.auth_type == AuthType::ZenithJWT {
//if self.env.auth_type == AuthType::NeonJWT {
// builder = builder.bearer_auth(&self.env.safekeeper_auth_token)
//}
self.http_client.request(method, url)
@@ -284,8 +284,8 @@ impl SafekeeperNode {
pub fn timeline_create(
&self,
tenant_id: ZTenantId,
timeline_id: ZTimelineId,
tenant_id: TenantId,
timeline_id: TimelineId,
peer_ids: Vec<NodeId>,
) -> Result<()> {
Ok(self

View File

@@ -21,9 +21,9 @@ use thiserror::Error;
use utils::{
connstring::connection_address,
http::error::HttpErrorBody,
id::{TenantId, TimelineId},
lsn::Lsn,
postgres_backend::AuthType,
zid::{ZTenantId, ZTimelineId},
};
use crate::local_env::LocalEnv;
@@ -83,7 +83,7 @@ pub struct PageServerNode {
impl PageServerNode {
pub fn from_env(env: &LocalEnv) -> PageServerNode {
let password = if env.pageserver.auth_type == AuthType::ZenithJWT {
let password = if env.pageserver.auth_type == AuthType::NeonJWT {
&env.pageserver.auth_token
} else {
""
@@ -109,14 +109,18 @@ impl PageServerNode {
pub fn initialize(
&self,
create_tenant: Option<ZTenantId>,
initial_timeline_id: Option<ZTimelineId>,
create_tenant: Option<TenantId>,
initial_timeline_id: Option<TimelineId>,
config_overrides: &[&str],
) -> anyhow::Result<ZTimelineId> {
pg_version: u32,
) -> anyhow::Result<TimelineId> {
let id = format!("id={}", self.env.pageserver.id);
// FIXME: the paths should be shell-escaped to handle paths with spaces, quotas etc.
let pg_distrib_dir_param =
format!("pg_distrib_dir='{}'", self.env.pg_distrib_dir.display());
let pg_distrib_dir_param = format!(
"pg_distrib_dir='{}'",
self.env.pg_distrib_dir_raw().display()
);
let authg_type_param = format!("auth_type='{}'", self.env.pageserver.auth_type);
let listen_http_addr_param = format!(
"listen_http_addr='{}'",
@@ -159,7 +163,7 @@ impl PageServerNode {
self.start_node(&init_config_overrides, &self.env.base_data_dir, true)?;
let init_result = self
.try_init_timeline(create_tenant, initial_timeline_id)
.try_init_timeline(create_tenant, initial_timeline_id, pg_version)
.context("Failed to create initial tenant and timeline for pageserver");
match &init_result {
Ok(initial_timeline_id) => {
@@ -173,12 +177,18 @@ impl PageServerNode {
fn try_init_timeline(
&self,
new_tenant_id: Option<ZTenantId>,
new_timeline_id: Option<ZTimelineId>,
) -> anyhow::Result<ZTimelineId> {
new_tenant_id: Option<TenantId>,
new_timeline_id: Option<TimelineId>,
pg_version: u32,
) -> anyhow::Result<TimelineId> {
let initial_tenant_id = self.tenant_create(new_tenant_id, HashMap::new())?;
let initial_timeline_info =
self.timeline_create(initial_tenant_id, new_timeline_id, None, None)?;
let initial_timeline_info = self.timeline_create(
initial_tenant_id,
new_timeline_id,
None,
None,
Some(pg_version),
)?;
Ok(initial_timeline_info.timeline_id)
}
@@ -345,7 +355,7 @@ impl PageServerNode {
fn http_request<U: IntoUrl>(&self, method: Method, url: U) -> RequestBuilder {
let mut builder = self.http_client.request(method, url);
if self.env.pageserver.auth_type == AuthType::ZenithJWT {
if self.env.pageserver.auth_type == AuthType::NeonJWT {
builder = builder.bearer_auth(&self.env.pageserver.auth_token)
}
builder
@@ -368,46 +378,53 @@ impl PageServerNode {
pub fn tenant_create(
&self,
new_tenant_id: Option<ZTenantId>,
new_tenant_id: Option<TenantId>,
settings: HashMap<&str, &str>,
) -> anyhow::Result<ZTenantId> {
) -> anyhow::Result<TenantId> {
let mut settings = settings.clone();
let request = TenantCreateRequest {
new_tenant_id,
checkpoint_distance: settings
.remove("checkpoint_distance")
.map(|x| x.parse::<u64>())
.transpose()?,
checkpoint_timeout: settings.remove("checkpoint_timeout").map(|x| x.to_string()),
compaction_target_size: settings
.remove("compaction_target_size")
.map(|x| x.parse::<u64>())
.transpose()?,
compaction_period: settings.remove("compaction_period").map(|x| x.to_string()),
compaction_threshold: settings
.remove("compaction_threshold")
.map(|x| x.parse::<usize>())
.transpose()?,
gc_horizon: settings
.remove("gc_horizon")
.map(|x| x.parse::<u64>())
.transpose()?,
gc_period: settings.remove("gc_period").map(|x| x.to_string()),
image_creation_threshold: settings
.remove("image_creation_threshold")
.map(|x| x.parse::<usize>())
.transpose()?,
pitr_interval: settings.remove("pitr_interval").map(|x| x.to_string()),
walreceiver_connect_timeout: settings
.remove("walreceiver_connect_timeout")
.map(|x| x.to_string()),
lagging_wal_timeout: settings
.remove("lagging_wal_timeout")
.map(|x| x.to_string()),
max_lsn_wal_lag: settings
.remove("max_lsn_wal_lag")
.map(|x| x.parse::<NonZeroU64>())
.transpose()
.context("Failed to parse 'max_lsn_wal_lag' as non zero integer")?,
};
if !settings.is_empty() {
bail!("Unrecognized tenant settings: {settings:?}")
}
self.http_request(Method::POST, format!("{}/tenant", self.http_base_url))
.json(&TenantCreateRequest {
new_tenant_id,
checkpoint_distance: settings
.get("checkpoint_distance")
.map(|x| x.parse::<u64>())
.transpose()?,
checkpoint_timeout: settings.get("checkpoint_timeout").map(|x| x.to_string()),
compaction_target_size: settings
.get("compaction_target_size")
.map(|x| x.parse::<u64>())
.transpose()?,
compaction_period: settings.get("compaction_period").map(|x| x.to_string()),
compaction_threshold: settings
.get("compaction_threshold")
.map(|x| x.parse::<usize>())
.transpose()?,
gc_horizon: settings
.get("gc_horizon")
.map(|x| x.parse::<u64>())
.transpose()?,
gc_period: settings.get("gc_period").map(|x| x.to_string()),
image_creation_threshold: settings
.get("image_creation_threshold")
.map(|x| x.parse::<usize>())
.transpose()?,
pitr_interval: settings.get("pitr_interval").map(|x| x.to_string()),
walreceiver_connect_timeout: settings
.get("walreceiver_connect_timeout")
.map(|x| x.to_string()),
lagging_wal_timeout: settings.get("lagging_wal_timeout").map(|x| x.to_string()),
max_lsn_wal_lag: settings
.get("max_lsn_wal_lag")
.map(|x| x.parse::<NonZeroU64>())
.transpose()
.context("Failed to parse 'max_lsn_wal_lag' as non zero integer")?,
})
.json(&request)
.send()?
.error_from_body()?
.json::<Option<String>>()
@@ -422,7 +439,7 @@ impl PageServerNode {
})
}
pub fn tenant_config(&self, tenant_id: ZTenantId, settings: HashMap<&str, &str>) -> Result<()> {
pub fn tenant_config(&self, tenant_id: TenantId, settings: HashMap<&str, &str>) -> Result<()> {
self.http_request(Method::PUT, format!("{}/tenant/config", self.http_base_url))
.json(&TenantConfigRequest {
tenant_id,
@@ -471,7 +488,7 @@ impl PageServerNode {
Ok(())
}
pub fn timeline_list(&self, tenant_id: &ZTenantId) -> anyhow::Result<Vec<TimelineInfo>> {
pub fn timeline_list(&self, tenant_id: &TenantId) -> anyhow::Result<Vec<TimelineInfo>> {
let timeline_infos: Vec<TimelineInfo> = self
.http_request(
Method::GET,
@@ -486,10 +503,11 @@ impl PageServerNode {
pub fn timeline_create(
&self,
tenant_id: ZTenantId,
new_timeline_id: Option<ZTimelineId>,
tenant_id: TenantId,
new_timeline_id: Option<TimelineId>,
ancestor_start_lsn: Option<Lsn>,
ancestor_timeline_id: Option<ZTimelineId>,
ancestor_timeline_id: Option<TimelineId>,
pg_version: Option<u32>,
) -> anyhow::Result<TimelineInfo> {
self.http_request(
Method::POST,
@@ -499,6 +517,7 @@ impl PageServerNode {
new_timeline_id,
ancestor_start_lsn,
ancestor_timeline_id,
pg_version,
})
.send()?
.error_from_body()?
@@ -524,10 +543,11 @@ impl PageServerNode {
/// * `pg_wal` - if there's any wal to import: (end lsn, path to `pg_wal.tar`)
pub fn timeline_import(
&self,
tenant_id: ZTenantId,
timeline_id: ZTimelineId,
tenant_id: TenantId,
timeline_id: TimelineId,
base: (Lsn, PathBuf),
pg_wal: Option<(Lsn, PathBuf)>,
pg_version: u32,
) -> anyhow::Result<()> {
let mut client = self.pg_connection_config.connect(NoTls).unwrap();
@@ -546,8 +566,9 @@ impl PageServerNode {
};
// Import base
let import_cmd =
format!("import basebackup {tenant_id} {timeline_id} {start_lsn} {end_lsn}");
let import_cmd = format!(
"import basebackup {tenant_id} {timeline_id} {start_lsn} {end_lsn} {pg_version}"
);
let mut writer = client.copy_in(&import_cmd)?;
io::copy(&mut base_reader, &mut writer)?;
writer.finish()?;

View File

@@ -79,4 +79,5 @@
- [014-storage-lsm](rfcs/014-storage-lsm.md)
- [015-storage-messaging](rfcs/015-storage-messaging.md)
- [016-connection-routing](rfcs/016-connection-routing.md)
- [017-timeline-data-management](rfcs/017-timeline-data-management.md)
- [cluster-size-limits](rfcs/cluster-size-limits.md)

View File

@@ -2,14 +2,14 @@
### Overview
Current state of authentication includes usage of JWT tokens in communication between compute and pageserver and between CLI and pageserver. JWT token is signed using RSA keys. CLI generates a key pair during call to `zenith init`. Using following openssl commands:
Current state of authentication includes usage of JWT tokens in communication between compute and pageserver and between CLI and pageserver. JWT token is signed using RSA keys. CLI generates a key pair during call to `neon_local init`. Using following openssl commands:
```bash
openssl genrsa -out private_key.pem 2048
openssl rsa -in private_key.pem -pubout -outform PEM -out public_key.pem
```
CLI also generates signed token and saves it in the config for later access to pageserver. Now authentication is optional. Pageserver has two variables in config: `auth_validation_public_key_path` and `auth_type`, so when auth type present and set to `ZenithJWT` pageserver will require authentication for connections. Actual JWT is passed in password field of connection string. There is a caveat for psql, it silently truncates passwords to 100 symbols, so to correctly pass JWT via psql you have to either use PGPASSWORD environment variable, or store password in psql config file.
CLI also generates signed token and saves it in the config for later access to pageserver. Now authentication is optional. Pageserver has two variables in config: `auth_validation_public_key_path` and `auth_type`, so when auth type present and set to `NeonJWT` pageserver will require authentication for connections. Actual JWT is passed in password field of connection string. There is a caveat for psql, it silently truncates passwords to 100 symbols, so to correctly pass JWT via psql you have to either use PGPASSWORD environment variable, or store password in psql config file.
Currently there is no authentication between compute and safekeepers, because this communication layer is under heavy refactoring. After this refactoring, support for authentication will be added there too. For now, the safekeeper supports a "hardcoded" token passed via an environment variable, to be able to use the callmemaybe command in the pageserver.
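To make the "JWT in the password field" convention above concrete, here is a minimal sketch using the `postgres` crate; the host, port, user name and environment variable are illustrative placeholders rather than values taken from this patch:
```rust
// Hedged sketch: connect to a pageserver with auth_type = 'NeonJWT' enabled by
// passing the signed JWT as the connection password. All connection details
// below are placeholders, not values from the patch.
use postgres::{Config, NoTls};

fn connect_with_jwt(jwt: &str) -> Result<postgres::Client, postgres::Error> {
    Config::new()
        .host("127.0.0.1")    // pageserver listen_pg_addr host (placeholder)
        .port(6400)           // pageserver listen_pg_addr port (placeholder)
        .user("zenith_admin") // placeholder user name
        .password(jwt)        // the JWT goes into the password field of the connection
        .connect(NoTls)
}

fn main() -> Result<(), postgres::Error> {
    // In shell scripts the same token can be supplied via PGPASSWORD instead,
    // which avoids psql's 100-character password truncation.
    let jwt = std::env::var("ZENITH_AUTH_TOKEN").unwrap_or_default();
    let _client = connect_with_jwt(&jwt)?;
    Ok(())
}
```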

View File

@@ -148,31 +148,6 @@ relcache? (I think we do cache nblocks in relcache already, check why that's not
Neon)
## Misc change in vacuumlazy.c
```
index 8aab6e324e..c684c4fbee 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1487,7 +1487,10 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
else if (all_visible_according_to_vm && !PageIsAllVisible(page)
&& VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer))
{
- elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ /* ZENITH-XXX: all visible hint is not wal-logged
+ * FIXME: Replay visibilitymap changes in pageserver
+ */
+ elog(DEBUG1, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
vacrel->relname, blkno);
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
```
Is this still needed? If that WARNING happens, it looks like potential corruption that we should
fix!
## Use buffer manager when extending VM or FSM
```

View File

@@ -2,26 +2,26 @@
### Overview
Zenith supports multitenancy. One pageserver can serve multiple tenants at once. Tenants can be managed via zenith CLI. During page server setup tenant can be created using ```zenith init --create-tenant``` Also tenants can be added into the system on the fly without pageserver restart. This can be done using the following cli command: ```zenith tenant create``` Tenants use random identifiers which can be represented as a 32 symbols hexadecimal string. So zenith tenant create accepts desired tenant id as an optional argument. The concept of timelines/branches is working independently per tenant.
Neon supports multitenancy. One pageserver can serve multiple tenants at once. Tenants can be managed via neon_local CLI. During page server setup tenant can be created using ```neon_local init --create-tenant``` Also tenants can be added into the system on the fly without pageserver restart. This can be done using the following cli command: ```neon_local tenant create``` Tenants use random identifiers which can be represented as a 32 symbols hexadecimal string. So neon_local tenant create accepts desired tenant id as an optional argument. The concept of timelines/branches is working independently per tenant.
### Tenants in other commands
By default during `zenith init` new tenant is created on the pageserver. Newly created tenant's id is saved to cli config, so other commands can use it automatically if no direct argument `--tenantid=<tenantid>` is provided. So generally tenantid more frequently appears in internal pageserver interface. Its commands take tenantid argument to distinguish to which tenant operation should be applied. CLI support creation of new tenants.
By default during `neon_local init` new tenant is created on the pageserver. Newly created tenant's id is saved to cli config, so other commands can use it automatically if no direct argument `--tenant_id=<tenant_id>` is provided. So generally tenant_id more frequently appears in internal pageserver interface. Its commands take tenant_id argument to distinguish to which tenant operation should be applied. CLI support creation of new tenants.
Examples for cli:
```sh
zenith tenant list
neon_local tenant list
zenith tenant create // generates new id
neon_local tenant create // generates new id
zenith tenant create ee6016ec31116c1b7c33dfdfca38892f
neon_local tenant create ee6016ec31116c1b7c33dfdfca38892f
zenith pg create main // default tenant from zenith init
neon_local pg create main // default tenant from neon init
zenith pg create main --tenantid=ee6016ec31116c1b7c33dfdfca38892f
neon_local pg create main --tenant_id=ee6016ec31116c1b7c33dfdfca38892f
zenith branch --tenantid=ee6016ec31116c1b7c33dfdfca38892f
neon_local branch --tenant_id=ee6016ec31116c1b7c33dfdfca38892f
```
### Data layout
@@ -56,4 +56,4 @@ Tenant id is passed to postgres via GUC the same way as the timeline. Tenant id
### Safety
For now particular tenant can only appear on a particular pageserver. Set of safekeepers are also pinned to particular (tenantid, timeline) pair so there can only be one writer for particular (tenantid, timeline).
For now particular tenant can only appear on a particular pageserver. Set of safekeepers are also pinned to particular (tenant_id, timeline_id) pair so there can only be one writer for particular (tenant_id, timeline_id).

View File

@@ -109,7 +109,7 @@ Repository
The repository stores all the page versions, or WAL records needed to
reconstruct them. Each tenant has a separate Repository, which is
stored in the .neon/tenants/<tenantid> directory.
stored in the .neon/tenants/<tenant_id> directory.
Repository is an abstract trait, defined in `repository.rs`. It is
implemented by the LayeredRepository object in

View File

@@ -123,7 +123,7 @@ The files are called "layer files". Each layer file covers a range of keys, and
a range of LSNs (or a single LSN, in case of image layers). You can think of it
as a rectangle in the two-dimensional key-LSN space. The layer files for each
timeline are stored in the timeline's subdirectory under
`.neon/tenants/<tenantid>/timelines`.
`.neon/tenants/<tenant_id>/timelines`.
There are two kind of layer files: images, and delta layers. An image file
contains a snapshot of all keys at a particular LSN, whereas a delta file
@@ -351,7 +351,7 @@ branch.
Note: It doesn't make any difference if the child branch is created
when the end of the main branch was at LSN 250, or later when the tip of
the main branch had already moved on. The latter case, creating a
branch at a historic LSN, is how we support PITR in Zenith.
branch at a historic LSN, is how we support PITR in Neon.
# Garbage collection
@@ -396,9 +396,9 @@ table:
main/orders_200_300 DELETE
main/orders_300 STILL NEEDED BY orders_300_400
main/orders_300_400 KEEP, NEWER THAN GC HORIZON
main/orders_400 ..
main/orders_400_500 ..
main/orders_500 ..
main/orders_400 ..
main/orders_400_500 ..
main/orders_500 ..
main/customers_100 DELETE
main/customers_100_200 DELETE
main/customers_200 KEEP, NO NEWER VERSION

View File

@@ -9,7 +9,7 @@ This feature allows to migrate a timeline from one pageserver to another by util
Pageserver implements two new http handlers: timeline attach and timeline detach.
Timeline migration is performed in the following way:
1. Timeline attach is called on a target pageserver. This asks pageserver to download latest checkpoint uploaded to s3.
2. For now it is necessary to manually initialize replication stream via callmemaybe call so target pageserver initializes replication from safekeeper (it is desired to avoid this and initialize replication directly in attach handler, but this requires some refactoring (probably [#997](https://github.com/zenithdb/zenith/issues/997)/[#1049](https://github.com/zenithdb/zenith/issues/1049))
2. For now it is necessary to manually initialize replication stream via callmemaybe call so target pageserver initializes replication from safekeeper (it is desired to avoid this and initialize replication directly in attach handler, but this requires some refactoring (probably [#997](https://github.com/neondatabase/neon/issues/997)/[#1049](https://github.com/neondatabase/neon/issues/1049))
3. Replication state can be tracked via timeline detail pageserver call.
4. The compute node should be restarted with the new pageserver connection string. The issue of multiple compute nodes for one timeline is handled at the safekeeper consensus level, so it is not a problem here. Currently, responsibility for rescheduling the compute with the updated config lies with an external coordinator (the console).
5. Timeline is detached from old pageserver. On disk data is removed.
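As a rough, hedged illustration of the flow above, the sketch below drives attach and detach over HTTP with `reqwest` (blocking API); the endpoint paths, ports and identifiers are hypothetical placeholders and do not come from the pageserver's actual route table:
```rust
// Hedged sketch only: URLs, ports and ids are hypothetical placeholders.
// Requires the `reqwest` crate with its "blocking" feature enabled.
use reqwest::blocking::Client;

fn migrate_timeline(
    client: &Client,
    old_pageserver: &str, // e.g. "http://old-ps:9898" (placeholder)
    new_pageserver: &str, // e.g. "http://new-ps:9898" (placeholder)
    tenant_id: &str,
    timeline_id: &str,
) -> Result<(), reqwest::Error> {
    // 1. Ask the target pageserver to attach the timeline (downloads the latest S3 checkpoint).
    client
        .post(format!(
            "{new_pageserver}/v1/tenant/{tenant_id}/timeline/{timeline_id}/attach" // hypothetical path
        ))
        .send()?
        .error_for_status()?;

    // 2. Replication from the safekeeper is currently initiated manually via callmemaybe; omitted here.

    // 3. Track replication progress via the timeline detail call (hypothetical path).
    client
        .get(format!(
            "{new_pageserver}/v1/tenant/{tenant_id}/timeline/{timeline_id}"
        ))
        .send()?
        .error_for_status()?;

    // 4. The compute node is restarted with the new pageserver connection string
    //    by the external coordinator; not shown here.

    // 5. Detach the timeline from the old pageserver, removing its on-disk data.
    client
        .post(format!(
            "{old_pageserver}/v1/tenant/{tenant_id}/timeline/{timeline_id}/detach" // hypothetical path
        ))
        .send()?
        .error_for_status()?;
    Ok(())
}
```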
@@ -18,5 +18,5 @@ Timeline migration is performed in a following way:
### Implementation details
Now safekeeper needs to track which pageserver it is replicating to. This introduces complications into replication code:
* We need to distinguish different pageservers (now this is done by connection string which is imperfect and is covered here: https://github.com/zenithdb/zenith/issues/1105). Callmemaybe subscription management also needs to track that (this is already implemented).
* We need to distinguish different pageservers (now this is done by connection string which is imperfect and is covered here: https://github.com/neondatabase/neon/issues/1105). Callmemaybe subscription management also needs to track that (this is already implemented).
* We need to track which pageserver is the primary. This is needed to avoid reconnections to non-primary pageservers: we shouldn't reconnect to them when they decide to stop their walreceiver. For example, this can happen when there is load on the compute and we are trying to detach the timeline from the old pageserver; in this case callmemaybe will try to reconnect to it because the replication termination condition is not met (a pageserver with an active compute can never catch up to the latest LSN, so there is always some WAL tail)

View File

@@ -70,7 +70,7 @@ two options.
...start sending WAL conservatively since the horizon (1.1), and truncate
obsolete part of WAL only when recovery is finished, i.e. epochStartLsn (4) is
reached, i.e. 2.3 transferred -- that's what https://github.com/zenithdb/zenith/pull/505 proposes.
reached, i.e. 2.3 transferred -- that's what https://github.com/neondatabase/neon/pull/505 proposes.
Then the following is possible:

View File

@@ -0,0 +1,413 @@
# Name
Tenant and timeline data management in pageserver
## Summary
This RFC attempts to describe timeline-related data management as it is done now in the pageserver, highlight the complexities it currently causes, and propose a set of changes to mitigate them.
The main goal is to prepare for future [on-demand layer downloads](https://github.com/neondatabase/neon/issues/2029), yet timeline data is one of the core primitives of the pageserver, so a number of other RFCs are affected as well.
Due to that, this document does not describe a single implementation; rather, it requires a set of code changes to reach the final state.
The RFC considers the repository at the `main` branch, commit [`28243d68e60ffc7e69f158522f589f7d2e09186d`](https://github.com/neondatabase/neon/tree/28243d68e60ffc7e69f158522f589f7d2e09186d) at the time of writing.
## Motivation
In recent discussions, it became clear that timeline-related code is becoming harder to change: it consists of multiple disjoint modules, each requiring synchronization to access.
The lower the code sits in the stack, the more complex the synchronization gets, since many concurrent processes are involved and require orchestration to keep the data consistent.
As the number of modules and isolated data grows per timeline, more questions and corner cases arise:
- https://github.com/neondatabase/neon/issues/1559
right now it is not settled what to do when the synchronization task fails too many times: every module's data has to be treated differently.
- https://github.com/neondatabase/neon/issues/1751
GC and compaction file activities are not well known outside their tasks' code, which causes race bugs
- https://github.com/neondatabase/neon/issues/2003
Even tenant management gets affected: we have to alter its state based on timeline state, yet the data for making that decision is scattered and the synchronization logic has bugs
- more issues were brought up in discussions, but they were too specific to the code to capture as separate issues.
For instance, `tenant_mgr` itself is a static object that we cannot mock at all, which limits our ability to test the data synchronization logic.
In fact, we have zero Rust tests that cover the case of synchronizing more than one module's data.
On-demand layer downloads would require us to dynamically manage the layer files, which we are hardly doing at all at the module level; as a result, most module APIs deal with timelines rather than with layer files.
Synchronizing this disjoint data would require a chain of lock acquisitions, possibly some async and some sync, and it would be hard to unit test with the current state of the code.
This neither makes it easy to start the on-demand download epic, nor to add more timeline-related code on top, whatever the task is.
We have to develop a vision on a number of topics before progressing safely:
- timeline and tenant data structures and how we should access them
- the sync and async worlds and how they should evolve
- unit tests for the complex logic
This RFC aims to provide a general overview of the existing situation and propose ways to improve it.
The proposed changes are quite big and no single PR is expected to make all the adjustments; they should be done gradually during the later on-demand download work.
## What is a timeline and its data
First, we need to define what data we want to manage per timeline.
Currently, the data every timeline operates on is:
- a set of layer files, on the FS
These files are never updated once created (after pageserver checkpoints and compaction runs), and can be removed from the local FS due to compaction, GC, or timeline deletion.
- a set of layer files, on the remote storage
Identically named files, placed in tenant subdirectories on the remote storage (S3) and copied there by a special background sync thread.
- a `metadata` file, on the FS
Updated after every checkpoint with the new `disk_consistent_lsn` and `latest_gc_cutoff_lsn` values. Used to quickly restore the timeline's basic metadata on pageserver restart.
Also contains data about the ancestor, if the timeline was branched off another timeline.
- an `index_part.json` file, on the remote storage
Contains the `metadata` file contents and a list of the layer files available in the current S3 "directory" for the timeline.
Used to avoid the potentially slow and expensive S3 `list` command; updated by the remote storage sync thread after every operation on the remote layer files.
- LayerMap and PageCache, in memory
Dynamic; used to store and serve page data to users.
- timeline info, in memory
LSNs, walreceiver data, `RemoteTimelineIndex` and other data shared via the HTTP API and with internal processes.
- metrics data, in memory
Data to push or provide to Prometheus, Opentelemetry, etc.
Besides the data, every timeline currently needs an etcd connection to receive WAL events and connect to safekeepers.
Timeline could be an ancestor to another one, forming a dependency tree, which is implicit right now: every time relations are looked up in place, based on the corresponding `TimelineMetadata` struct contents.
Yet, there's knowledge on a tenant as a group of timelines, belonging to a single user which is used in GC and compaction tasks, run on every tenant.
`tenant_mgr` manages tenant creation and its task startup, along with the remote storage sync for timeline layers.
The last file managed per tenant is the tenant config file, created and updated on the local FS to hold tenant-specific configuration between restarts.
It is not yet synchronized with the remote storage in any way, so it only exists on the local FS.
### How the data is stored
We have multiple places where timeline data is stored:
- `tenant_mgr` [holds](https://github.com/neondatabase/neon/blob/28243d68e60ffc7e69f158522f589f7d2e09186d/pageserver/src/tenant_mgr.rs#L43) a static `static ref TENANTS: RwLock<HashMap<ZTenantId, Tenant>>` with the `Tenant` having the `local_timelines: HashMap<ZTimelineId, Arc<DatadirTimelineImpl>>` inside
- the same `Tenant` above actually has two references to timelines: the second one goes via its `repo: Arc<RepositoryImpl>` with `pub type RepositoryImpl = LayeredRepository;` that [holds](https://github.com/neondatabase/neon/blob/28243d68e60ffc7e69f158522f589f7d2e09186d/pageserver/src/layered_repository.rs#L178) `Mutex<HashMap<ZTimelineId, LayeredTimelineEntry>>`
- `RemoteTimelineIndex` [contains](https://github.com/neondatabase/neon/blob/28243d68e60ffc7e69f158522f589f7d2e09186d/pageserver/src/storage_sync/index.rs#L84) the metadata about timelines on the remote storage (S3) for sync reasons and possible HTTP API queries
- `walreceiver` [stores](https://github.com/neondatabase/neon/blob/28243d68e60ffc7e69f158522f589f7d2e09186d/pageserver/src/walreceiver.rs#L60) the metadata for possible HTTP API queries and its [internal state](https://github.com/neondatabase/neon/blob/28243d68e60ffc7e69f158522f589f7d2e09186d/pageserver/src/walreceiver/connection_manager.rs#L245) with a reference to the timeline, its current connections and etcd subscription (if any)
- `PageCache` contains timeline-related data, and is created globally for the whole pageserver
- implicitly, we also have files on the local FS that contain timeline state. We operate on those files in several operations (GC, compaction), yet we don't synchronize access to the files themselves in any way: there are higher-level locks ensuring that only one of a group of operations runs at a time.
In practice though, `LayerMap` and layer files are tightly coupled: the current low-level code requires a timeline to be loaded into memory to work with it, and layer files are removed only after the corresponding entry is removed from the `LayerMap` first.
Based on this, a high-level pageserver module diagram with data and entities could look like this:
![timeline tenant state diagram](./images/017-timeline-data-management/timeline_tenant_state.svg)
A few comments on the diagram:
- the diagram does not show all the data and omits a few newtypes and type aliases (for example, it completely ignores "unloaded" timelines, for the reasons described below)
It aims to show the main data and the means of synchronizing it.
- modules tend to isolate their data inside and provide access to it via API
Due to multitenancy, this results in a common pattern for storing both tenant and timeline data: an `RwLock` or `Mutex` around a `HashMap<Id, Data>`; the GC and compaction tasks also use the same lock pattern to ensure no concurrent runs happen.
- some of the modules are asynchronous while others are not, which complicates data access
Currently, anything that's not related to tasks (walreceiver, storage sync, GC, compaction) is blocking.
Async tasks that access data in the sync world have to call the `std::sync::Mutex::lock` method, which blocks the thread the calling async task runs on, also blocking other async tasks scheduled on that thread. `std::sync::RwLock` methods have the same issue, forcing async tasks either to block or to spawn a separate "blocking" task on another thread.
Sync tasks that access data in the async world cannot use `.await`, hence need some `Runtime` to make those calls for them. [`tokio::sync::Mutex`](https://docs.rs/tokio/1.19.2/tokio/sync/struct.Mutex.html#method.blocking_lock) and [`tokio::sync::RwLock`](https://docs.rs/tokio/1.19.2/tokio/sync/struct.RwLock.html#method.blocking_read) provide an API to simplify such calls. Similarly, both `std::sync` and `tokio::sync` provide channels that can communicate in one direction without blocking or requiring `.await`, so they can be used to connect both worlds without locking (see the sketch after this list).
Some modules are in transition: they are started as async "blocking" tasks and are fully synchronous in all the code below the entry point. The current idea is to move them further into the async world, but that is not done yet.
- locks are used in two different ways:
- `RwLock<HashMap<..>>` ones to hold the shared data and ensure its atomic updates
- `Mutex<()>` for synchronizing the tasks, used to implicitly order the data access
The "shared data" locks of the first kind are mainly accessed briefly to either look up or alter the data, yet there are a few notable exceptions, such as
`latest_gc_cutoff_lsn: RwLock<Lsn>`, which is explicitly held in a few places to prevent the GC thread from progressing. Those are covered later in the data access diagrams.
- some synchronizations are not yet implemented
E.g. the asynchronous storage sync module does not synchronize with the (mostly synchronous) GC and compaction tasks when layer files are uploaded to the remote storage.
That occasionally results in the files being deleted before the storage upload task is run for this layer, but due to the incremental nature of the layer files, we can handle such situations without issues.
- `LayeredRepository` covers lots of responsibilities: GC and compaction task synchronisation, timeline access (`local_timelines` in `Tenant` is not used directly before the timeline from the repository is accessed), layer flushing to the FS, scheduling layer sync to the remote storage, etc.
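To make the shared-map and sync/async bridging patterns above concrete, here is a minimal, self-contained sketch; the type and function names are illustrative only, not the actual pageserver code, and it assumes a `tokio` dependency with the `rt-multi-thread`, `macros` and `sync` features enabled:
```rust
use std::collections::HashMap;
use std::sync::Arc;

use tokio::sync::RwLock;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct TimelineId(u64);

#[derive(Clone, Debug)]
struct TimelineInfo {
    disk_consistent_lsn: u64,
}

// The common "shared data" pattern: a lock around a map, taken only briefly.
type Timelines = Arc<RwLock<HashMap<TimelineId, TimelineInfo>>>;

// Async world: the lock is awaited and the data is cloned out, so the guard drops quickly.
async fn get_info(timelines: &Timelines, id: TimelineId) -> Option<TimelineInfo> {
    timelines.read().await.get(&id).cloned()
}

// Sync world: `blocking_read` takes the same lock without `.await`.
// It must not be called from an async runtime thread, hence `spawn_blocking` below.
fn get_info_blocking(timelines: &Timelines, id: TimelineId) -> Option<TimelineInfo> {
    timelines.blocking_read().get(&id).cloned()
}

#[tokio::main]
async fn main() {
    let timelines: Timelines = Arc::new(RwLock::new(HashMap::new()));
    timelines
        .write()
        .await
        .insert(TimelineId(1), TimelineInfo { disk_consistent_lsn: 42 });

    assert!(get_info(&timelines, TimelineId(1)).await.is_some());

    // A synchronous caller accesses the same data from a dedicated blocking thread.
    let for_sync = Arc::clone(&timelines);
    let from_sync =
        tokio::task::spawn_blocking(move || get_info_blocking(&for_sync, TimelineId(1)))
            .await
            .unwrap();
    println!("{from_sync:?}");
}
```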
### How is this data accessed?
There are multiple ways the data is accessed, from different sources:
1. [HTTP requests](https://github.com/neondatabase/neon/blob/28243d68e60ffc7e69f158522f589f7d2e09186d/pageserver/src/http/routes.rs)
High-level CRUD API for managing tenants, timelines and getting data about them.
Current API list (modified for readability):
```rust
.get("/v1/status", status_handler) // pageserver status
.get("/v1/tenant", tenant_list_handler)
.post("/v1/tenant", tenant_create_handler) // can create "empty" timelines or branch off the existing ones
.get("/v1/tenant/:tenant_id", tenant_status) // the only tenant public metadata
.put("/v1/tenant/config", tenant_config_handler) // tenant config data and local file manager
.get("/v1/tenant/:tenant_id/timeline", timeline_list_handler)
.post("/v1/tenant/:tenant_id/timeline", timeline_create_handler)
.post("/v1/tenant/:tenant_id/attach", tenant_attach_handler) // download entire tenant from the remote storage and load its timelines memory
.post("/v1/tenant/:tenant_id/detach", tenant_detach_handler) // delete all tenant timelines from memory, remote corresponding storage and local FS files
.get("/v1/tenant/:tenant_id/timeline/:timeline_id", timeline_detail_handler)
.delete("/v1/tenant/:tenant_id/timeline/:timeline_id", timeline_delete_handler)
.get("/v1/tenant/:tenant_id/timeline/:timeline_id/wal_receiver", wal_receiver_get_handler) // get walreceiver stats metadata
```
Overall, no HTTP operation goes below the `LayeredRepository` level or interacts with layers: instead, they manage tenant and timeline entities, their configuration and metadata.
`GET` data is small (relative to layer file contents), updated via brief `.write()`/`.lock()` calls and read by copying/cloning the data so the lock is released quickly.
That does not mean the operations themselves are short: e.g. `tenant_attach_handler` downloads multiple files from the remote storage, which might take time, yet the final data is inserted into memory via one brief write under the lock.
Non-`GET` operations mostly follow the same rule, with two differences:
- `tenant_detach_handler` has to wait for its background tasks to stop before shutting down, which requires more work with locks
- `timeline_create_handler` currently requires GC to be paused before branching the timeline, which requires orchestration too.
This is the only HTTP operation able to load a timeline into memory: the rest of the operations either read metadata or, as in `tenant_attach_handler`, schedule a deferred task to download the timeline and load it into memory.
The "Timeline data synchronization" section below describes both complex cases in more detail.
2. [libpq requests](https://github.com/neondatabase/neon/blob/28243d68e60ffc7e69f158522f589f7d2e09186d/pageserver/src/page_service.rs)
The main interface of the pageserver, intended to handle libpq (and similar) requests.
It operates on the `LayeredTimeline` and, below it, the `LayerMap` modules; all timelines accessed during an operation are loaded into memory immediately (if not already loaded), and operations bail on timeline load errors.
- `pagestream`
Page requests: `get_rel_exists`, `get_rel_size`, `get_page_at_lsn`, `get_db_size`
The main API points, intended to be used by `compute` to show the data to the user. All require requests to be made at a certain Lsn; if that Lsn is not yet available in memory, request processing is paused until it arrives, or bails after a timeout.
- `basebackup` and `fullbackup`
Options to generate postgres-compatible backup archives.
- `import basebackup`
- `import wal`
Import the `pg_wal` section of the basebackup archive.
- `get_last_record_rlsn`, `get_lsn_by_timestamp`
"Metadata" retrieval methods, that still requires internal knowledge about layers.
- `set`, `fallpoints`, `show`
Utility methods to support various edge cases or help with debugging/testing.
- `do_gc`, `compact`, `checkpoint`
Manual triggers for the corresponding tenant tasks (GC, compaction) and for flushing in-memory layers to disk (checkpointing), with upload task scheduling as a follow-up.
Apart from being loaded into memory, every timeline layer has to be accessed using a specific set of locking primitives, especially when a write operation happens: otherwise, GC or compaction might corrupt the data. The user API is implicitly affected by this synchronization during branching, when GC has to be orchestrated properly before a new timeline can be branched off an existing one.
See the "Timeline data synchronization" section for the unified synchronization diagram on the topic.
3. internal access
Entities within the pageserver update files on the local FS and remote storage, and metadata in memory; they have to use internal data for those operations.
Places that access the internal, lower-level data also require the corresponding timeline to be successfully loaded into memory and accessed with the corresponding synchronization.
If an ancestor's data is accessed via its child branch, more than one timeline has to be fully loaded into memory and more locking primitives are involved.
Right now, all ancestors are resolved in place: every place that has to check a timeline's ancestor has to lock the timelines map, check whether the ancestor is loaded into memory, load it or bail if it is not present, get the required information, and so on.
- periodic GC and compaction tasks
Alter metadata (GC info), in-memory data (layer relations, page caches, etc.) and layer files on disk.
Like their libpq counterparts, they need full synchronization with the low-level layer management code.
- storage sync task
Alters metadata (`RemoteTimelineIndex`), layer files on remote storage (upload, delete) and local FS (download) and in-memory data (registers downloaded timelines in the repository).
Currently, it does not know anything about layer file contents, focusing instead on the file structure and metadata file updates: since layer files cannot be updated (only created or deleted), storage sync can back the files up to the remote storage without further low-level synchronization; only when a timeline is downloaded does a load operation need to run, possibly pausing GC and compaction tasks.
- walreceiver and walingest task
Per timeline, it subscribes to etcd events from safekeepers and eventually spawns a walreceiver connection task to receive WAL from a safekeeper node.
Fills memory with data, eventually triggering a checkpoint task that creates a new layer file in the local FS and schedules a remote storage sync upload task.
During WAL receiving, also updates a separate in-memory data structure with the walreceiver stats, used later via HTTP API.
Layer updates require the low-level set of sync primitives used to preserve data consistency.
- checkpoint (layer freeze) task
Periodic, short-lived tasks that generate a new layer file on the FS. They require low-level synchronization at the end, when the layer is registered after creation, and have an additional mode to ensure only one concurrent compaction happens at a time.
### Timeline data synchronization
Here's a high-level timeline data access diagram, considering the synchronization locks, based on the state diagram above.
For brevity, the diagrams do not show `RwLock<HashMap<..>>` data accesses, considering them nearly instantaneous.
`RwLock<LayerMap>` is close to being an exception to that rule, since it is taken in multiple places to ensure all layers are inserted correctly.
Yet the only long operation in the current code is the `.write()` lock on the map during its creation, while all other lock usages tend to be short.
Note though that, due to the current "work with loaded timelines only" approach, the prevailing share of locks taken on that struct are `.write()` locks, not `.read()` ones.
To simplify the diagrams, these accesses are considered "fast" data accesses, not synchronization attempts.
`write_lock` synchronization diagram:
![timeline data access synchronization(1)](./images/017-timeline-data-management/timeline_data_access_sync_1.svg)
Comments:
- `write_lock: Mutex<()>` ensures that all timeline data written into **in-memory layers** is written without races, one concurrent write at a time
- `layer_flush_lock: Mutex<()>` and layer flushing seem slightly bloated, with various ways to create a layer on disk and write it in memory
The lock itself seems to duplicate the purpose of `write_lock` when it touches in-memory layers, and also to limit on-disk layer creation.
Yet the latter is not really done consistently, since the remote storage sync manages to download and register new layers without touching these locks
- `freeze_inmem_layer(true)`, which takes both `write_lock` and `layer_flush_lock`, does not seem well aligned with the rest of the lock usages of those primitives; it also restricts layer creation concurrency even further, yet at the same time there are various `freeze_inmem_layer(false)` calls that ignore those restrictions (a minimal sketch of this locking pattern follows below)
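To illustrate the `Mutex<()>` task-ordering pattern discussed above, here is a hypothetical, heavily simplified sketch; it is not the real `LayeredTimeline` code, and the names only mirror the locks named in this RFC:
```rust
use std::sync::Mutex;

// Hypothetical stand-in for the locks discussed above; the real struct holds much more state.
struct TimelineLocks {
    write_lock: Mutex<()>,       // one concurrent write into in-memory layers at a time
    layer_flush_lock: Mutex<()>, // one concurrent layer flush at a time
}

impl TimelineLocks {
    // Mirrors the `freeze_inmem_layer(true/false)` split described above: the `true`
    // variant takes both task-ordering guards, the `false` variant bypasses them.
    fn freeze_inmem_layer(&self, take_locks: bool) {
        let _guards = if take_locks {
            Some((
                self.write_lock.lock().unwrap(),
                self.layer_flush_lock.lock().unwrap(),
            ))
        } else {
            None
        };
        // ... swap the open in-memory layer for a frozen one here ...
    }
}
```
The guards protect no data directly; they only serialize whoever takes them, which is exactly why the `false` variant can bypass the restriction.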
![timeline data access synchronization(2)](./images/017-timeline-data-management/timeline_data_access_sync_2.svg)
Comments:
- the `partitioning: Mutex<(KeyPartitioning, Lsn)>` lock is a data sync lock that is not used to synchronize tasks (all other such locks were considered "almost instant" and omitted from the diagram), yet it is very similar to what `write_lock` and `layer_flush_lock` do: it ensures the timeline's in-memory data is up to date with the layer files' state on disk, which is what `LayerMap` is for.
- there are multiple locks that do similar task management operations:
- `gc_cs: Mutex<()>` and `latest_gc_cutoff_lsn: RwLock<Lsn>` ensure that branching and GC do not run concurrently (see the sketch after this list)
- `layer_removal_cs: Mutex<()>` ensures that GC, compaction and timeline deletion via the HTTP API do not run concurrently
- `file_lock: RwLock<()>` is used as a semaphore, to ensure "all" GC and compaction tasks are shut down and do not start
Yet that lock only covers GC and compaction runs from the internal loops: a libpq-triggered call is neither cancelled nor waited upon.
These locks do not seem to belong to a timeline. Moreover, some of them could be eliminated entirely, since their responsibilities overlap.
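As an illustration of the branching-vs-GC interplay above, here is a sketch of how an `RwLock<Lsn>` can both publish a value and pause GC while it is read-held; the struct and method names are illustrative, not the actual pageserver API:
```rust
use std::sync::RwLock;

type Lsn = u64;

// Illustrative only: a read guard held for the whole branch operation
// keeps GC (which needs the write lock to advance the cutoff) from progressing.
struct GcCutoff {
    latest_gc_cutoff_lsn: RwLock<Lsn>,
}

impl GcCutoff {
    fn branch_at(&self, branch_lsn: Lsn) -> Result<(), String> {
        let cutoff = self.latest_gc_cutoff_lsn.read().unwrap();
        if branch_lsn < *cutoff {
            return Err(format!(
                "cannot branch at {branch_lsn}: below the GC cutoff {}",
                *cutoff
            ));
        }
        // ... create the branch while the read guard is still held ...
        Ok(())
    }

    // GC advances the cutoff; it is blocked while any branch operation holds a read guard.
    fn advance_cutoff(&self, new_cutoff: Lsn) {
        *self.latest_gc_cutoff_lsn.write().unwrap() = new_cutoff;
    }
}
```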
## Proposed implementation
### How to structure timeline data access better
- adjust tenant state handling
The current [`TenantState`](https://github.com/neondatabase/neon/blob/28243d68e60ffc7e69f158522f589f7d2e09186d/pageserver/src/tenant_mgr.rs#L108) [changes](https://github.com/neondatabase/neon/blob/28243d68e60ffc7e69f158522f589f7d2e09186d/pageserver/src/tenant_mgr.rs#L317) mainly indicate whether GC and compaction tasks are running or not; another state, `Broken`, shows up only if some timeline fails to load during startup.
We could start both GC and compaction tasks at the time the tenant is created and adjust the tasks to throttle/sleep while the tenant has no timelines and wake up when the first one is added (see the sketch after this bullet).
The latter becomes more important with on-demand download, since we won't have the entire timeline at hand to verify its correctness. Moreover, with network access involved, a timeline could fail temporarily, and the entire tenant would have to be marked as broken because of that.
Since nothing currently checks `TenantState` via the HTTP API, it makes sense to remove the state entirely and not write the code to synchronize its changes.
Instead, we could indicate internal issues per timeline and have a better API to "stop" timeline processing without deleting its data, making our API less restrictive.
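A rough sketch of what such an always-running, throttling background loop could look like; this is an assumed design with hypothetical names, not existing pageserver code, and it assumes `tokio` with the `sync` and `time` features:
```rust
use std::sync::Arc;
use std::time::Duration;

use tokio::sync::Notify;

// Started once per tenant at tenant creation; it never marks the tenant broken,
// it just waits until there is something to work on.
async fn tenant_gc_loop(
    timeline_added: Arc<Notify>,
    has_timelines: impl Fn() -> bool,
    period: Duration,
) {
    loop {
        if !has_timelines() {
            // Sleep until the first timeline is added instead of bailing out.
            timeline_added.notified().await;
            continue;
        }
        // ... run one GC iteration over the tenant's timelines,
        //     reporting per-timeline errors instead of failing the whole tenant ...
        tokio::time::sleep(period).await;
    }
}
```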
- remove the "unloaded" status for the timeline
The current approach to timeline management [assumes](https://github.com/neondatabase/neon/blob/28243d68e60ffc7e69f158522f589f7d2e09186d/pageserver/src/layered_repository.rs#L486-L493)
```rust
#[derive(Clone)]
enum LayeredTimelineEntry {
Loaded(Arc<LayeredTimeline>),
Unloaded {
id: ZTimelineId,
metadata: TimelineMetadata,
},
}
```
that timelines can be in an `Unloaded` state.
The difference between the two variants is whether the timeline's layer map was loaded from disk and kept in memory (`Loaded`) or not (`Unloaded`).
The idea behind this separation was to lazily load a timeline into memory with all its layers only on first access, and potentially unload it later.
Yet now there are no public API methods that deal with unloaded timelines' layers: all of them either bail when such a timeline is worked on, or load it into memory and continue.
Moreover, every timeline on the local FS is now loaded on pageserver startup, so the only two places where the `Unloaded` variant is used are branching and timeline attach, and both load the timeline into memory before the end of the operation.
Even if that loading bails for some reason, the next periodic GC or compaction run would load the timeline into memory.
There are a few timeline methods that return timeline metadata without loading its layers, but such metadata also comes from the `metadata` FS file, not the layer files (so no page info can be retrieved without loading the entire layer map first).
With on-demand layer download, it is no longer feasible to wait for the entire layer map to be loaded into memory, since it might not even be available on the local FS when requested: `LayerMap` needs to be changed to contain the metadata needed to retrieve missing layers and to handle a timeline state that is only partially present on the local FS.
To accommodate that and move away from the redundant status, a timeline should always be "loaded", with its metadata read from disk and its layer map prepared to download layers on request, per layer.
Layers in the layer map, on the other hand, could be in various states: loaded, unloaded, downloading, download failed, etc., and it is their state that has to be handled instead, if we want to support on-demand download in the future (see the sketch at the end of this bullet).
This way, tenants and timelines could always try to serve requests and do their internal tasks periodically, trying to recover.
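A minimal sketch of such per-layer state tracking; the types are illustrative only, and the real `LayerMap` entries would carry much richer metadata:
```rust
use std::ops::Range;
use std::path::PathBuf;

// Simplified stand-ins for the pageserver's Key and Lsn types.
type Key = u128;
type Lsn = u64;

// The states the proposal suggests tracking per layer, instead of per timeline.
enum LayerState {
    /// The layer file is present on the local FS and usable.
    Resident { local_path: PathBuf },
    /// The layer is known only from remote metadata and can be downloaded on demand.
    Remote { remote_path: String },
    /// A download has been scheduled or is in flight.
    Downloading,
    /// The last download attempt failed; retry later with backoff.
    DownloadFailed { attempts: u32 },
}

// One layer map entry that can describe layers not present locally.
struct LayerEntry {
    key_range: Range<Key>,
    lsn_range: Range<Lsn>,
    state: LayerState,
}
```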
- scale down the remote storage sync to operate per layer file, not per timeline as it does now
For the reasons in the previous bullet, the current remote storage model needs its timeline download approach changed.
Right now, a timeline is marked as "ready" only after all of its layers on the remote storage have been downloaded to the local storage.
With the on-demand download approach, only the remote storage timeline metadata should be downloaded from S3, leaving the rest of the layers to be downloaded if/when they are requested.
Note: while the remote storage sync should operate per layer, it should stay global for all tenants, to better manage S3 limits and sync queue priorities.
Yet the only place using the remote storage should be the layer map.
- encapsulate the `tenant_mgr` logic into a regular Rust struct, unite it with part of the `Repository` and anything else needed to manage timeline data in a single place and to test it independently
The [`Repository`](https://github.com/neondatabase/neon/blob/28243d68e60ffc7e69f158522f589f7d2e09186d/pageserver/src/repository.rs#L187) trait is getting closer to `tenant_mgr` in terms of functionality: there are two background-task-related functions run on all timelines of a tenant, `gc_iteration` (it does allow running on a single timeline, but the GC task runs it on all timelines) and `compaction_iteration`, which relate to the service tasks, not to data storage; plus the metadata management functions, also not really related to the timeline contents.
`tenant_mgr` proxies some of the `Repository` calls, yet both service tasks use `tenant_mgr` to access the data they need, creating a circular dependency between their APIs.
To avoid excessive synchronization between components, the multiple locks taken for it, and static state, we can organize data access and updates in one place.
One potential benefit Rust gets from this is the ability to track and manage timeline resources, if all the related data is located in one place.
- move `RemoteStorage` usage from `LayeredRepository` into `LayerMap`, alongside the rest of the layer-based entities (layer files, etc.)
Layer == file in our model, since the pageserver either loads the `LayerMap` from disk for a timeline not in memory, or assumes the file contents match what it has in memory.
`LayeredRepository` is currently one of the most overloaded objects, and not everything in it deserves unification with `tenant_mgr`.
In particular, layer files need to be better prepared for the future on-demand download functionality, where every layer could be dynamically loaded into and unloaded from memory and the local FS.
The current number of locks and the sync-async separation would make it hard to implement truly dynamic (un)loading; moreover, we would need retries with backoff, since unloaded layer files are most probably not available on the local FS either, and the network is not always reliable.
One of the solutions to the issue is already being developed for the remote storage sync: [SyncQueue](https://github.com/neondatabase/neon/blob/28243d68e60ffc7e69f158522f589f7d2e09186d/pageserver/src/storage_sync.rs#L463)
The queue is able to batch CRUD layer operations (both for local and remote FS contexts) and reorder them to increase the sync speed.
A similar approach could be generalized to all layer modifications, including in-memory ones such as GC or compaction: this way, we could manage all layer modifications and reads in one place, with fewer locks and with tests that are closer to unit tests (a sketch of such a generalized queue follows below).
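A sketch of how such a generalized queue could look; this is an assumed design extrapolated from the `SyncQueue` idea above, not existing code, and the names are illustrative:
```rust
use std::collections::VecDeque;
use std::path::PathBuf;

// Identifier stand-in; the real code would use tenant/timeline ids and layer file names.
type LayerId = String;

// All layer modifications expressed as queueable operations, covering both
// remote sync and local changes produced by GC or compaction.
enum LayerOp {
    Download { layer: LayerId, to: PathBuf },
    Upload { layer: LayerId, from: PathBuf },
    Delete { layer: LayerId },
}

// A single owner of the queue can batch and reorder operations
// (e.g. drop an Upload that is immediately followed by a Delete of the same layer).
struct LayerOpQueue {
    pending: VecDeque<LayerOp>,
}

impl LayerOpQueue {
    fn new() -> Self {
        Self { pending: VecDeque::new() }
    }

    fn push(&mut self, op: LayerOp) {
        self.pending.push_back(op);
    }

    // Take up to `max` operations to execute as one batch.
    fn next_batch(&mut self, max: usize) -> Vec<LayerOp> {
        let n = self.pending.len().min(max);
        self.pending.drain(..n).collect()
    }
}
```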
- change the approach to locking synchronization
A number of locks in the timeline seem to be used to coordinate GC, compaction and related processes.
That coordination should be done in a task manager or some other place external to the timeline.
Timeline contents still need to be synchronized with the tasks' work, so fields like `latest_gc_cutoff_lsn: RwLock<Lsn>` are expected to stay for that purpose, but the overall number of locks should be reduced.
### Putting it all together
If the proposal bullets were applied to the diagrams above, the state could be represented as:
![timeline timeline tenant state](./images/017-timeline-data-management/proposed_timeline_tenant_state.svg)
The reordering aims to put all tasks into separate modules, with strictly defined interfaces and as little knowledge about other components as possible.
This way, all timeline data now lives in `data_storage`, including the GC data, walreceiver data, `RemoteTimelineIndex`, `LayerMap`, etc., with some API to expose the data in the form most convenient for the data sync system inside.
So far, it seems that a few maps of `Arc<RwLock<SeparateData>>` would suffice, with the actual data operations added inside each `SeparateData` struct where needed.
`page_cache` is proposed to be placed into the same `data_storage`, since it contains tenant timelines' data: this way, all metadata and data is in the same struct, which simplifies things with Rust's borrow checker, allows us to share internals between data modules, and later might simplify timeline in-memory size tracking.
`task_manager` is related to the data storage: it manages all tenant and timeline tasks, manages shared resources (runtimes, thread pools, the etcd connection, etc.) and synchronizes the tasks.
All locks such as `gc_cs` belong to this module tree, as primitives inherently related to task synchronization.
Tasks have to access timelines and their metadata, but should do that through the `data_storage` API and the like.
`task_manager` should (re)start, stop and track all tasks run in it, selecting an appropriate runtime depending on the task kind (we have async/sync task separation, CPU- and IO-bound task separation, ...)
Some locks, such as the `layer_removal_cs` one, are not needed if the only component that starts the tasks ensures they don't run concurrently (sketched below).
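A minimal sketch of how a task manager could make such locks unnecessary by never starting mutually exclusive tasks concurrently; this is a hypothetical design with illustrative names:
```rust
use std::collections::HashSet;
use std::sync::Mutex;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct TimelineId(u64);

// Tasks that must never run concurrently for the same timeline.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum ExclusiveTask {
    Gc,
    Compaction,
    Delete,
}

// The task manager tracks which timelines currently run an exclusive task;
// since it is the only component that starts them, no per-timeline lock is needed.
struct TaskManager {
    busy: Mutex<HashSet<TimelineId>>,
}

impl TaskManager {
    // Returns false if another exclusive task already runs on this timeline;
    // the caller retries later instead of blocking on a `layer_removal_cs`-style lock.
    fn try_start(&self, timeline: TimelineId, _task: ExclusiveTask) -> bool {
        self.busy.lock().unwrap().insert(timeline)
    }

    fn finish(&self, timeline: TimelineId) {
        self.busy.lock().unwrap().remove(&timeline);
    }
}
```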
`LayeredTimeline` is still split into two parts: a more high-level one, with whatever primitives are needed to sync its state, and the actual state storage with `LayerMap` and other low-level entities.
Only `LayerMap` knows what storage its layer files come from (in-memory, local FS, etc.), and it is responsible for synchronizing the layers when needed, as well as reacting to sync events, successful or not.
Last but not least, the `tenant config file` has to be backed up to the remote storage, as tenant-specific information for all timelines.
Tenants and timelines have volatile information that is now partially mixed with constant information (e.g. fields in the `metadata` file); that model should be split and handled better if we want to properly support its backup and synchronization.
![proposed timeline data access synchronization(1)](./images/017-timeline-data-management/proposed_timeline_data_access_sync_1.svg)
There is still a need to keep the in-memory layer buffer synchronized during layer freezing, yet that could happen at the layer level rather than at the timeline level, as `write_lock` does now, so we could push the sync primitives one level deeper, preparing us for the on-demand download feature, where multiple layers could be concurrently streamed and written from various data sources.
Flushing a frozen layer requires creating a new layer on disk and a subsequent remote storage upload, so `LayerMap` has to receive the flushed bytes and queue them: again, nothing needs to block in the timeline itself; locking happens at the layer level, if needed.
![proposed timeline data access synchronization(2)](./images/017-timeline-data-management/proposed_timeline_data_access_sync_2.svg)
Lock diagrams legend:
![lock diagrams legend](./images/017-timeline-data-management/lock_legend.svg)
After the frozen layers are flushed, something has to ensure that the layer structure is intact, so a repartitioning lock is still needed; it could also guard layer map structure changes, since both are needed either way.
This locking belongs to the `LowLevelLayeredTimeline` from the proposed data structure diagram, as the place where all such data is held.
Similarly, branching still has to happen after a certain Lsn in our current model, but it needs only one lock to synchronize, and that could be the `gc_cs: Mutex<()>` lock.
This raises the question of where that lock should be placed: it is the only case that requires pausing a GC task during external HTTP request handling.
The right place for the lock seems to be the `task_manager`, which could manage GC in a more fine-grained way to accommodate an incoming branching request.
There is no explicit lock synchronization between GC, compaction or other mutually exclusive tasks: it is the job of the `task_manager` to ensure they do not run concurrently.

View File

@@ -15,7 +15,7 @@ The stateless compute node that performs validation is separate from the storage
Limit the maximum size of a PostgreSQL instance to limit free tier users (and other tiers in the future).
First of all, this is needed to control our free tier production costs.
Another reason to limit resources is risk management — we haven't (fully) tested and optimized zenith for big clusters,
Another reason to limit resources is risk management — we haven't (fully) tested and optimized neon for big clusters,
so we don't want to give users access to the functionality that we don't think is ready.
## Components
@@ -43,20 +43,20 @@ Then this size should be reported to compute node.
`current_timeline_size` value is included in the walreceiver's custom feedback message: `ReplicationFeedback.`
(PR about protocol changes https://github.com/zenithdb/zenith/pull/1037).
(PR about protocol changes https://github.com/neondatabase/neon/pull/1037).
This message is received by the safekeeper and propagated to compute node as a part of `AppendResponse`.
Finally, when compute node receives the `current_timeline_size` from safekeeper (or from pageserver directly), it updates the global variable.
And then every zenith_extend() operation checks if limit is reached `(current_timeline_size > neon.max_cluster_size)` and throws `ERRCODE_DISK_FULL` error if so.
And then every neon_extend() operation checks if limit is reached `(current_timeline_size > neon.max_cluster_size)` and throws `ERRCODE_DISK_FULL` error if so.
(see Postgres error codes [https://www.postgresql.org/docs/devel/errcodes-appendix.html](https://www.postgresql.org/docs/devel/errcodes-appendix.html))
TODO:
We can allow autovacuum processes to bypass this check, simply checking `IsAutoVacuumWorkerProcess()`.
It would be nice to allow manual VACUUM and VACUUM FULL to bypass the check, but it's uneasy to distinguish these operations at the low level.
See issues https://github.com/neondatabase/neon/issues/1245
https://github.com/zenithdb/zenith/issues/1445
https://github.com/neondatabase/neon/issues/1445
TODO:
We should warn users if the limit is soon to be reached.

(Seven SVG diagram images referenced by the RFC above were added; their individual file diffs are suppressed.)

View File

@@ -155,6 +155,8 @@ for other files and for sockets for incoming connections.
#### pg_distrib_dir
A directory with Postgres installation to use during pageserver activities.
Since pageserver supports several postgres versions, `pg_distrib_dir` contains
a subdirectory for each version with naming convention `v{PG_MAJOR_VERSION}/`.
Inside that dir, a `bin/postgres` binary should be present.
The default distrib dir is `./pg_install/`.

View File

@@ -10,7 +10,7 @@ Intended to be used in integration tests and in CLI tools for local installation
`/docs`:
Documentation of the Zenith features and concepts.
Documentation of the Neon features and concepts.
Now it is mostly dev documentation.
`/monitoring`:
@@ -19,7 +19,7 @@ TODO
`/pageserver`:
Zenith storage service.
Neon storage service.
The pageserver has a few different duties:
- Store and manage the data.
@@ -54,7 +54,7 @@ PostgreSQL extension that contains functions needed for testing and debugging.
`/safekeeper`:
The zenith WAL service that receives WAL from a primary compute nodes and streams it to the pageserver.
The neon WAL service that receives WAL from a primary compute nodes and streams it to the pageserver.
It acts as a holding area and redistribution center for recently generated WAL.
For more detailed info, see [walservice.md](./walservice.md)
@@ -64,11 +64,6 @@ The workspace_hack crate exists only to pin down some dependencies.
We use [cargo-hakari](https://crates.io/crates/cargo-hakari) for automation.
`/zenith`
Main entry point for the 'zenith' CLI utility.
TODO: Doesn't it belong to control_plane?
`/libs`:
Unites granular neon helper crates under the hood.
@@ -134,3 +129,52 @@ Also consider:
To add new package or change an existing one you can use `poetry add` or `poetry update` or edit `pyproject.toml` manually. Do not forget to run `poetry lock` in the latter case.
More details are available in poetry's [documentation](https://python-poetry.org/docs/).
## Configuring IDEs
Neon consists of three projects in different languages which use different project models.
* A bunch of Rust crates, all available from the root `Cargo.toml`.
* Integration tests in Python in the `test_runner` directory. Some stand-alone Python scripts exist as well.
* Postgres and our Postgres extensions in C built with Makefiles under `vendor/postgres` and `pgxn`.
### CLion
You can use CLion with the [Rust plugin](https://plugins.jetbrains.com/plugin/8182-rust) to develop Neon. It should pick up Rust and Python projects whenever you open Neon's repository as a project. We have not tried setting up a debugger, though.
C code requires some extra care, as it's built via Make, not CMake. Some of our developers have successfully used [compilation database](https://www.jetbrains.com/help/clion/compilation-database.html#compdb_generate) for CLion. It is a JSON file which lists all C source files and corresponding compilation keys. CLion can use it instead of `CMakeLists.txt`. To set up a project with a compilation database:
1. Clone the Neon repository and install all dependencies, including Python. Do not open it with CLion just yet.
2. Run the following commands in the repository's root:
```bash
# Install a `compiledb` tool which can parse make's output and generate the compilation database.
poetry add -D compiledb
# Clean the build tree so we can rebuild from scratch.
# Unfortunately, our and Postgres Makefiles do not work well with either --dry-run or --assume-new,
# so we don't know a way to generate the compilation database without recompiling everything,
# see https://github.com/neondatabase/neon/issues/2378#issuecomment-1241421325
make distclean
# Rebuild the Postgres parts from scratch and save the compilation commands to the compilation database.
# You can alter the -j parameter to your liking.
# Note that we only build for a specific version of Postgres. The extension code is shared, but headers are
# different, so we set up CLion to only use a specific version of the headers.
make -j$(nproc) --print-directory postgres-v15 neon-pg-ext-v15 | poetry run compiledb --verbose --no-build
# Uninstall the tool
poetry remove -D compiledb
# Make sure the compile_commands.json file is not committed.
echo /compile_commands.json >>.git/info/exclude
```
3. Open CLion, click "Open File or Project" and choose the generated `compile_commands.json` file to be opened "as a project". You cannot add a compilation database into an existing CLion project, you have to create a new one. _Do not_ open the directory as a project, open the file.
4. The newly created project should start indexing Postgres source code in C, as well as the C standard library. You may have to [configure the C compiler for the compilation database](https://www.jetbrains.com/help/clion/compilation-database.html#compdb_toolchain).
5. Open the `Cargo.toml` file in an editor in the same project. CLion should pick up the hint and start indexing Rust code.
6. Now you have a CLion project which knows about C files, Rust files. It should pick up Python files automatically as well.
7. Set up correct code indentation in CLion's settings: Editor > Code Style > C/C++, choose the "Project" scheme on the top, and tick the "Use tab character" on the "Tabs and Indents" tab. Ensure that "Tab size" is 4.
You can also enable Cargo Clippy diagnostics and enable Rustfmt instead of built-in code formatter.
Whenever you change layout of C files, you may need to regenerate the compilation database. No need to re-create the CLion project, changes should be picked up automatically.
Known issues (fixes and suggestions are welcome):
* Test results may be hard to read in CLion, both for unit tests in Rust and integration tests in Python. Use command line to run them instead.
* CLion does not support non-local Python interpreters, unlike PyCharm. E.g. if you use WSL, CLion does not see `poetry` and installed dependencies. Python support is limited.
* Cargo Clippy diagnostics in CLion may take a lot of resources.
* `poetry add -D` updates some packages and changes `poetry.lock` drastically even when followed by `poetry remove -D`. Feel free to `git checkout poetry.lock` and `./scripts/pysync` to revert these changes.

View File

@@ -11,7 +11,7 @@ use std::{fmt::Display, str::FromStr};
use once_cell::sync::Lazy;
use regex::{Captures, Regex};
use utils::zid::{NodeId, ZTenantId, ZTenantTimelineId};
use utils::id::{NodeId, TenantId, TenantTimelineId};
/// The subscription kind to the timeline updates from safekeeper.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
@@ -30,13 +30,13 @@ pub enum SubscriptionKind {
/// Get every update in etcd.
All,
/// Get etcd updates for any timeiline of a certain tenant, affected by any operation from any node kind.
TenantTimelines(ZTenantId),
TenantTimelines(TenantId),
/// Get etcd updates for a certain timeline of a tenant, affected by any operation from any node kind.
Timeline(ZTenantTimelineId),
Timeline(TenantTimelineId),
/// Get etcd timeline updates, specific to a certain node kind.
Node(ZTenantTimelineId, NodeKind),
Node(TenantTimelineId, NodeKind),
/// Get etcd timeline updates for a certain operation on specific nodes.
Operation(ZTenantTimelineId, NodeKind, OperationKind),
Operation(TenantTimelineId, NodeKind, OperationKind),
}
/// All kinds of nodes, able to write into etcd.
@@ -67,7 +67,7 @@ static SUBSCRIPTION_FULL_KEY_REGEX: Lazy<Regex> = Lazy::new(|| {
/// No other etcd keys are considered during system's work.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SubscriptionFullKey {
pub id: ZTenantTimelineId,
pub id: TenantTimelineId,
pub node_kind: NodeKind,
pub operation: OperationKind,
pub node_id: NodeId,
@@ -83,7 +83,7 @@ impl SubscriptionKey {
}
/// Subscribes to a given timeline info updates from safekeepers.
pub fn sk_timeline_info(cluster_prefix: String, timeline: ZTenantTimelineId) -> Self {
pub fn sk_timeline_info(cluster_prefix: String, timeline: TenantTimelineId) -> Self {
Self {
cluster_prefix,
kind: SubscriptionKind::Operation(
@@ -97,7 +97,7 @@ impl SubscriptionKey {
/// Subscribes to all timeine updates during specific operations, running on the corresponding nodes.
pub fn operation(
cluster_prefix: String,
timeline: ZTenantTimelineId,
timeline: TenantTimelineId,
node_kind: NodeKind,
operation: OperationKind,
) -> Self {
@@ -175,7 +175,7 @@ impl FromStr for SubscriptionFullKey {
};
Ok(Self {
id: ZTenantTimelineId::new(
id: TenantTimelineId::new(
parse_capture(&key_captures, 1)?,
parse_capture(&key_captures, 2)?,
),
@@ -247,7 +247,7 @@ impl FromStr for SkOperationKind {
#[cfg(test)]
mod tests {
use utils::zid::ZTimelineId;
use utils::id::TimelineId;
use super::*;
@@ -256,9 +256,9 @@ mod tests {
let prefix = "neon";
let node_kind = NodeKind::Safekeeper;
let operation_kind = OperationKind::Safekeeper(SkOperationKind::WalBackup);
let tenant_id = ZTenantId::generate();
let timeline_id = ZTimelineId::generate();
let id = ZTenantTimelineId::new(tenant_id, timeline_id);
let tenant_id = TenantId::generate();
let timeline_id = TimelineId::generate();
let id = TenantTimelineId::new(tenant_id, timeline_id);
let node_id = NodeId(1);
let timeline_subscription_keys = [

View File

@@ -21,8 +21,9 @@ workspace_hack = { version = "0.1", path = "../../workspace_hack" }
[dev-dependencies]
env_logger = "0.9"
postgres = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
wal_craft = { path = "wal_craft" }
[build-dependencies]
bindgen = "0.59.1"
anyhow = "1.0"
bindgen = "0.60.1"

View File

@@ -4,6 +4,7 @@ use std::env;
use std::path::PathBuf;
use std::process::Command;
use anyhow::{anyhow, Context};
use bindgen::callbacks::ParseCallbacks;
#[derive(Debug)]
@@ -42,7 +43,7 @@ impl ParseCallbacks for PostgresFfiCallbacks {
}
}
fn main() {
fn main() -> anyhow::Result<()> {
// Tell cargo to invalidate the built crate whenever the wrapper changes
println!("cargo:rerun-if-changed=bindgen_deps.h");
@@ -58,7 +59,7 @@ fn main() {
for pg_version in &["v14", "v15"] {
let mut pg_install_dir_versioned = pg_install_dir.join(pg_version);
if pg_install_dir_versioned.is_relative() {
let cwd = env::current_dir().unwrap();
let cwd = env::current_dir().context("Failed to get current_dir")?;
pg_install_dir_versioned = cwd.join("..").join("..").join(pg_install_dir_versioned);
}
@@ -70,21 +71,25 @@ fn main() {
let output = Command::new(pg_config_bin)
.arg("--includedir-server")
.output()
.expect("failed to execute `pg_config --includedir-server`");
.context("failed to execute `pg_config --includedir-server`")?;
if !output.status.success() {
panic!("`pg_config --includedir-server` failed")
}
String::from_utf8(output.stdout).unwrap().trim_end().into()
String::from_utf8(output.stdout)
.context("pg_config output is not UTF-8")?
.trim_end()
.into()
} else {
pg_install_dir_versioned
let server_path = pg_install_dir_versioned
.join("include")
.join("postgresql")
.join("server")
.into_os_string()
.into_os_string();
server_path
.into_string()
.unwrap()
.map_err(|s| anyhow!("Bad postgres server path {s:?}"))?
};
// The bindgen::Builder is the main entry point
@@ -132,14 +137,18 @@ fn main() {
// Finish the builder and generate the bindings.
//
.generate()
.expect("Unable to generate bindings");
.context("Unable to generate bindings")?;
// Write the bindings to the $OUT_DIR/bindings_$pg_version.rs file.
let out_path = PathBuf::from(env::var("OUT_DIR").unwrap());
let out_path: PathBuf = env::var("OUT_DIR")
.context("Couldn't read OUT_DIR environment variable var")?
.into();
let filename = format!("bindings_{pg_version}.rs");
bindings
.write_to_file(out_path.join(filename))
.expect("Couldn't write bindings!");
.context("Couldn't write bindings")?;
}
Ok(())
}

View File

@@ -7,6 +7,8 @@
// https://github.com/rust-lang/rust-bindgen/issues/1651
#![allow(deref_nullptr)]
use bytes::Bytes;
use utils::bin_ser::SerializeError;
use utils::lsn::Lsn;
macro_rules! postgres_ffi {
@@ -24,12 +26,12 @@ macro_rules! postgres_ffi {
stringify!($version),
".rs"
));
include!(concat!("pg_constants_", stringify!($version), ".rs"));
}
pub mod controlfile_utils;
pub mod nonrelfile_utils;
pub mod pg_constants;
pub mod relfile_utils;
pub mod waldecoder;
pub mod waldecoder_handler;
pub mod xlog_utils;
pub const PG_MAJORVERSION: &str = stringify!($version);
@@ -44,6 +46,9 @@ macro_rules! postgres_ffi {
postgres_ffi!(v14);
postgres_ffi!(v15);
pub mod pg_constants;
pub mod relfile_utils;
// Export some widely used datatypes that are unlikely to change across Postgres versions
pub use v14::bindings::{uint32, uint64, Oid};
pub use v14::bindings::{BlockNumber, OffsetNumber};
@@ -52,8 +57,11 @@ pub use v14::bindings::{TimeLineID, TimestampTz, XLogRecPtr, XLogSegNo};
// Likewise for these, although the assumption that these don't change is a little more iffy.
pub use v14::bindings::{MultiXactOffset, MultiXactStatus};
pub use v14::bindings::{PageHeaderData, XLogRecord};
pub use v14::xlog_utils::{XLOG_SIZE_OF_XLOG_RECORD, XLOG_SIZE_OF_XLOG_SHORT_PHD};
pub use v14::bindings::{CheckPoint, ControlFileData};
// from pg_config.h. These can be changed with configure options --with-blocksize=BLOCKSIZE and
// --with-segsize=SEGSIZE, but assume the defaults for now.
pub const BLCKSZ: u16 = 8192;
@@ -63,6 +71,49 @@ pub const WAL_SEGMENT_SIZE: usize = 16 * 1024 * 1024;
pub const MAX_SEND_SIZE: usize = XLOG_BLCKSZ * 16;
// Export some version independent functions that are used outside of this mod
pub use v14::xlog_utils::encode_logical_message;
pub use v14::xlog_utils::get_current_timestamp;
pub use v14::xlog_utils::to_pg_timestamp;
pub use v14::xlog_utils::XLogFileName;
pub use v14::bindings::DBState_DB_SHUTDOWNED;
pub fn bkpimage_is_compressed(bimg_info: u8, version: u32) -> anyhow::Result<bool> {
match version {
14 => Ok(bimg_info & v14::bindings::BKPIMAGE_IS_COMPRESSED != 0),
15 => Ok(bimg_info & v15::bindings::BKPIMAGE_COMPRESS_PGLZ != 0
|| bimg_info & v15::bindings::BKPIMAGE_COMPRESS_LZ4 != 0
|| bimg_info & v15::bindings::BKPIMAGE_COMPRESS_ZSTD != 0),
_ => anyhow::bail!("Unknown version {}", version),
}
}
pub fn generate_wal_segment(
segno: u64,
system_id: u64,
pg_version: u32,
) -> Result<Bytes, SerializeError> {
match pg_version {
14 => v14::xlog_utils::generate_wal_segment(segno, system_id),
15 => v15::xlog_utils::generate_wal_segment(segno, system_id),
_ => Err(SerializeError::BadInput),
}
}
pub fn generate_pg_control(
pg_control_bytes: &[u8],
checkpoint_bytes: &[u8],
lsn: Lsn,
pg_version: u32,
) -> anyhow::Result<(Bytes, u64)> {
match pg_version {
14 => v14::xlog_utils::generate_pg_control(pg_control_bytes, checkpoint_bytes, lsn),
15 => v15::xlog_utils::generate_pg_control(pg_control_bytes, checkpoint_bytes, lsn),
_ => anyhow::bail!("Unknown version {}", pg_version),
}
}
// PG timeline is always 1, changing it doesn't have any useful meaning in Neon.
//
// NOTE: this is not to be confused with Neon timelines; different concept!
@@ -74,7 +125,7 @@ pub const PG_TLI: u32 = 1;
// See TransactionIdIsNormal in transam.h
pub const fn transaction_id_is_normal(id: TransactionId) -> bool {
id > v14::pg_constants::FIRST_NORMAL_TRANSACTION_ID
id > pg_constants::FIRST_NORMAL_TRANSACTION_ID
}
// See TransactionIdPrecedes in transam.c
@@ -109,3 +160,76 @@ pub fn page_set_lsn(pg: &mut [u8], lsn: Lsn) {
pg[0..4].copy_from_slice(&((lsn.0 >> 32) as u32).to_le_bytes());
pg[4..8].copy_from_slice(&(lsn.0 as u32).to_le_bytes());
}
pub mod waldecoder {
use crate::{v14, v15};
use bytes::{Buf, Bytes, BytesMut};
use std::num::NonZeroU32;
use thiserror::Error;
use utils::lsn::Lsn;
pub enum State {
WaitingForRecord,
ReassemblingRecord {
recordbuf: BytesMut,
contlen: NonZeroU32,
},
SkippingEverything {
skip_until_lsn: Lsn,
},
}
pub struct WalStreamDecoder {
pub lsn: Lsn,
pub pg_version: u32,
pub inputbuf: BytesMut,
pub state: State,
}
#[derive(Error, Debug, Clone)]
#[error("{msg} at {lsn}")]
pub struct WalDecodeError {
pub msg: String,
pub lsn: Lsn,
}
impl WalStreamDecoder {
pub fn new(lsn: Lsn, pg_version: u32) -> WalStreamDecoder {
WalStreamDecoder {
lsn,
pg_version,
inputbuf: BytesMut::new(),
state: State::WaitingForRecord,
}
}
// The latest LSN position fed to the decoder.
pub fn available(&self) -> Lsn {
self.lsn + self.inputbuf.remaining() as u64
}
pub fn feed_bytes(&mut self, buf: &[u8]) {
self.inputbuf.extend_from_slice(buf);
}
pub fn poll_decode(&mut self) -> Result<Option<(Lsn, Bytes)>, WalDecodeError> {
match self.pg_version {
// This is a trick to support both versions simultaneously.
// See WalStreamDecoderHandler comments.
14 => {
use self::v14::waldecoder_handler::WalStreamDecoderHandler;
self.poll_decode_internal()
}
15 => {
use self::v15::waldecoder_handler::WalStreamDecoderHandler;
self.poll_decode_internal()
}
_ => Err(WalDecodeError {
msg: format!("Unknown version {}", self.pg_version),
lsn: self.lsn,
}),
}
}
}
}

View File

@@ -1,7 +1,7 @@
//!
//! Common utilities for dealing with PostgreSQL non-relation files.
//!
use super::pg_constants;
use crate::pg_constants;
use crate::transaction_id_precedes;
use bytes::BytesMut;
use log::*;

View File

@@ -1,14 +1,16 @@
//!
//! Misc constants, copied from PostgreSQL headers.
//!
//! Only place version-independent constants here.
//!
//! TODO: These probably should be auto-generated using bindgen,
//! rather than copied by hand. Although on the other hand, it's nice
//! to have them all here in one place, and have the ability to add
//! comments on them.
//!
use super::bindings::{PageHeaderData, XLogRecord};
use crate::BLCKSZ;
use crate::{PageHeaderData, XLogRecord};
//
// From pg_tablespace_d.h
@@ -16,14 +18,6 @@ use crate::BLCKSZ;
pub const DEFAULTTABLESPACE_OID: u32 = 1663;
pub const GLOBALTABLESPACE_OID: u32 = 1664;
//
// Fork numbers, from relpath.h
//
pub const MAIN_FORKNUM: u8 = 0;
pub const FSM_FORKNUM: u8 = 1;
pub const VISIBILITYMAP_FORKNUM: u8 = 2;
pub const INIT_FORKNUM: u8 = 3;
// From storage_xlog.h
pub const XLOG_SMGR_CREATE: u8 = 0x10;
pub const XLOG_SMGR_TRUNCATE: u8 = 0x20;
@@ -114,7 +108,6 @@ pub const XLOG_NEXTOID: u8 = 0x30;
pub const XLOG_SWITCH: u8 = 0x40;
pub const XLOG_FPI_FOR_HINT: u8 = 0xA0;
pub const XLOG_FPI: u8 = 0xB0;
pub const DB_SHUTDOWNED: u32 = 1;
// From multixact.h
pub const FIRST_MULTIXACT_ID: u32 = 1;
@@ -169,10 +162,6 @@ pub const RM_HEAP_ID: u8 = 10;
pub const XLR_INFO_MASK: u8 = 0x0F;
pub const XLR_RMGR_INFO_MASK: u8 = 0xF0;
// from dbcommands_xlog.h
pub const XLOG_DBASE_CREATE: u8 = 0x00;
pub const XLOG_DBASE_DROP: u8 = 0x10;
pub const XLOG_TBLSPC_CREATE: u8 = 0x00;
pub const XLOG_TBLSPC_DROP: u8 = 0x10;
@@ -197,8 +186,6 @@ pub const BKPBLOCK_SAME_REL: u8 = 0x80; /* RelFileNode omitted, same as previous
/* Information stored in bimg_info */
pub const BKPIMAGE_HAS_HOLE: u8 = 0x01; /* page image has "hole" */
pub const BKPIMAGE_IS_COMPRESSED: u8 = 0x02; /* page image is compressed */
pub const BKPIMAGE_APPLY: u8 = 0x04; /* page image should be restored during replay */
/* From transam.h */
pub const FIRST_NORMAL_TRANSACTION_ID: u32 = 3;

View File

@@ -0,0 +1,5 @@
pub const XLOG_DBASE_CREATE: u8 = 0x00;
pub const XLOG_DBASE_DROP: u8 = 0x10;
pub const BKPIMAGE_IS_COMPRESSED: u8 = 0x02; /* page image is compressed */
pub const BKPIMAGE_APPLY: u8 = 0x04; /* page image should be restored during replay */

View File

@@ -0,0 +1,10 @@
pub const XACT_XINFO_HAS_DROPPED_STATS: u32 = 1u32 << 8;
pub const XLOG_DBASE_CREATE_FILE_COPY: u8 = 0x00;
pub const XLOG_DBASE_CREATE_WAL_LOG: u8 = 0x00;
pub const XLOG_DBASE_DROP: u8 = 0x20;
pub const BKPIMAGE_APPLY: u8 = 0x02; /* page image should be restored during replay */
pub const BKPIMAGE_COMPRESS_PGLZ: u8 = 0x04; /* page image is compressed */
pub const BKPIMAGE_COMPRESS_LZ4: u8 = 0x08; /* page image is compressed */
pub const BKPIMAGE_COMPRESS_ZSTD: u8 = 0x10; /* page image is compressed */

View File

@@ -1,10 +1,17 @@
//!
//! Common utilities for dealing with PostgreSQL relation files.
//!
use super::pg_constants;
use once_cell::sync::OnceCell;
use regex::Regex;
//
// Fork numbers, from relpath.h
//
pub const MAIN_FORKNUM: u8 = 0;
pub const FSM_FORKNUM: u8 = 1;
pub const VISIBILITYMAP_FORKNUM: u8 = 2;
pub const INIT_FORKNUM: u8 = 3;
#[derive(Debug, Clone, thiserror::Error, PartialEq, Eq)]
pub enum FilePathError {
#[error("invalid relation fork name")]
@@ -23,10 +30,10 @@ impl From<core::num::ParseIntError> for FilePathError {
pub fn forkname_to_number(forkname: Option<&str>) -> Result<u8, FilePathError> {
match forkname {
// "main" is not in filenames, it's implicit if the fork name is not present
None => Ok(pg_constants::MAIN_FORKNUM),
Some("fsm") => Ok(pg_constants::FSM_FORKNUM),
Some("vm") => Ok(pg_constants::VISIBILITYMAP_FORKNUM),
Some("init") => Ok(pg_constants::INIT_FORKNUM),
None => Ok(MAIN_FORKNUM),
Some("fsm") => Ok(FSM_FORKNUM),
Some("vm") => Ok(VISIBILITYMAP_FORKNUM),
Some("init") => Ok(INIT_FORKNUM),
Some(_) => Err(FilePathError::InvalidForkName),
}
}
@@ -34,10 +41,10 @@ pub fn forkname_to_number(forkname: Option<&str>) -> Result<u8, FilePathError> {
/// Convert Postgres fork number to the right suffix of the relation data file.
pub fn forknumber_to_name(forknum: u8) -> Option<&'static str> {
match forknum {
pg_constants::MAIN_FORKNUM => None,
pg_constants::FSM_FORKNUM => Some("fsm"),
pg_constants::VISIBILITYMAP_FORKNUM => Some("vm"),
pg_constants::INIT_FORKNUM => Some("init"),
MAIN_FORKNUM => None,
FSM_FORKNUM => Some("fsm"),
VISIBILITYMAP_FORKNUM => Some("vm"),
INIT_FORKNUM => Some("init"),
_ => Some("UNKNOWN FORKNUM"),
}
}

View File

@@ -8,6 +8,7 @@
//! to look deeper into the WAL records to also understand which blocks they modify, the code
//! for that is in pageserver/src/walrecord.rs
//!
use super::super::waldecoder::{State, WalDecodeError, WalStreamDecoder};
use super::bindings::{XLogLongPageHeaderData, XLogPageHeaderData, XLogRecord, XLOG_PAGE_MAGIC};
use super::xlog_utils::*;
use crate::WAL_SEGMENT_SIZE;
@@ -16,55 +17,26 @@ use crc32c::*;
use log::*;
use std::cmp::min;
use std::num::NonZeroU32;
use thiserror::Error;
use utils::lsn::Lsn;
enum State {
WaitingForRecord,
ReassemblingRecord {
recordbuf: BytesMut,
contlen: NonZeroU32,
},
SkippingEverything {
skip_until_lsn: Lsn,
},
}
pub struct WalStreamDecoder {
lsn: Lsn,
inputbuf: BytesMut,
state: State,
}
#[derive(Error, Debug, Clone)]
#[error("{msg} at {lsn}")]
pub struct WalDecodeError {
msg: String,
lsn: Lsn,
pub trait WalStreamDecoderHandler {
fn validate_page_header(&self, hdr: &XLogPageHeaderData) -> Result<(), WalDecodeError>;
fn poll_decode_internal(&mut self) -> Result<Option<(Lsn, Bytes)>, WalDecodeError>;
fn complete_record(&mut self, recordbuf: Bytes) -> Result<(Lsn, Bytes), WalDecodeError>;
}
//
// WalRecordStream is a Stream that returns a stream of WAL records
// FIXME: This isn't a proper rust stream
// This is a trick to support several postgres versions simultaneously.
//
impl WalStreamDecoder {
pub fn new(lsn: Lsn) -> WalStreamDecoder {
WalStreamDecoder {
lsn,
inputbuf: BytesMut::new(),
state: State::WaitingForRecord,
}
}
// The latest LSN position fed to the decoder.
pub fn available(&self) -> Lsn {
self.lsn + self.inputbuf.remaining() as u64
}
pub fn feed_bytes(&mut self, buf: &[u8]) {
self.inputbuf.extend_from_slice(buf);
}
// Page decoding code depends on postgres bindings, so it is compiled for each version.
// Thus WalStreamDecoder implements several WalStreamDecoderHandler traits.
// WalStreamDecoder poll_decode() method dispatches to the right handler based on the postgres version.
// Other methods are internal and are not dispatched.
//
// It is similar to having several impl blocks for the same struct,
// but the impls here are in different modules, so need to use a trait.
//
impl WalStreamDecoderHandler for WalStreamDecoder {
fn validate_page_header(&self, hdr: &XLogPageHeaderData) -> Result<(), WalDecodeError> {
let validate_impl = || {
if hdr.xlp_magic != XLOG_PAGE_MAGIC as u16 {
@@ -125,7 +97,7 @@ impl WalStreamDecoder {
/// Ok(None): there is not enough data in the input buffer. Feed more by calling the `feed_bytes` function
/// Err(WalDecodeError): an error occurred while decoding, meaning the input was invalid.
///
pub fn poll_decode(&mut self) -> Result<Option<(Lsn, Bytes)>, WalDecodeError> {
fn poll_decode_internal(&mut self) -> Result<Option<(Lsn, Bytes)>, WalDecodeError> {
// Run state machine that validates page headers, and reassembles records
// that cross page boundaries.
loop {

View File

@@ -9,12 +9,13 @@
use crc32c::crc32c_append;
use super::super::waldecoder::WalStreamDecoder;
use super::bindings::{
CheckPoint, FullTransactionId, TimeLineID, TimestampTz, XLogLongPageHeaderData,
XLogPageHeaderData, XLogRecPtr, XLogRecord, XLogSegNo, XLOG_PAGE_MAGIC,
CheckPoint, ControlFileData, DBState_DB_SHUTDOWNED, FullTransactionId, TimeLineID, TimestampTz,
XLogLongPageHeaderData, XLogPageHeaderData, XLogRecPtr, XLogRecord, XLogSegNo, XLOG_PAGE_MAGIC,
};
use super::pg_constants;
use super::waldecoder::WalStreamDecoder;
use super::PG_MAJORVERSION;
use crate::pg_constants;
use crate::PG_TLI;
use crate::{uint32, uint64, Oid};
use crate::{WAL_SEGMENT_SIZE, XLOG_BLCKSZ};
@@ -113,6 +114,30 @@ pub fn normalize_lsn(lsn: Lsn, seg_sz: usize) -> Lsn {
}
}
pub fn generate_pg_control(
pg_control_bytes: &[u8],
checkpoint_bytes: &[u8],
lsn: Lsn,
) -> anyhow::Result<(Bytes, u64)> {
let mut pg_control = ControlFileData::decode(pg_control_bytes)?;
let mut checkpoint = CheckPoint::decode(checkpoint_bytes)?;
// Generate new pg_control needed for bootstrap
checkpoint.redo = normalize_lsn(lsn, WAL_SEGMENT_SIZE).0;
//reset some fields we don't want to preserve
//TODO Check this.
//We may need to determine the value from twophase data.
checkpoint.oldestActiveXid = 0;
//save new values in pg_control
pg_control.checkPoint = 0;
pg_control.checkPointCopy = checkpoint;
pg_control.state = DBState_DB_SHUTDOWNED;
Ok((pg_control.encode(), pg_control.system_identifier))
}
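A hedged usage sketch of the new helper (editor's illustration, written as if in the same module; the wrapper function and its byte arguments are placeholders, not code from this change):

fn rebuild_pg_control(
    control_bytes: &[u8],
    checkpoint_bytes: &[u8],
    lsn: Lsn,
) -> anyhow::Result<()> {
    // Decode the stored pg_control and checkpoint images, patch them for
    // bootstrap, and get back an encoded pg_control plus the cluster's
    // system identifier.
    let (pg_control, system_identifier) =
        generate_pg_control(control_bytes, checkpoint_bytes, lsn)?;
    println!(
        "pg_control is {} bytes, system identifier {}",
        pg_control.len(),
        system_identifier
    );
    Ok(())
}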
pub fn get_current_timestamp() -> TimestampTz {
to_pg_timestamp(SystemTime::now())
}
@@ -144,7 +169,10 @@ pub fn find_end_of_wal(
let mut result = start_lsn;
let mut curr_lsn = start_lsn;
let mut buf = [0u8; XLOG_BLCKSZ];
let mut decoder = WalStreamDecoder::new(start_lsn);
let pg_version = PG_MAJORVERSION[1..3].parse::<u32>().unwrap();
info!("find_end_of_wal PG_VERSION: {}", pg_version);
let mut decoder = WalStreamDecoder::new(start_lsn, pg_version);
// loop over segments
loop {
@@ -438,12 +466,15 @@ mod tests {
fn test_end_of_wal<C: wal_craft::Crafter>(test_name: &str) {
use wal_craft::*;
let pg_version = PG_MAJORVERSION[1..3].parse::<u32>().unwrap();
// Craft some WAL
let top_path = PathBuf::from(env!("CARGO_MANIFEST_DIR"))
.join("..")
.join("..");
let cfg = Conf {
pg_distrib_dir: top_path.join(format!("pg_install/{PG_MAJORVERSION}")),
pg_version,
pg_distrib_dir: top_path.join("pg_install"),
datadir: top_path.join(format!("test_output/{}-{PG_MAJORVERSION}", test_name)),
};
if cfg.datadir.exists() {

View File

@@ -11,6 +11,7 @@ clap = "3.0"
env_logger = "0.9"
log = "0.4"
once_cell = "1.13.0"
postgres = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
postgres_ffi = { path = "../" }
tempfile = "3.2"
workspace_hack = { version = "0.1", path = "../../../workspace_hack" }

View File

@@ -37,9 +37,16 @@ fn main() -> Result<()> {
Arg::new("pg-distrib-dir")
.long("pg-distrib-dir")
.takes_value(true)
.help("Directory with Postgres distribution (bin and lib directories, e.g. pg_install/v14)")
.help("Directory with Postgres distributions (bin and lib directories, e.g. pg_install containing subpath `v14/bin/postgresql`)")
.default_value("/usr/local")
)
.arg(
Arg::new("pg-version")
.long("pg-version")
.help("Postgres version to use for the initial tenant")
.required(true)
.takes_value(true)
)
)
.subcommand(
App::new("in-existing")
@@ -82,8 +89,14 @@ fn main() -> Result<()> {
}
Ok(())
}
Some(("with-initdb", arg_matches)) => {
let cfg = Conf {
pg_version: arg_matches
.value_of("pg-version")
.unwrap()
.parse::<u32>()
.context("Failed to parse postgres version from the argument string")?,
pg_distrib_dir: arg_matches.value_of("pg-distrib-dir").unwrap().into(),
datadir: arg_matches.value_of("datadir").unwrap().into(),
};

View File

@@ -15,6 +15,7 @@ use tempfile::{tempdir, TempDir};
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Conf {
pub pg_version: u32,
pub pg_distrib_dir: PathBuf,
pub datadir: PathBuf,
}
@@ -36,12 +37,22 @@ pub static REQUIRED_POSTGRES_CONFIG: Lazy<Vec<&'static str>> = Lazy::new(|| {
});
impl Conf {
pub fn pg_distrib_dir(&self) -> PathBuf {
let path = self.pg_distrib_dir.clone();
match self.pg_version {
14 => path.join(format!("v{}", self.pg_version)),
15 => path.join(format!("v{}", self.pg_version)),
_ => panic!("Unsupported postgres version: {}", self.pg_version),
}
}
fn pg_bin_dir(&self) -> PathBuf {
self.pg_distrib_dir.join("bin")
self.pg_distrib_dir().join("bin")
}
fn pg_lib_dir(&self) -> PathBuf {
self.pg_distrib_dir.join("lib")
self.pg_distrib_dir().join("lib")
}
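For example (editor's sketch inside the wal_craft crate; paths are placeholders), the new `pg_version` field makes the distribution path version-aware:

let cfg = Conf {
    pg_version: 15,
    pg_distrib_dir: PathBuf::from("/home/user/neon/pg_install"),
    datadir: PathBuf::from("/tmp/wal_craft_example"),
};
// pg_distrib_dir() now appends the version suffix, so bin/ and lib/ resolve
// under .../pg_install/v15 rather than .../pg_install directly.
assert_eq!(
    cfg.pg_distrib_dir(),
    PathBuf::from("/home/user/neon/pg_install/v15")
);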
pub fn wal_dir(&self) -> PathBuf {

View File

@@ -7,6 +7,7 @@ edition = "2021"
anyhow = { version = "1.0", features = ["backtrace"] }
async-trait = "0.1"
metrics = { version = "0.1", path = "../metrics" }
utils = { version = "0.1", path = "../utils" }
once_cell = "1.13.0"
rusoto_core = "0.48"
rusoto_s3 = "0.48"

View File

@@ -9,9 +9,7 @@ mod local_fs;
mod s3_bucket;
use std::{
borrow::Cow,
collections::HashMap,
ffi::OsStr,
fmt::{Debug, Display},
num::{NonZeroU32, NonZeroUsize},
ops::Deref,
@@ -344,22 +342,6 @@ impl Debug for S3Config {
}
}
/// Adds a suffix to the file(directory) name, either appending the suffix to the end of its extension,
/// or if there's no extension, creates one and puts a suffix there.
pub fn path_with_suffix_extension(original_path: impl AsRef<Path>, suffix: &str) -> PathBuf {
let new_extension = match original_path
.as_ref()
.extension()
.map(OsStr::to_string_lossy)
{
Some(extension) => Cow::Owned(format!("{extension}.{suffix}")),
None => Cow::Borrowed(suffix),
};
original_path
.as_ref()
.with_extension(new_extension.as_ref())
}
impl RemoteStorageConfig {
pub fn from_toml(toml: &toml_edit::Item) -> anyhow::Result<RemoteStorageConfig> {
let local_path = toml.get("local_path");
@@ -448,35 +430,6 @@ fn parse_toml_string(name: &str, item: &Item) -> anyhow::Result<String> {
mod tests {
use super::*;
#[test]
fn test_path_with_suffix_extension() {
let p = PathBuf::from("/foo/bar");
assert_eq!(
&path_with_suffix_extension(&p, "temp").to_string_lossy(),
"/foo/bar.temp"
);
let p = PathBuf::from("/foo/bar");
assert_eq!(
&path_with_suffix_extension(&p, "temp.temp").to_string_lossy(),
"/foo/bar.temp.temp"
);
let p = PathBuf::from("/foo/bar.baz");
assert_eq!(
&path_with_suffix_extension(&p, "temp.temp").to_string_lossy(),
"/foo/bar.baz.temp.temp"
);
let p = PathBuf::from("/foo/bar.baz");
assert_eq!(
&path_with_suffix_extension(&p, ".temp").to_string_lossy(),
"/foo/bar.baz..temp"
);
let p = PathBuf::from("/foo/bar/dir/");
assert_eq!(
&path_with_suffix_extension(&p, ".temp").to_string_lossy(),
"/foo/bar/dir..temp"
);
}
#[test]
fn object_name() {
let k = RemoteObjectId("a/b/c".to_owned());

View File

@@ -16,8 +16,9 @@ use tokio::{
io::{self, AsyncReadExt, AsyncSeekExt, AsyncWriteExt},
};
use tracing::*;
use utils::crashsafe_dir::path_with_suffix_extension;
use crate::{path_with_suffix_extension, Download, DownloadError, RemoteObjectId};
use crate::{Download, DownloadError, RemoteObjectId};
use super::{strip_path_prefix, RemoteStorage, StorageMetadata};

View File

@@ -10,8 +10,8 @@ bincode = "1.3"
bytes = "1.0.1"
hyper = { version = "0.14.7", features = ["full"] }
pin-project-lite = "0.2.7"
postgres = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
postgres-protocol = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
routerify = "3"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1"

View File

@@ -1,22 +1,22 @@
#![allow(unused)]
use criterion::{criterion_group, criterion_main, Criterion};
use utils::zid;
use utils::id;
pub fn bench_zid_stringify(c: &mut Criterion) {
pub fn bench_id_stringify(c: &mut Criterion) {
// Can only use public methods.
let ztl = zid::ZTenantTimelineId::generate();
let ttid = id::TenantTimelineId::generate();
c.bench_function("zid.to_string", |b| {
c.bench_function("id.to_string", |b| {
b.iter(|| {
// FIXME measurement overhead?
//for _ in 0..1000 {
// ztl.tenant_id.to_string();
// ttid.tenant_id.to_string();
//}
ztl.tenant_id.to_string();
ttid.tenant_id.to_string();
})
});
}
criterion_group!(benches, bench_zid_stringify);
criterion_group!(benches, bench_id_stringify);
criterion_main!(benches);

View File

@@ -14,7 +14,7 @@ use jsonwebtoken::{
use serde::{Deserialize, Serialize};
use serde_with::{serde_as, DisplayFromStr};
use crate::zid::ZTenantId;
use crate::id::TenantId;
const JWT_ALGORITHM: Algorithm = Algorithm::RS256;
@@ -30,23 +30,23 @@ pub enum Scope {
pub struct Claims {
#[serde(default)]
#[serde_as(as = "Option<DisplayFromStr>")]
pub tenant_id: Option<ZTenantId>,
pub tenant_id: Option<TenantId>,
pub scope: Scope,
}
impl Claims {
pub fn new(tenant_id: Option<ZTenantId>, scope: Scope) -> Self {
pub fn new(tenant_id: Option<TenantId>, scope: Scope) -> Self {
Self { tenant_id, scope }
}
}
pub fn check_permission(claims: &Claims, tenantid: Option<ZTenantId>) -> Result<()> {
match (&claims.scope, tenantid) {
pub fn check_permission(claims: &Claims, tenant_id: Option<TenantId>) -> Result<()> {
match (&claims.scope, tenant_id) {
(Scope::Tenant, None) => {
bail!("Attempt to access management api with tenant scope. Permission denied")
}
(Scope::Tenant, Some(tenantid)) => {
if claims.tenant_id.unwrap() != tenantid {
(Scope::Tenant, Some(tenant_id)) => {
if claims.tenant_id.unwrap() != tenant_id {
bail!("Tenant id mismatch. Permission denied")
}
Ok(())

View File

@@ -1,7 +1,9 @@
use std::{
borrow::Cow,
ffi::OsStr,
fs::{self, File},
io,
path::Path,
path::{Path, PathBuf},
};
/// Similar to [`std::fs::create_dir`], except we fsync the
@@ -74,6 +76,22 @@ pub fn create_dir_all(path: impl AsRef<Path>) -> io::Result<()> {
Ok(())
}
/// Adds a suffix to the file(directory) name, either appending the suffix to the end of its extension,
/// or if there's no extension, creates one and puts a suffix there.
pub fn path_with_suffix_extension(original_path: impl AsRef<Path>, suffix: &str) -> PathBuf {
let new_extension = match original_path
.as_ref()
.extension()
.map(OsStr::to_string_lossy)
{
Some(extension) => Cow::Owned(format!("{extension}.{suffix}")),
None => Cow::Borrowed(suffix),
};
original_path
.as_ref()
.with_extension(new_extension.as_ref())
}
#[cfg(test)]
mod tests {
use tempfile::tempdir;
@@ -122,4 +140,33 @@ mod tests {
let invalid_dir_path = file_path.join("folder");
create_dir_all(&invalid_dir_path).unwrap_err();
}
#[test]
fn test_path_with_suffix_extension() {
let p = PathBuf::from("/foo/bar");
assert_eq!(
&path_with_suffix_extension(&p, "temp").to_string_lossy(),
"/foo/bar.temp"
);
let p = PathBuf::from("/foo/bar");
assert_eq!(
&path_with_suffix_extension(&p, "temp.temp").to_string_lossy(),
"/foo/bar.temp.temp"
);
let p = PathBuf::from("/foo/bar.baz");
assert_eq!(
&path_with_suffix_extension(&p, "temp.temp").to_string_lossy(),
"/foo/bar.baz.temp.temp"
);
let p = PathBuf::from("/foo/bar.baz");
assert_eq!(
&path_with_suffix_extension(&p, ".temp").to_string_lossy(),
"/foo/bar.baz..temp"
);
let p = PathBuf::from("/foo/bar/dir/");
assert_eq!(
&path_with_suffix_extension(&p, ".temp").to_string_lossy(),
"/foo/bar/dir..temp"
);
}
}

View File

@@ -1,6 +1,6 @@
use crate::auth::{self, Claims, JwtAuth};
use crate::http::error;
use crate::zid::ZTenantId;
use crate::id::TenantId;
use anyhow::anyhow;
use hyper::header::AUTHORIZATION;
use hyper::{header::CONTENT_TYPE, Body, Request, Response, Server};
@@ -137,9 +137,9 @@ pub fn auth_middleware<B: hyper::body::HttpBody + Send + Sync + 'static>(
})
}
pub fn check_permission(req: &Request<Body>, tenantid: Option<ZTenantId>) -> Result<(), ApiError> {
pub fn check_permission(req: &Request<Body>, tenant_id: Option<TenantId>) -> Result<(), ApiError> {
match req.context::<Claims>() {
Some(claims) => Ok(auth::check_permission(&claims, tenantid)
Some(claims) => Ok(auth::check_permission(&claims, tenant_id)
.map_err(|err| ApiError::Forbidden(err.to_string()))?),
None => Ok(()), // claims is None because auth is disabled
}

View File

@@ -1,12 +1,11 @@
use anyhow::anyhow;
use hyper::{header, Body, Response, StatusCode};
use serde::{Deserialize, Serialize};
use thiserror::Error;
#[derive(Debug, Error)]
pub enum ApiError {
#[error("Bad request: {0}")]
BadRequest(String),
#[error("Bad request: {0:#?}")]
BadRequest(anyhow::Error),
#[error("Forbidden: {0}")]
Forbidden(String),
@@ -15,24 +14,20 @@ pub enum ApiError {
Unauthorized(String),
#[error("NotFound: {0}")]
NotFound(String),
NotFound(anyhow::Error),
#[error("Conflict: {0}")]
Conflict(String),
#[error(transparent)]
InternalServerError(#[from] anyhow::Error),
InternalServerError(anyhow::Error),
}
impl ApiError {
pub fn from_err<E: Into<anyhow::Error>>(err: E) -> Self {
Self::InternalServerError(anyhow!(err))
}
pub fn into_response(self) -> Response<Body> {
match self {
ApiError::BadRequest(_) => HttpErrorBody::response_from_msg_and_status(
self.to_string(),
ApiError::BadRequest(err) => HttpErrorBody::response_from_msg_and_status(
format!("{err:#?}"), // use debug printing so that we give the cause
StatusCode::BAD_REQUEST,
),
ApiError::Forbidden(_) => {

View File

@@ -1,3 +1,4 @@
use anyhow::Context;
use bytes::Buf;
use hyper::{header, Body, Request, Response, StatusCode};
use serde::{Deserialize, Serialize};
@@ -9,20 +10,24 @@ pub async fn json_request<T: for<'de> Deserialize<'de>>(
) -> Result<T, ApiError> {
let whole_body = hyper::body::aggregate(request.body_mut())
.await
.map_err(ApiError::from_err)?;
.context("Failed to read request body")
.map_err(ApiError::BadRequest)?;
serde_json::from_reader(whole_body.reader())
.map_err(|err| ApiError::BadRequest(format!("Failed to parse json request {}", err)))
.context("Failed to parse json request")
.map_err(ApiError::BadRequest)
}
pub fn json_response<T: Serialize>(
status: StatusCode,
data: T,
) -> Result<Response<Body>, ApiError> {
let json = serde_json::to_string(&data).map_err(ApiError::from_err)?;
let json = serde_json::to_string(&data)
.context("Failed to serialize JSON response")
.map_err(ApiError::InternalServerError)?;
let response = Response::builder()
.status(status)
.header(header::CONTENT_TYPE, "application/json")
.body(Body::from(json))
.map_err(ApiError::from_err)?;
.map_err(|e| ApiError::InternalServerError(e.into()))?;
Ok(response)
}

View File

@@ -3,6 +3,6 @@ pub mod error;
pub mod json;
pub mod request;
/// Current fast way to apply simple http routing in various Zenith binaries.
/// Current fast way to apply simple http routing in various Neon binaries.
/// Re-exported for sake of uniform approach, that could be later replaced with better alternatives, if needed.
pub use routerify::{ext::RequestExt, RouterBuilder, RouterService};

View File

@@ -1,6 +1,7 @@
use std::str::FromStr;
use super::error::ApiError;
use anyhow::anyhow;
use hyper::{body::HttpBody, Body, Request};
use routerify::ext::RequestExt;
@@ -10,9 +11,8 @@ pub fn get_request_param<'a>(
) -> Result<&'a str, ApiError> {
match request.param(param_name) {
Some(arg) => Ok(arg),
None => Err(ApiError::BadRequest(format!(
"no {} specified in path param",
param_name
None => Err(ApiError::BadRequest(anyhow!(
"no {param_name} specified in path param",
))),
}
}
@@ -23,16 +23,15 @@ pub fn parse_request_param<T: FromStr>(
) -> Result<T, ApiError> {
match get_request_param(request, param_name)?.parse() {
Ok(v) => Ok(v),
Err(_) => Err(ApiError::BadRequest(format!(
"failed to parse {}",
param_name
Err(_) => Err(ApiError::BadRequest(anyhow!(
"failed to parse {param_name}",
))),
}
}
pub async fn ensure_no_body(request: &mut Request<Body>) -> Result<(), ApiError> {
match request.body_mut().data().await {
Some(_) => Err(ApiError::BadRequest("Unexpected request body".into())),
Some(_) => Err(ApiError::BadRequest(anyhow!("Unexpected request body"))),
None => Ok(()),
}
}

View File

@@ -4,7 +4,7 @@ use hex::FromHex;
use rand::Rng;
use serde::{Deserialize, Serialize};
/// Zenith ID is a 128-bit random ID.
/// Neon ID is a 128-bit random ID.
/// Used to represent various identifiers. Provides handy utility methods and impls.
///
/// NOTE: It (de)serializes as an array of hex bytes, so the string representation would look
@@ -13,13 +13,13 @@ use serde::{Deserialize, Serialize};
/// Use `#[serde_as(as = "DisplayFromStr")]` to (de)serialize it as hex string instead: `ad50847381e248feaac9876cc71ae418`.
/// Check the `serde_with::serde_as` documentation for options for more complex types.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize, PartialOrd, Ord)]
struct ZId([u8; 16]);
struct Id([u8; 16]);
impl ZId {
pub fn get_from_buf(buf: &mut dyn bytes::Buf) -> ZId {
impl Id {
pub fn get_from_buf(buf: &mut dyn bytes::Buf) -> Id {
let mut arr = [0u8; 16];
buf.copy_to_slice(&mut arr);
ZId::from(arr)
Id::from(arr)
}
pub fn as_arr(&self) -> [u8; 16] {
@@ -29,7 +29,7 @@ impl ZId {
pub fn generate() -> Self {
let mut tli_buf = [0u8; 16];
rand::thread_rng().fill(&mut tli_buf);
ZId::from(tli_buf)
Id::from(tli_buf)
}
fn hex_encode(&self) -> String {
@@ -44,54 +44,54 @@ impl ZId {
}
}
impl FromStr for ZId {
impl FromStr for Id {
type Err = hex::FromHexError;
fn from_str(s: &str) -> Result<ZId, Self::Err> {
fn from_str(s: &str) -> Result<Id, Self::Err> {
Self::from_hex(s)
}
}
// this is needed for pretty serialization and deserialization of ZId's using serde integration with hex crate
impl FromHex for ZId {
// this is needed for pretty serialization and deserialization of Id's using serde integration with hex crate
impl FromHex for Id {
type Error = hex::FromHexError;
fn from_hex<T: AsRef<[u8]>>(hex: T) -> Result<Self, Self::Error> {
let mut buf: [u8; 16] = [0u8; 16];
hex::decode_to_slice(hex, &mut buf)?;
Ok(ZId(buf))
Ok(Id(buf))
}
}
impl AsRef<[u8]> for ZId {
impl AsRef<[u8]> for Id {
fn as_ref(&self) -> &[u8] {
&self.0
}
}
impl From<[u8; 16]> for ZId {
impl From<[u8; 16]> for Id {
fn from(b: [u8; 16]) -> Self {
ZId(b)
Id(b)
}
}
impl fmt::Display for ZId {
impl fmt::Display for Id {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.write_str(&self.hex_encode())
}
}
impl fmt::Debug for ZId {
impl fmt::Debug for Id {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.write_str(&self.hex_encode())
}
}
macro_rules! zid_newtype {
macro_rules! id_newtype {
($t:ident) => {
impl $t {
pub fn get_from_buf(buf: &mut dyn bytes::Buf) -> $t {
$t(ZId::get_from_buf(buf))
$t(Id::get_from_buf(buf))
}
pub fn as_arr(&self) -> [u8; 16] {
@@ -99,11 +99,11 @@ macro_rules! zid_newtype {
}
pub fn generate() -> Self {
$t(ZId::generate())
$t(Id::generate())
}
pub const fn from_array(b: [u8; 16]) -> Self {
$t(ZId(b))
$t(Id(b))
}
}
@@ -111,14 +111,14 @@ macro_rules! zid_newtype {
type Err = hex::FromHexError;
fn from_str(s: &str) -> Result<$t, Self::Err> {
let value = ZId::from_str(s)?;
let value = Id::from_str(s)?;
Ok($t(value))
}
}
impl From<[u8; 16]> for $t {
fn from(b: [u8; 16]) -> Self {
$t(ZId::from(b))
$t(Id::from(b))
}
}
@@ -126,7 +126,7 @@ macro_rules! zid_newtype {
type Error = hex::FromHexError;
fn from_hex<T: AsRef<[u8]>>(hex: T) -> Result<Self, Self::Error> {
Ok($t(ZId::from_hex(hex)?))
Ok($t(Id::from_hex(hex)?))
}
}
@@ -150,7 +150,7 @@ macro_rules! zid_newtype {
};
}
/// Zenith timeline IDs are different from PostgreSQL timeline
/// Neon timeline IDs are different from PostgreSQL timeline
/// IDs. They serve a similar purpose though: they differentiate
/// between different "histories" of the same cluster. However,
/// PostgreSQL timeline IDs are a bit cumbersome, because they are only
@@ -158,7 +158,7 @@ macro_rules! zid_newtype {
/// timeline history. Those limitations mean that we cannot generate a
/// new PostgreSQL timeline ID by just generating a random number. And
/// that in turn is problematic for the "pull/push" workflow, where you
/// have a local copy of a zenith repository, and you periodically sync
/// have a local copy of a Neon repository, and you periodically sync
/// the local changes with a remote server. When you work "detached"
/// from the remote server, you cannot create a PostgreSQL timeline ID
/// that's guaranteed to be different from all existing timelines in
@@ -168,55 +168,55 @@ macro_rules! zid_newtype {
/// branches? If they pick the same one, and later try to push the
/// branches to the same remote server, they will get mixed up.
///
/// To avoid those issues, Zenith has its own concept of timelines that
/// To avoid those issues, Neon has its own concept of timelines that
/// is separate from PostgreSQL timelines, and doesn't have those
/// limitations. A zenith timeline is identified by a 128-bit ID, which
/// limitations. A Neon timeline is identified by a 128-bit ID, which
/// is usually printed out as a hex string.
///
/// NOTE: It (de)serializes as an array of hex bytes, so the string representation would look
/// like `[173,80,132,115,129,226,72,254,170,201,135,108,199,26,228,24]`.
/// See [`ZId`] for alternative ways to serialize it.
/// See [`Id`] for alternative ways to serialize it.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Ord, PartialOrd, Serialize, Deserialize)]
pub struct ZTimelineId(ZId);
pub struct TimelineId(Id);
zid_newtype!(ZTimelineId);
id_newtype!(TimelineId);
/// Zenith Tenant Id represents identifiar of a particular tenant.
/// Neon Tenant Id represents the identifier of a particular tenant.
/// Is used for distinguishing requests and data belonging to different users.
///
/// NOTE: It (de)serializes as an array of hex bytes, so the string representation would look
/// like `[173,80,132,115,129,226,72,254,170,201,135,108,199,26,228,24]`.
/// See [`ZId`] for alternative ways to serialize it.
/// See [`Id`] for alternative ways to serialize it.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize, PartialOrd, Ord)]
pub struct ZTenantId(ZId);
pub struct TenantId(Id);
zid_newtype!(ZTenantId);
id_newtype!(TenantId);
// A pair uniquely identifying Zenith instance.
// A pair uniquely identifying Neon instance.
#[derive(Debug, Clone, Copy, PartialOrd, Ord, PartialEq, Eq, Hash, Serialize, Deserialize)]
pub struct ZTenantTimelineId {
pub tenant_id: ZTenantId,
pub timeline_id: ZTimelineId,
pub struct TenantTimelineId {
pub tenant_id: TenantId,
pub timeline_id: TimelineId,
}
impl ZTenantTimelineId {
pub fn new(tenant_id: ZTenantId, timeline_id: ZTimelineId) -> Self {
ZTenantTimelineId {
impl TenantTimelineId {
pub fn new(tenant_id: TenantId, timeline_id: TimelineId) -> Self {
TenantTimelineId {
tenant_id,
timeline_id,
}
}
pub fn generate() -> Self {
Self::new(ZTenantId::generate(), ZTimelineId::generate())
Self::new(TenantId::generate(), TimelineId::generate())
}
pub fn empty() -> Self {
Self::new(ZTenantId::from([0u8; 16]), ZTimelineId::from([0u8; 16]))
Self::new(TenantId::from([0u8; 16]), TimelineId::from([0u8; 16]))
}
}
impl fmt::Display for ZTenantTimelineId {
impl fmt::Display for TenantTimelineId {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{}/{}", self.tenant_id, self.timeline_id)
}
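A small usage sketch of the renamed ID types (editor's illustration; the hex literal is the example value from the doc comment above):

use std::str::FromStr;
use utils::id::{TenantId, TenantTimelineId, TimelineId};

fn main() {
    // IDs parse from and print as 32-character hex strings.
    let tenant_id = TenantId::from_str("ad50847381e248feaac9876cc71ae418").unwrap();
    let timeline_id = TimelineId::generate();
    let ttid = TenantTimelineId::new(tenant_id, timeline_id);
    // Prints "<tenant hex>/<timeline hex>", matching the Display impl above.
    println!("{ttid}");
}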

View File

@@ -29,7 +29,7 @@ pub mod crashsafe_dir;
pub mod auth;
// utility functions and helper traits for unified unique id generation/serialization etc.
pub mod zid;
pub mod id;
// http endpoint utils
pub mod http;

View File

@@ -63,7 +63,7 @@ pub enum AuthType {
Trust,
MD5,
// This mimics postgres's AuthenticationCleartextPassword but instead of password expects JWT
ZenithJWT,
NeonJWT,
}
impl FromStr for AuthType {
@@ -73,8 +73,8 @@ impl FromStr for AuthType {
match s {
"Trust" => Ok(Self::Trust),
"MD5" => Ok(Self::MD5),
"ZenithJWT" => Ok(Self::ZenithJWT),
_ => bail!("invalid value \"{}\" for auth type", s),
"NeonJWT" => Ok(Self::NeonJWT),
_ => bail!("invalid value \"{s}\" for auth type"),
}
}
}
@@ -84,7 +84,7 @@ impl fmt::Display for AuthType {
f.write_str(match self {
AuthType::Trust => "Trust",
AuthType::MD5 => "MD5",
AuthType::ZenithJWT => "ZenithJWT",
AuthType::NeonJWT => "NeonJWT",
})
}
}
@@ -376,7 +376,7 @@ impl PostgresBackend {
))?;
self.state = ProtoState::Authentication;
}
AuthType::ZenithJWT => {
AuthType::NeonJWT => {
self.write_message(&BeMessage::AuthenticationCleartextPassword)?;
self.state = ProtoState::Authentication;
}
@@ -403,7 +403,7 @@ impl PostgresBackend {
bail!("auth failed: {}", e);
}
}
AuthType::ZenithJWT => {
AuthType::NeonJWT => {
let (_, jwt_response) = m.split_last().context("protocol violation")?;
if let Err(e) = handler.check_auth_jwt(self, jwt_response) {
@@ -429,8 +429,22 @@ impl PostgresBackend {
// full cause of the error, not just the top-level context + its trace.
// We don't want to send that in the ErrorResponse though,
// because it's not relevant to the compute node logs.
error!("query handler for '{}' failed: {:?}", query_string, e);
self.write_message_noflush(&BeMessage::ErrorResponse(&e.to_string()))?;
//
// We also don't want to log the full stacktrace when the error is mundane,
// for example when the client simply closed the connection.
let short_error = format!("{:#}", e);
let root_cause = e.root_cause().to_string();
if root_cause.contains("connection closed unexpectedly")
|| root_cause.contains("Broken pipe (os error 32)")
{
error!(
"query handler for '{}' failed: {}",
query_string, short_error
);
} else {
error!("query handler for '{}' failed: {:?}", query_string, e);
}
self.write_message_noflush(&BeMessage::ErrorResponse(&short_error))?;
// TODO: untangle convoluted control flow
if e.to_string().contains("failed to run") {
return Ok(ProcessMsgResult::Break);

View File

@@ -346,7 +346,7 @@ impl PostgresBackend {
))?;
self.state = ProtoState::Authentication;
}
AuthType::ZenithJWT => {
AuthType::NeonJWT => {
self.write_message(&BeMessage::AuthenticationCleartextPassword)?;
self.state = ProtoState::Authentication;
}
@@ -374,7 +374,7 @@ impl PostgresBackend {
bail!("auth failed: {}", e);
}
}
AuthType::ZenithJWT => {
AuthType::NeonJWT => {
let (_, jwt_response) = m.split_last().context("protocol violation")?;
if let Err(e) = handler.check_auth_jwt(self, jwt_response) {

View File

@@ -931,7 +931,7 @@ impl ReplicationFeedback {
// Deserialize ReplicationFeedback message
pub fn parse(mut buf: Bytes) -> ReplicationFeedback {
let mut zf = ReplicationFeedback::empty();
let mut rf = ReplicationFeedback::empty();
let nfields = buf.get_u8();
for _ in 0..nfields {
let key = read_cstr(&mut buf).unwrap();
@@ -939,31 +939,31 @@ impl ReplicationFeedback {
b"current_timeline_size" => {
let len = buf.get_i32();
assert_eq!(len, 8);
zf.current_timeline_size = buf.get_u64();
rf.current_timeline_size = buf.get_u64();
}
b"ps_writelsn" => {
let len = buf.get_i32();
assert_eq!(len, 8);
zf.ps_writelsn = buf.get_u64();
rf.ps_writelsn = buf.get_u64();
}
b"ps_flushlsn" => {
let len = buf.get_i32();
assert_eq!(len, 8);
zf.ps_flushlsn = buf.get_u64();
rf.ps_flushlsn = buf.get_u64();
}
b"ps_applylsn" => {
let len = buf.get_i32();
assert_eq!(len, 8);
zf.ps_applylsn = buf.get_u64();
rf.ps_applylsn = buf.get_u64();
}
b"ps_replytime" => {
let len = buf.get_i32();
assert_eq!(len, 8);
let raw_time = buf.get_i64();
if raw_time > 0 {
zf.ps_replytime = *PG_EPOCH + Duration::from_micros(raw_time as u64);
rf.ps_replytime = *PG_EPOCH + Duration::from_micros(raw_time as u64);
} else {
zf.ps_replytime = *PG_EPOCH - Duration::from_micros(-raw_time as u64);
rf.ps_replytime = *PG_EPOCH - Duration::from_micros(-raw_time as u64);
}
}
_ => {
@@ -976,8 +976,8 @@ impl ReplicationFeedback {
}
}
}
trace!("ReplicationFeedback parsed is {:?}", zf);
zf
trace!("ReplicationFeedback parsed is {:?}", rf);
rf
}
}
@@ -987,29 +987,29 @@ mod tests {
#[test]
fn test_replication_feedback_serialization() {
let mut zf = ReplicationFeedback::empty();
// Fill zf with some values
zf.current_timeline_size = 12345678;
let mut rf = ReplicationFeedback::empty();
// Fill rf with some values
rf.current_timeline_size = 12345678;
// Set rounded time to be able to compare it with deserialized value,
// because it is rounded up to microseconds during serialization.
zf.ps_replytime = *PG_EPOCH + Duration::from_secs(100_000_000);
rf.ps_replytime = *PG_EPOCH + Duration::from_secs(100_000_000);
let mut data = BytesMut::new();
zf.serialize(&mut data).unwrap();
rf.serialize(&mut data).unwrap();
let zf_parsed = ReplicationFeedback::parse(data.freeze());
assert_eq!(zf, zf_parsed);
let rf_parsed = ReplicationFeedback::parse(data.freeze());
assert_eq!(rf, rf_parsed);
}
#[test]
fn test_replication_feedback_unknown_key() {
let mut zf = ReplicationFeedback::empty();
// Fill zf with some values
zf.current_timeline_size = 12345678;
let mut rf = ReplicationFeedback::empty();
// Fill rf with some values
rf.current_timeline_size = 12345678;
// Set rounded time to be able to compare it with deserialized value,
// because it is rounded up to microseconds during serialization.
zf.ps_replytime = *PG_EPOCH + Duration::from_secs(100_000_000);
rf.ps_replytime = *PG_EPOCH + Duration::from_secs(100_000_000);
let mut data = BytesMut::new();
zf.serialize(&mut data).unwrap();
rf.serialize(&mut data).unwrap();
// Add an extra field to the buffer and adjust number of keys
if let Some(first) = data.first_mut() {
@@ -1021,8 +1021,8 @@ mod tests {
data.put_u64(42);
// Parse serialized data and check that new field is not parsed
let zf_parsed = ReplicationFeedback::parse(data.freeze());
assert_eq!(zf, zf_parsed);
let rf_parsed = ReplicationFeedback::parse(data.freeze());
assert_eq!(rf, rf_parsed);
}
#[test]

logfile (new file, 7 lines)
View File

@@ -0,0 +1,7 @@
2022-09-22 12:07:18.140 EEST [463605] LOG: starting PostgreSQL 15beta3 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
2022-09-22 12:07:18.140 EEST [463605] LOG: listening on IPv4 address "127.0.0.1", port 15331
2022-09-22 12:07:18.142 EEST [463605] LOG: listening on Unix socket "/tmp/.s.PGSQL.15331"
2022-09-22 12:07:18.145 EEST [463608] LOG: database system was shut down at 2022-09-22 12:07:17 EEST
2022-09-22 12:07:18.149 EEST [463605] LOG: database system is ready to accept connections
2022-09-22 12:07:18.211 EEST [463605] LOG: received immediate shutdown request
2022-09-22 12:07:18.218 EEST [463605] LOG: database system is shut down

View File

@@ -4,12 +4,12 @@ version = "0.1.0"
edition = "2021"
[features]
# It is simpler infra-wise to have failpoints enabled by default
# It shouldn't affect performance in any way because failpoints
# are not placed in hot code paths
default = ["failpoints"]
default = []
# Enables test-only APIs, including failpoints. In particular, enables the `fail_point!` macro,
# which adds some runtime cost to run tests on outage conditions
testing = ["fail/failpoints"]
profiling = ["pprof"]
failpoints = ["fail/failpoints"]
[dependencies]
async-stream = "0.3"
@@ -27,10 +27,10 @@ clap = "3.0"
daemonize = "0.4.1"
tokio = { version = "1.17", features = ["process", "sync", "macros", "fs", "rt", "io-util", "time"] }
tokio-util = { version = "0.7.3", features = ["io", "io-util"] }
postgres-types = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
postgres-protocol = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
postgres = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
tokio-postgres = { git = "https://github.com/zenithdb/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
postgres-types = { git = "https://github.com/neondatabase/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="d052ee8b86fff9897c77b0fe89ea9daba0e1fa38" }
anyhow = { version = "1.0", features = ["backtrace"] }
crc32c = "0.6.0"
thiserror = "1.0"
@@ -54,6 +54,9 @@ once_cell = "1.13.0"
crossbeam-utils = "0.8.5"
fail = "0.5.0"
git-version = "0.3.5"
rstar = "0.9.3"
num-traits = "0.2.15"
amplify_num = "0.4.1"
postgres_ffi = { path = "../libs/postgres_ffi" }
etcd_broker = { path = "../libs/etcd_broker" }
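The new `rstar` dependency backs the two-dimensional layer lookup; a minimal, self-contained sketch of the idea (editor's illustration; the real layer map uses the pageserver's own key and LSN types and wider integers than the i64 coordinates shown here):

use rstar::{RTree, RTreeObject, AABB};

// A toy entry covering a rectangle in (key, lsn) space.
struct LayerEntry {
    key_range: (i64, i64),
    lsn_range: (i64, i64),
    name: &'static str,
}

impl RTreeObject for LayerEntry {
    type Envelope = AABB<[i64; 2]>;

    fn envelope(&self) -> Self::Envelope {
        AABB::from_corners(
            [self.key_range.0, self.lsn_range.0],
            [self.key_range.1, self.lsn_range.1],
        )
    }
}

fn main() {
    let mut tree = RTree::new();
    tree.insert(LayerEntry { key_range: (0, 100), lsn_range: (10, 20), name: "delta-A" });
    tree.insert(LayerEntry { key_range: (50, 200), lsn_range: (0, 10), name: "image-B" });

    // Find every layer whose rectangle contains the point (key=60, lsn=15),
    // instead of linearly scanning a Vec of all layers.
    let probe = AABB::from_corners([60, 15], [60, 15]);
    for layer in tree.locate_in_envelope_intersecting(&probe) {
        println!("candidate layer: {}", layer.name);
    }
}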

View File

@@ -22,13 +22,13 @@ use std::time::SystemTime;
use tar::{Builder, EntryType, Header};
use tracing::*;
use crate::layered_repository::Timeline;
use crate::reltag::{RelTag, SlruKind};
use crate::tenant::Timeline;
use postgres_ffi::v14::pg_constants;
use postgres_ffi::v14::xlog_utils::{generate_wal_segment, normalize_lsn, XLogFileName};
use postgres_ffi::v14::{CheckPoint, ControlFileData};
use postgres_ffi::pg_constants::{DEFAULTTABLESPACE_OID, GLOBALTABLESPACE_OID};
use postgres_ffi::pg_constants::{PGDATA_SPECIAL_FILES, PGDATA_SUBDIRS, PG_HBA};
use postgres_ffi::TransactionId;
use postgres_ffi::XLogFileName;
use postgres_ffi::PG_TLI;
use postgres_ffi::{BLCKSZ, RELSEG_SIZE, WAL_SEGMENT_SIZE};
use utils::lsn::Lsn;
@@ -129,15 +129,15 @@ where
// TODO include checksum
// Create pgdata subdirs structure
for dir in pg_constants::PGDATA_SUBDIRS.iter() {
for dir in PGDATA_SUBDIRS.iter() {
let header = new_tar_header_dir(*dir)?;
self.ar.append(&header, &mut io::empty())?;
}
// Send empty config files.
for filepath in pg_constants::PGDATA_SPECIAL_FILES.iter() {
for filepath in PGDATA_SPECIAL_FILES.iter() {
if *filepath == "pg_hba.conf" {
let data = pg_constants::PG_HBA.as_bytes();
let data = PG_HBA.as_bytes();
let header = new_tar_header(filepath, data.len() as u64)?;
self.ar.append(&header, data)?;
} else {
@@ -267,16 +267,12 @@ where
None
};
// TODO pass this as a parameter
let pg_version = "14";
if spcnode == GLOBALTABLESPACE_OID {
let pg_version_str = self.timeline.pg_version.to_string();
let header = new_tar_header("PG_VERSION", pg_version_str.len() as u64)?;
self.ar.append(&header, pg_version_str.as_bytes())?;
if spcnode == pg_constants::GLOBALTABLESPACE_OID {
let version_bytes = pg_version.as_bytes();
let header = new_tar_header("PG_VERSION", version_bytes.len() as u64)?;
self.ar.append(&header, version_bytes)?;
let header = new_tar_header("global/PG_VERSION", version_bytes.len() as u64)?;
self.ar.append(&header, version_bytes)?;
info!("timeline.pg_version {}", self.timeline.pg_version);
if let Some(img) = relmap_img {
// filenode map for global tablespace
@@ -305,7 +301,7 @@ where
return Ok(());
}
// User defined tablespaces are not supported
ensure!(spcnode == pg_constants::DEFAULTTABLESPACE_OID);
ensure!(spcnode == DEFAULTTABLESPACE_OID);
// Append dir path for each database
let path = format!("base/{}", dbnode);
@@ -314,9 +310,10 @@ where
if let Some(img) = relmap_img {
let dst_path = format!("base/{}/PG_VERSION", dbnode);
let version_bytes = pg_version.as_bytes();
let header = new_tar_header(&dst_path, version_bytes.len() as u64)?;
self.ar.append(&header, version_bytes)?;
let pg_version_str = self.timeline.pg_version.to_string();
let header = new_tar_header(&dst_path, pg_version_str.len() as u64)?;
self.ar.append(&header, pg_version_str.as_bytes())?;
let relmap_path = format!("base/{}/pg_filenode.map", dbnode);
let header = new_tar_header(&relmap_path, img.len() as u64)?;
@@ -348,30 +345,6 @@ where
// Also send zenith.signal file with extra bootstrap data.
//
fn add_pgcontrol_file(&mut self) -> anyhow::Result<()> {
let checkpoint_bytes = self
.timeline
.get_checkpoint(self.lsn)
.context("failed to get checkpoint bytes")?;
let pg_control_bytes = self
.timeline
.get_control_file(self.lsn)
.context("failed get control bytes")?;
let mut pg_control = ControlFileData::decode(&pg_control_bytes)?;
let mut checkpoint = CheckPoint::decode(&checkpoint_bytes)?;
// Generate new pg_control needed for bootstrap
checkpoint.redo = normalize_lsn(self.lsn, WAL_SEGMENT_SIZE).0;
//reset some fields we don't want to preserve
//TODO Check this.
//We may need to determine the value from twophase data.
checkpoint.oldestActiveXid = 0;
//save new values in pg_control
pg_control.checkPoint = 0;
pg_control.checkPointCopy = checkpoint;
pg_control.state = pg_constants::DB_SHUTDOWNED;
// add zenith.signal file
let mut zenith_signal = String::new();
if self.prev_record_lsn == Lsn(0) {
@@ -388,8 +361,23 @@ where
zenith_signal.as_bytes(),
)?;
let checkpoint_bytes = self
.timeline
.get_checkpoint(self.lsn)
.context("failed to get checkpoint bytes")?;
let pg_control_bytes = self
.timeline
.get_control_file(self.lsn)
.context("failed get control bytes")?;
let (pg_control_bytes, system_identifier) = postgres_ffi::generate_pg_control(
&pg_control_bytes,
&checkpoint_bytes,
self.lsn,
self.timeline.pg_version,
)?;
//send pg_control
let pg_control_bytes = pg_control.encode();
let header = new_tar_header("global/pg_control", pg_control_bytes.len() as u64)?;
self.ar.append(&header, &pg_control_bytes[..])?;
@@ -398,8 +386,10 @@ where
let wal_file_name = XLogFileName(PG_TLI, segno, WAL_SEGMENT_SIZE);
let wal_file_path = format!("pg_wal/{}", wal_file_name);
let header = new_tar_header(&wal_file_path, WAL_SEGMENT_SIZE as u64)?;
let wal_seg = generate_wal_segment(segno, pg_control.system_identifier)
.map_err(|e| anyhow!(e).context("Failed generating wal segment"))?;
let wal_seg =
postgres_ffi::generate_wal_segment(segno, system_identifier, self.timeline.pg_version)
.map_err(|e| anyhow!(e).context("Failed generating wal segment"))?;
ensure!(wal_seg.len() == WAL_SEGMENT_SIZE);
self.ar.append(&header, &wal_seg[..])?;
Ok(())

View File

@@ -3,8 +3,8 @@
//! A handy tool for debugging, that's all.
use anyhow::Result;
use clap::{App, Arg};
use pageserver::layered_repository::dump_layerfile_from_path;
use pageserver::page_cache;
use pageserver::tenant::dump_layerfile_from_path;
use pageserver::virtual_file;
use std::path::PathBuf;
use utils::project_git_version;
@@ -12,7 +12,7 @@ use utils::project_git_version;
project_git_version!(GIT_VERSION);
fn main() -> Result<()> {
let arg_matches = App::new("Zenith dump_layerfile utility")
let arg_matches = App::new("Neon dump_layerfile utility")
.about("Dump contents of one layer file, for debugging")
.version(GIT_VERSION)
.arg(

View File

@@ -40,7 +40,7 @@ fn version() -> String {
}
fn main() -> anyhow::Result<()> {
let arg_matches = App::new("Zenith page server")
let arg_matches = App::new("Neon page server")
.about("Materializes WAL stream to pages and serves them to the postgres")
.version(&*version())
.arg(
@@ -87,8 +87,8 @@ fn main() -> anyhow::Result<()> {
if arg_matches.is_present("enabled-features") {
let features: &[&str] = &[
#[cfg(feature = "failpoints")]
"failpoints",
#[cfg(feature = "testing")]
"testing",
#[cfg(feature = "profiling")]
"profiling",
];
@@ -182,7 +182,7 @@ fn initialize_config(
cfg_file_path.display()
);
} else {
// We're initializing the repo, so there's no config file yet
// We're initializing the tenant, so there's no config file yet
(
DEFAULT_CONFIG_FILE
.parse::<toml_edit::Document>()
@@ -293,7 +293,7 @@ fn start_pageserver(conf: &'static PageServerConf, daemonize: bool) -> Result<()
// initialize authentication for incoming connections
let auth = match &conf.auth_type {
AuthType::Trust | AuthType::MD5 => None,
AuthType::ZenithJWT => {
AuthType::NeonJWT => {
// unwrap is ok because check is performed when creating config, so path is set and file exists
let key_path = conf.auth_validation_public_key_path.as_ref().unwrap();
Some(JwtAuth::from_key_path(key_path)?.into())

View File

@@ -3,7 +3,7 @@
//! A handy tool for debugging, that's all.
use anyhow::Result;
use clap::{App, Arg};
use pageserver::layered_repository::metadata::TimelineMetadata;
use pageserver::tenant::metadata::TimelineMetadata;
use std::path::PathBuf;
use std::str::FromStr;
use utils::{lsn::Lsn, project_git_version};
@@ -11,7 +11,7 @@ use utils::{lsn::Lsn, project_git_version};
project_git_version!(GIT_VERSION);
fn main() -> Result<()> {
let arg_matches = App::new("Zenith update metadata utility")
let arg_matches = App::new("Neon update metadata utility")
.about("Dump or update metadata file")
.version(GIT_VERSION)
.arg(
@@ -50,6 +50,7 @@ fn main() -> Result<()> {
meta.ancestor_lsn(),
meta.latest_gc_cutoff_lsn(),
meta.initdb_lsn(),
meta.pg_version(),
);
update_meta = true;
}
@@ -62,6 +63,7 @@ fn main() -> Result<()> {
meta.ancestor_lsn(),
meta.latest_gc_cutoff_lsn(),
meta.initdb_lsn(),
meta.pg_version(),
);
update_meta = true;
}

View File

@@ -15,13 +15,17 @@ use toml_edit;
use toml_edit::{Document, Item};
use url::Url;
use utils::{
id::{NodeId, TenantId, TimelineId},
postgres_backend::AuthType,
zid::{NodeId, ZTenantId, ZTimelineId},
};
use crate::layered_repository::TIMELINES_SEGMENT_NAME;
use crate::tenant::TIMELINES_SEGMENT_NAME;
use crate::tenant_config::{TenantConf, TenantConfOpt};
/// The name of the metadata file pageserver creates per timeline.
pub const METADATA_FILE_NAME: &str = "metadata";
const TENANT_CONFIG_NAME: &str = "config";
pub mod defaults {
use crate::tenant_config::defaults::*;
use const_format::formatcp;
@@ -205,7 +209,7 @@ impl Default for PageServerConfigBuilder {
workdir: Set(PathBuf::new()),
pg_distrib_dir: Set(env::current_dir()
.expect("cannot access current directory")
.join("pg_install/v14")),
.join("pg_install")),
auth_type: Set(AuthType::Trust),
auth_validation_public_key_path: Set(None),
remote_storage_config: Set(None),
@@ -342,28 +346,57 @@ impl PageServerConf {
self.workdir.join("tenants")
}
pub fn tenant_path(&self, tenantid: &ZTenantId) -> PathBuf {
self.tenants_path().join(tenantid.to_string())
pub fn tenant_path(&self, tenant_id: &TenantId) -> PathBuf {
self.tenants_path().join(tenant_id.to_string())
}
pub fn timelines_path(&self, tenantid: &ZTenantId) -> PathBuf {
self.tenant_path(tenantid).join(TIMELINES_SEGMENT_NAME)
/// Points to a place in pageserver's local directory,
/// where a certain tenant's tenantconf file should be located.
pub fn tenant_config_path(&self, tenant_id: TenantId) -> PathBuf {
self.tenant_path(&tenant_id).join(TENANT_CONFIG_NAME)
}
pub fn timeline_path(&self, timelineid: &ZTimelineId, tenantid: &ZTenantId) -> PathBuf {
self.timelines_path(tenantid).join(timelineid.to_string())
pub fn timelines_path(&self, tenant_id: &TenantId) -> PathBuf {
self.tenant_path(tenant_id).join(TIMELINES_SEGMENT_NAME)
}
pub fn timeline_path(&self, timeline_id: &TimelineId, tenant_id: &TenantId) -> PathBuf {
self.timelines_path(tenant_id).join(timeline_id.to_string())
}
/// Points to a place in pageserver's local directory,
/// where a certain timeline's metadata file should be located.
pub fn metadata_path(&self, timeline_id: TimelineId, tenant_id: TenantId) -> PathBuf {
self.timeline_path(&timeline_id, &tenant_id)
.join(METADATA_FILE_NAME)
}
//
// Postgres distribution paths
//
pub fn pg_distrib_dir(&self, pg_version: u32) -> PathBuf {
let path = self.pg_distrib_dir.clone();
pub fn pg_bin_dir(&self) -> PathBuf {
self.pg_distrib_dir.join("bin")
match pg_version {
14 => path.join(format!("v{pg_version}")),
15 => path.join(format!("v{pg_version}")),
_ => panic!("Unsupported postgres version: {}", pg_version),
}
}
pub fn pg_lib_dir(&self) -> PathBuf {
self.pg_distrib_dir.join("lib")
pub fn pg_bin_dir(&self, pg_version: u32) -> PathBuf {
match pg_version {
14 => self.pg_distrib_dir(pg_version).join("bin"),
15 => self.pg_distrib_dir(pg_version).join("bin"),
_ => panic!("Unsupported postgres version: {}", pg_version),
}
}
pub fn pg_lib_dir(&self, pg_version: u32) -> PathBuf {
match pg_version {
14 => self.pg_distrib_dir(pg_version).join("lib"),
15 => self.pg_distrib_dir(pg_version).join("lib"),
_ => panic!("Unsupported postgres version: {}", pg_version),
}
}
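For illustration (editor's sketch, not code from this change): the parse-time check that `bin/postgres` exists, which is removed further down in this file because it now needs a version to resolve the path, would look roughly like this when done per version:

fn check_postgres_binary(conf: &PageServerConf, pg_version: u32) -> anyhow::Result<()> {
    // e.g. <pg_distrib_dir>/v14/bin/postgres or <pg_distrib_dir>/v15/bin/postgres
    let postgres_bin = conf.pg_bin_dir(pg_version).join("postgres");
    anyhow::ensure!(
        postgres_bin.exists(),
        "Can't find postgres binary at {}",
        postgres_bin.display()
    );
    Ok(())
}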
/// Parse a configuration file (pageserver.toml) into a PageServerConf struct,
@@ -419,7 +452,7 @@ impl PageServerConf {
let mut conf = builder.build().context("invalid config")?;
if conf.auth_type == AuthType::ZenithJWT {
if conf.auth_type == AuthType::NeonJWT {
let auth_validation_public_key_path = conf
.auth_validation_public_key_path
.get_or_insert_with(|| workdir.join("auth_public_key.pem"));
@@ -432,13 +465,6 @@ impl PageServerConf {
);
}
if !conf.pg_distrib_dir.join("bin/postgres").exists() {
bail!(
"Can't find postgres binary at {}",
conf.pg_distrib_dir.display()
);
}
conf.default_tenant_conf = t_conf.merge(TenantConf::default());
Ok(conf)
@@ -608,6 +634,7 @@ mod tests {
use tempfile::{tempdir, TempDir};
use super::*;
use crate::DEFAULT_PG_VERSION;
const ALL_BASE_VALUES_TOML: &str = r#"
# Initial configuration file created by 'pageserver --init'
@@ -847,8 +874,9 @@ broker_endpoints = ['{broker_endpoint}']
fs::create_dir_all(&workdir)?;
let pg_distrib_dir = tempdir_path.join("pg_distrib");
fs::create_dir_all(&pg_distrib_dir)?;
let postgres_bin_dir = pg_distrib_dir.join("bin");
let pg_distrib_dir_versioned = pg_distrib_dir.join(format!("v{DEFAULT_PG_VERSION}"));
fs::create_dir_all(&pg_distrib_dir_versioned)?;
let postgres_bin_dir = pg_distrib_dir_versioned.join("bin");
fs::create_dir_all(&postgres_bin_dir)?;
fs::write(postgres_bin_dir.join("postgres"), "I'm postgres, trust me")?;

View File

@@ -3,25 +3,25 @@ use std::num::NonZeroU64;
use serde::{Deserialize, Serialize};
use serde_with::{serde_as, DisplayFromStr};
use utils::{
id::{NodeId, TenantId, TimelineId},
lsn::Lsn,
zid::{NodeId, ZTenantId, ZTimelineId},
};
// These enums are used in the API response fields.
use crate::tenant_mgr::TenantState;
use crate::tenant::TenantState;
#[serde_as]
#[derive(Serialize, Deserialize)]
pub struct TimelineCreateRequest {
#[serde(default)]
#[serde_as(as = "Option<DisplayFromStr>")]
pub new_timeline_id: Option<ZTimelineId>,
pub new_timeline_id: Option<TimelineId>,
#[serde(default)]
#[serde_as(as = "Option<DisplayFromStr>")]
pub ancestor_timeline_id: Option<ZTimelineId>,
pub ancestor_timeline_id: Option<TimelineId>,
#[serde(default)]
#[serde_as(as = "Option<DisplayFromStr>")]
pub ancestor_start_lsn: Option<Lsn>,
pub pg_version: Option<u32>,
}
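For illustration (editor's sketch): a request that pins the new timeline to Postgres 15 can be built and serialized like this; the exact JSON layout is approximate.

let req = TimelineCreateRequest {
    new_timeline_id: None,
    ancestor_timeline_id: None,
    ancestor_start_lsn: None,
    pg_version: Some(15),
};
// Produces JSON along the lines of:
// {"new_timeline_id":null,"ancestor_timeline_id":null,
//  "ancestor_start_lsn":null,"pg_version":15}
println!("{}", serde_json::to_string(&req).unwrap());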
#[serde_as]
@@ -29,7 +29,7 @@ pub struct TimelineCreateRequest {
pub struct TenantCreateRequest {
#[serde(default)]
#[serde_as(as = "Option<DisplayFromStr>")]
pub new_tenant_id: Option<ZTenantId>,
pub new_tenant_id: Option<TenantId>,
pub checkpoint_distance: Option<u64>,
pub checkpoint_timeout: Option<String>,
pub compaction_target_size: Option<u64>,
@@ -47,7 +47,7 @@ pub struct TenantCreateRequest {
#[serde_as]
#[derive(Serialize, Deserialize)]
#[serde(transparent)]
pub struct TenantCreateResponse(#[serde_as(as = "DisplayFromStr")] pub ZTenantId);
pub struct TenantCreateResponse(#[serde_as(as = "DisplayFromStr")] pub TenantId);
#[derive(Serialize)]
pub struct StatusResponse {
@@ -55,7 +55,7 @@ pub struct StatusResponse {
}
impl TenantCreateRequest {
pub fn new(new_tenant_id: Option<ZTenantId>) -> TenantCreateRequest {
pub fn new(new_tenant_id: Option<TenantId>) -> TenantCreateRequest {
TenantCreateRequest {
new_tenant_id,
..Default::default()
@@ -66,7 +66,7 @@ impl TenantCreateRequest {
#[serde_as]
#[derive(Serialize, Deserialize)]
pub struct TenantConfigRequest {
pub tenant_id: ZTenantId,
pub tenant_id: TenantId,
#[serde(default)]
#[serde_as(as = "Option<DisplayFromStr>")]
pub checkpoint_distance: Option<u64>,
@@ -84,7 +84,7 @@ pub struct TenantConfigRequest {
}
impl TenantConfigRequest {
pub fn new(tenant_id: ZTenantId) -> TenantConfigRequest {
pub fn new(tenant_id: TenantId) -> TenantConfigRequest {
TenantConfigRequest {
tenant_id,
checkpoint_distance: None,
@@ -107,8 +107,8 @@ impl TenantConfigRequest {
#[derive(Serialize, Deserialize, Clone)]
pub struct TenantInfo {
#[serde_as(as = "DisplayFromStr")]
pub id: ZTenantId,
pub state: Option<TenantState>,
pub id: TenantId,
pub state: TenantState,
pub current_physical_size: Option<u64>, // physical size is only included in `tenant_status` endpoint
pub has_in_progress_downloads: Option<bool>,
}
@@ -117,7 +117,7 @@ pub struct TenantInfo {
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct LocalTimelineInfo {
#[serde_as(as = "Option<DisplayFromStr>")]
pub ancestor_timeline_id: Option<ZTimelineId>,
pub ancestor_timeline_id: Option<TimelineId>,
#[serde_as(as = "Option<DisplayFromStr>")]
pub ancestor_lsn: Option<Lsn>,
#[serde_as(as = "DisplayFromStr")]
@@ -138,6 +138,7 @@ pub struct LocalTimelineInfo {
pub last_received_msg_lsn: Option<Lsn>,
/// the timestamp (in microseconds) of the last received message
pub last_received_msg_ts: Option<u128>,
pub pg_version: u32,
}
#[serde_as]
@@ -155,9 +156,27 @@ pub struct RemoteTimelineInfo {
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct TimelineInfo {
#[serde_as(as = "DisplayFromStr")]
pub tenant_id: ZTenantId,
pub tenant_id: TenantId,
#[serde_as(as = "DisplayFromStr")]
pub timeline_id: ZTimelineId,
pub timeline_id: TimelineId,
pub local: Option<LocalTimelineInfo>,
pub remote: Option<RemoteTimelineInfo>,
}
pub type ConfigureFailpointsRequest = Vec<FailpointConfig>;
/// Information for configuring a single fail point
#[derive(Debug, Serialize, Deserialize)]
pub struct FailpointConfig {
/// Name of the fail point
pub name: String,
/// List of actions to take, using the format described in `fail::cfg`
///
/// We also support `actions = "exit"` to cause the fail point to immediately exit.
pub actions: String,
}
#[derive(Debug, Serialize, Deserialize)]
pub struct TimelineGcRequest {
pub gc_horizon: Option<u64>,
}

View File

@@ -307,6 +307,7 @@ paths:
description: |
Create a timeline. Returns new timeline id on success.\
If no new timeline id is specified in parameters, it will be generated. It's an error to recreate the same timeline.
If no pg_version is specified, the DEFAULT_PG_VERSION hardcoded in the pageserver is assumed.
requestBody:
content:
application/json:
@@ -322,6 +323,8 @@ paths:
ancestor_start_lsn:
type: string
format: hex
pg_version:
type: integer
responses:
"201":
description: TimelineInfo
@@ -489,11 +492,18 @@ components:
type: object
required:
- id
- state
properties:
id:
type: string
state:
type: string
oneOf:
- type: string
- type: object
properties:
background_jobs_running:
type: boolean
current_physical_size:
type: integer
has_in_progress_downloads:
@@ -573,7 +583,6 @@ components:
required:
- last_record_lsn
- disk_consistent_lsn
- timeline_state
properties:
last_record_lsn:
type: string
@@ -581,8 +590,6 @@ components:
disk_consistent_lsn:
type: string
format: hex
timeline_state:
type: string
ancestor_timeline_id:
type: string
format: hex

View File

@@ -1,9 +1,10 @@
use std::sync::Arc;
use anyhow::{Context, Result};
use anyhow::{anyhow, Context, Result};
use hyper::StatusCode;
use hyper::{Body, Request, Response, Uri};
use remote_storage::GenericRemoteStorage;
use tokio::task::JoinError;
use tracing::*;
use super::models::{LocalTimelineInfo, RemoteTimelineInfo, TimelineInfo};
@@ -11,11 +12,11 @@ use super::models::{
StatusResponse, TenantConfigRequest, TenantCreateRequest, TenantCreateResponse, TenantInfo,
TimelineCreateRequest,
};
use crate::layered_repository::Timeline;
use crate::storage_sync;
use crate::storage_sync::index::{RemoteIndex, RemoteTimeline};
use crate::tenant::{TenantState, Timeline};
use crate::tenant_config::TenantConfOpt;
use crate::{config::PageServerConf, tenant_mgr, timelines};
use crate::{config::PageServerConf, tenant_mgr};
use utils::{
auth::JwtAuth,
http::{
@@ -25,10 +26,16 @@ use utils::{
request::parse_request_param,
RequestExt, RouterBuilder,
},
id::{TenantId, TenantTimelineId, TimelineId},
lsn::Lsn,
zid::{ZTenantId, ZTenantTimelineId, ZTimelineId},
};
// Imports only used for testing APIs
#[cfg(feature = "testing")]
use super::models::{ConfigureFailpointsRequest, TimelineGcRequest};
#[cfg(feature = "testing")]
use crate::CheckpointConfig;
struct State {
conf: &'static PageServerConf,
auth: Option<Arc<JwtAuth>>,
@@ -123,21 +130,21 @@ fn local_timeline_info_from_timeline(
wal_source_connstr,
last_received_msg_lsn,
last_received_msg_ts,
pg_version: timeline.pg_version,
};
Ok(info)
}
fn list_local_timelines(
tenant_id: ZTenantId,
tenant_id: TenantId,
include_non_incremental_logical_size: bool,
include_non_incremental_physical_size: bool,
) -> Result<Vec<(ZTimelineId, LocalTimelineInfo)>> {
let repo = tenant_mgr::get_repository_for_tenant(tenant_id)
.with_context(|| format!("Failed to get repo for tenant {tenant_id}"))?;
let repo_timelines = repo.list_timelines();
) -> Result<Vec<(TimelineId, LocalTimelineInfo)>> {
let tenant = tenant_mgr::get_tenant(tenant_id, true)?;
let timelines = tenant.list_timelines();
let mut local_timeline_info = Vec::with_capacity(repo_timelines.len());
for (timeline_id, repository_timeline) in repo_timelines {
let mut local_timeline_info = Vec::with_capacity(timelines.len());
for (timeline_id, repository_timeline) in timelines {
local_timeline_info.push((
timeline_id,
local_timeline_info_from_timeline(
@@ -157,21 +164,22 @@ async fn status_handler(request: Request<Body>) -> Result<Response<Body>, ApiErr
}
async fn timeline_create_handler(mut request: Request<Body>) -> Result<Response<Body>, ApiError> {
let tenant_id: ZTenantId = parse_request_param(&request, "tenant_id")?;
let tenant_id: TenantId = parse_request_param(&request, "tenant_id")?;
let request_data: TimelineCreateRequest = json_request(&mut request).await?;
check_permission(&request, Some(tenant_id))?;
let tenant = tenant_mgr::get_tenant(tenant_id, true).map_err(ApiError::NotFound)?;
let new_timeline_info = async {
match timelines::create_timeline(
get_config(&request),
tenant_id,
request_data.new_timeline_id.map(ZTimelineId::from),
request_data.ancestor_timeline_id.map(ZTimelineId::from),
match tenant.create_timeline(
request_data.new_timeline_id.map(TimelineId::from),
request_data.ancestor_timeline_id.map(TimelineId::from),
request_data.ancestor_start_lsn,
request_data.pg_version.unwrap_or(crate::DEFAULT_PG_VERSION)
).await {
Ok(Some(new_timeline)) => {
// Created. Construct a TimelineInfo for it.
let local_info = local_timeline_info_from_timeline(&new_timeline, false, false)?;
let local_info = local_timeline_info_from_timeline(&new_timeline, false, false)
.map_err(ApiError::InternalServerError)?;
Ok(Some(TimelineInfo {
tenant_id,
timeline_id: new_timeline.timeline_id,
@@ -180,12 +188,11 @@ async fn timeline_create_handler(mut request: Request<Body>) -> Result<Response<
}))
}
Ok(None) => Ok(None), // timeline already exists
Err(err) => Err(err),
Err(err) => Err(ApiError::InternalServerError(err)),
}
}
.instrument(info_span!("timeline_create", tenant = %tenant_id, new_timeline = ?request_data.new_timeline_id, lsn=?request_data.ancestor_start_lsn))
.await
.map_err(ApiError::from_err)?;
.instrument(info_span!("timeline_create", tenant = %tenant_id, new_timeline = ?request_data.new_timeline_id, lsn=?request_data.ancestor_start_lsn, pg_version=?request_data.pg_version))
.await?;
Ok(match new_timeline_info {
Some(info) => json_response(StatusCode::CREATED, info)?,
@@ -194,35 +201,44 @@ async fn timeline_create_handler(mut request: Request<Body>) -> Result<Response<
}
async fn timeline_list_handler(request: Request<Body>) -> Result<Response<Body>, ApiError> {
let tenant_id: ZTenantId = parse_request_param(&request, "tenant_id")?;
let tenant_id: TenantId = parse_request_param(&request, "tenant_id")?;
let include_non_incremental_logical_size =
query_param_present(&request, "include-non-incremental-logical-size");
let include_non_incremental_physical_size =
query_param_present(&request, "include-non-incremental-physical-size");
check_permission(&request, Some(tenant_id))?;
let local_timeline_infos = tokio::task::spawn_blocking(move || {
let timelines = tokio::task::spawn_blocking(move || {
let _enter = info_span!("timeline_list", tenant = %tenant_id).entered();
list_local_timelines(
tenant_id,
include_non_incremental_logical_size,
include_non_incremental_physical_size,
)
let tenant = tenant_mgr::get_tenant(tenant_id, true).map_err(ApiError::NotFound)?;
Ok(tenant.list_timelines())
})
.await
.map_err(ApiError::from_err)??;
.map_err(|e: JoinError| ApiError::InternalServerError(e.into()))??;
let mut response_data = Vec::with_capacity(timelines.len());
for (timeline_id, timeline) in timelines {
let local = match local_timeline_info_from_timeline(
&timeline,
include_non_incremental_logical_size,
include_non_incremental_physical_size,
) {
Ok(local) => Some(local),
Err(e) => {
error!("Failed to convert tenant timeline {timeline_id} into the local one: {e:?}");
None
}
};
let mut response_data = Vec::with_capacity(local_timeline_infos.len());
for (timeline_id, local_timeline_info) in local_timeline_infos {
response_data.push(TimelineInfo {
tenant_id,
timeline_id,
local: Some(local_timeline_info),
local,
remote: get_state(&request)
.remote_index
.read()
.await
.timeline_entry(&ZTenantTimelineId {
.timeline_entry(&TenantTimelineId {
tenant_id,
timeline_id,
})
@@ -250,8 +266,8 @@ fn query_param_present(request: &Request<Body>, param: &str) -> bool {
}
async fn timeline_detail_handler(request: Request<Body>) -> Result<Response<Body>, ApiError> {
let tenant_id: ZTenantId = parse_request_param(&request, "tenant_id")?;
let timeline_id: ZTimelineId = parse_request_param(&request, "timeline_id")?;
let tenant_id: TenantId = parse_request_param(&request, "tenant_id")?;
let timeline_id: TimelineId = parse_request_param(&request, "timeline_id")?;
let include_non_incremental_logical_size =
query_param_present(&request, "include-non-incremental-logical-size");
let include_non_incremental_physical_size =
@@ -259,33 +275,30 @@ async fn timeline_detail_handler(request: Request<Body>) -> Result<Response<Body
check_permission(&request, Some(tenant_id))?;
let (local_timeline_info, remote_timeline_info) = async {
// any error here will render local timeline as None
// XXX .in_current_span does not attach messages in spawn_blocking future to current future's span
let local_timeline_info = tokio::task::spawn_blocking(move || {
let repo = tenant_mgr::get_repository_for_tenant(tenant_id)?;
let local_timeline = {
repo.get_timeline(timeline_id)
.as_ref()
.map(|timeline| {
local_timeline_info_from_timeline(
timeline,
include_non_incremental_logical_size,
include_non_incremental_physical_size,
)
})
.transpose()?
};
Ok::<_, anyhow::Error>(local_timeline)
let timeline = tokio::task::spawn_blocking(move || {
tenant_mgr::get_tenant(tenant_id, true)?.get_timeline(timeline_id)
})
.await
.ok()
.and_then(|r| r.ok())
.flatten();
.map_err(|e: JoinError| ApiError::InternalServerError(e.into()))?;
let local_timeline_info = match timeline.and_then(|timeline| {
local_timeline_info_from_timeline(
&timeline,
include_non_incremental_logical_size,
include_non_incremental_physical_size,
)
}) {
Ok(local_info) => Some(local_info),
Err(e) => {
error!("Failed to get local timeline info: {e:#}");
None
}
};
let remote_timeline_info = {
let remote_index_read = get_state(&request).remote_index.read().await;
remote_index_read
.timeline_entry(&ZTenantTimelineId {
.timeline_entry(&TenantTimelineId {
tenant_id,
timeline_id,
})
@@ -294,42 +307,43 @@ async fn timeline_detail_handler(request: Request<Body>) -> Result<Response<Body
awaits_download: remote_entry.awaits_download,
})
};
(local_timeline_info, remote_timeline_info)
Ok::<_, ApiError>((local_timeline_info, remote_timeline_info))
}
.instrument(info_span!("timeline_detail", tenant = %tenant_id, timeline = %timeline_id))
.await;
.await?;
if local_timeline_info.is_none() && remote_timeline_info.is_none() {
return Err(ApiError::NotFound(format!(
Err(ApiError::NotFound(anyhow!(
"Timeline {tenant_id}/{timeline_id} is not found neither locally nor remotely"
)));
)))
} else {
json_response(
StatusCode::OK,
TimelineInfo {
tenant_id,
timeline_id,
local: local_timeline_info,
remote: remote_timeline_info,
},
)
}
let timeline_info = TimelineInfo {
tenant_id,
timeline_id,
local: local_timeline_info,
remote: remote_timeline_info,
};
json_response(StatusCode::OK, timeline_info)
}
// TODO: it makes sense to provide the tenant config right away, the same way it is handled in tenant_create
async fn tenant_attach_handler(request: Request<Body>) -> Result<Response<Body>, ApiError> {
let tenant_id: ZTenantId = parse_request_param(&request, "tenant_id")?;
let tenant_id: TenantId = parse_request_param(&request, "tenant_id")?;
check_permission(&request, Some(tenant_id))?;
info!("Handling tenant attach {}", tenant_id);
info!("Handling tenant attach {tenant_id}");
tokio::task::spawn_blocking(move || {
if tenant_mgr::get_tenant_state(tenant_id).is_some() {
anyhow::bail!("Tenant is already present locally")
};
Ok(())
tokio::task::spawn_blocking(move || match tenant_mgr::get_tenant(tenant_id, false) {
Ok(_) => Err(ApiError::Conflict(
"Tenant is already present locally".to_owned(),
)),
Err(_) => Ok(()),
})
.await
.map_err(ApiError::from_err)??;
.map_err(|e: JoinError| ApiError::InternalServerError(e.into()))??;
let state = get_state(&request);
let remote_index = &state.remote_index;
@@ -354,12 +368,12 @@ async fn tenant_attach_handler(request: Request<Body>) -> Result<Response<Body>,
// download index parts for every tenant timeline
let remote_timelines = match gather_tenant_timelines_index_parts(state, tenant_id).await {
Ok(Some(remote_timelines)) => remote_timelines,
Ok(None) => return Err(ApiError::NotFound("Unknown remote tenant".to_string())),
Ok(None) => return Err(ApiError::NotFound(anyhow!("Unknown remote tenant"))),
Err(e) => {
error!("Failed to retrieve remote tenant data: {:?}", e);
return Err(ApiError::NotFound(
"Failed to retrieve remote tenant".to_string(),
));
return Err(ApiError::NotFound(anyhow!(
"Failed to retrieve remote tenant"
)));
}
};
@@ -382,7 +396,8 @@ async fn tenant_attach_handler(request: Request<Body>) -> Result<Response<Body>,
for (timeline_id, mut remote_timeline) in remote_timelines {
tokio::fs::create_dir_all(state.conf.timeline_path(&timeline_id, &tenant_id))
.await
.context("Failed to create new timeline directory")?;
.context("Failed to create new timeline directory")
.map_err(ApiError::InternalServerError)?;
remote_timeline.awaits_download = true;
tenant_entry.insert(timeline_id, remote_timeline);
@@ -397,8 +412,8 @@ async fn tenant_attach_handler(request: Request<Body>) -> Result<Response<Body>,
/// for details see comment to `storage_sync::gather_tenant_timelines_index_parts`
async fn gather_tenant_timelines_index_parts(
state: &State,
tenant_id: ZTenantId,
) -> anyhow::Result<Option<Vec<(ZTimelineId, RemoteTimeline)>>> {
tenant_id: TenantId,
) -> anyhow::Result<Option<Vec<(TimelineId, RemoteTimeline)>>> {
let index_parts = match state.remote_storage.as_ref() {
Some(storage) => {
storage_sync::gather_tenant_timelines_index_parts(state.conf, storage, tenant_id).await
@@ -420,18 +435,21 @@ async fn gather_tenant_timelines_index_parts(
}
async fn timeline_delete_handler(request: Request<Body>) -> Result<Response<Body>, ApiError> {
let tenant_id: ZTenantId = parse_request_param(&request, "tenant_id")?;
let timeline_id: ZTimelineId = parse_request_param(&request, "timeline_id")?;
let tenant_id: TenantId = parse_request_param(&request, "tenant_id")?;
let timeline_id: TimelineId = parse_request_param(&request, "timeline_id")?;
check_permission(&request, Some(tenant_id))?;
let state = get_state(&request);
tenant_mgr::delete_timeline(tenant_id, timeline_id)
.instrument(info_span!("timeline_delete", tenant = %tenant_id))
.instrument(info_span!("timeline_delete", tenant = %tenant_id, timeline = %timeline_id))
.await
.map_err(ApiError::from_err)?;
// FIXME: Errors from `delete_timeline` can occur for a number of reasons, including both
// user and internal errors. Replace this with better handling once the error type permits
// it.
.map_err(ApiError::InternalServerError)?;
let mut remote_index = state.remote_index.write().await;
remote_index.remove_timeline_entry(ZTenantTimelineId {
remote_index.remove_timeline_entry(TenantTimelineId {
tenant_id,
timeline_id,
});
@@ -440,7 +458,7 @@ async fn timeline_delete_handler(request: Request<Body>) -> Result<Response<Body
}
async fn tenant_detach_handler(request: Request<Body>) -> Result<Response<Body>, ApiError> {
let tenant_id: ZTenantId = parse_request_param(&request, "tenant_id")?;
let tenant_id: TenantId = parse_request_param(&request, "tenant_id")?;
check_permission(&request, Some(tenant_id))?;
let state = get_state(&request);
@@ -448,7 +466,9 @@ async fn tenant_detach_handler(request: Request<Body>) -> Result<Response<Body>,
tenant_mgr::detach_tenant(conf, tenant_id)
.instrument(info_span!("tenant_detach", tenant = %tenant_id))
.await
.map_err(ApiError::from_err)?;
// FIXME: Errors from `detach_tenant` can be caused by both user and internal errors.
// Replace this with better handling once the error type permits it.
.map_err(ApiError::InternalServerError)?;
let mut remote_index = state.remote_index.write().await;
remote_index.remove_tenant_entry(&tenant_id);
@@ -468,19 +488,19 @@ async fn tenant_list_handler(request: Request<Body>) -> Result<Response<Body>, A
crate::tenant_mgr::list_tenant_info(&remote_index)
})
.await
.map_err(ApiError::from_err)?;
.map_err(|e: JoinError| ApiError::InternalServerError(e.into()))?;
json_response(StatusCode::OK, response_data)
}
async fn tenant_status(request: Request<Body>) -> Result<Response<Body>, ApiError> {
let tenant_id: ZTenantId = parse_request_param(&request, "tenant_id")?;
let tenant_id: TenantId = parse_request_param(&request, "tenant_id")?;
check_permission(&request, Some(tenant_id))?;
// if the tenant is still being downloaded, it can be absent from the global tenant map
let tenant_state = tokio::task::spawn_blocking(move || tenant_mgr::get_tenant_state(tenant_id))
let tenant = tokio::task::spawn_blocking(move || tenant_mgr::get_tenant(tenant_id, false))
.await
.map_err(ApiError::from_err)?;
.map_err(|e: JoinError| ApiError::InternalServerError(e.into()))?;
let state = get_state(&request);
let remote_index = &state.remote_index;
@@ -494,13 +514,25 @@ async fn tenant_status(request: Request<Body>) -> Result<Response<Body>, ApiErro
false
});
let tenant_state = match tenant {
Ok(tenant) => tenant.current_state(),
Err(e) => {
error!("Failed to get local tenant state: {e:#}");
if has_in_progress_downloads {
TenantState::Paused
} else {
TenantState::Broken
}
}
};
let current_physical_size =
match tokio::task::spawn_blocking(move || list_local_timelines(tenant_id, false, false))
.await
.map_err(ApiError::from_err)?
.map_err(|e: JoinError| ApiError::InternalServerError(e.into()))?
{
Err(err) => {
// Getting local timelines can fail when no local repo is on disk (e.g, when tenant data is being downloaded).
// Getting local timelines can fail when no local tenant directory is on disk (e.g., when tenant data is being downloaded).
// In that case, put a warning message into log and operate normally.
warn!("Failed to get local timelines for tenant {tenant_id}: {err}");
None
@@ -523,6 +555,16 @@ async fn tenant_status(request: Request<Body>) -> Result<Response<Body>, ApiErro
)
}
// Helper function to standardize the error messages we produce on bad durations
//
// Intended to be used with anyhow's `with_context`, e.g.:
//
// let value = result.with_context(bad_duration("name", &value))?;
//
fn bad_duration<'a>(field_name: &'static str, value: &'a str) -> impl 'a + Fn() -> String {
move || format!("Cannot parse `{field_name}` duration {value:?}")
}
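// With this helper, a failed parse at the call sites below becomes an ApiError::BadRequest
// whose context reads, for example (illustrative value only):
//   Cannot parse `gc_period` duration "10 minutes-ish"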
async fn tenant_create_handler(mut request: Request<Body>) -> Result<Response<Body>, ApiError> {
check_permission(&request, None)?;
@@ -531,25 +573,39 @@ async fn tenant_create_handler(mut request: Request<Body>) -> Result<Response<Bo
let mut tenant_conf = TenantConfOpt::default();
if let Some(gc_period) = request_data.gc_period {
tenant_conf.gc_period =
Some(humantime::parse_duration(&gc_period).map_err(ApiError::from_err)?);
tenant_conf.gc_period = Some(
humantime::parse_duration(&gc_period)
.with_context(bad_duration("gc_period", &gc_period))
.map_err(ApiError::BadRequest)?,
);
}
tenant_conf.gc_horizon = request_data.gc_horizon;
tenant_conf.image_creation_threshold = request_data.image_creation_threshold;
if let Some(pitr_interval) = request_data.pitr_interval {
tenant_conf.pitr_interval =
Some(humantime::parse_duration(&pitr_interval).map_err(ApiError::from_err)?);
tenant_conf.pitr_interval = Some(
humantime::parse_duration(&pitr_interval)
.with_context(bad_duration("pitr_interval", &pitr_interval))
.map_err(ApiError::BadRequest)?,
);
}
if let Some(walreceiver_connect_timeout) = request_data.walreceiver_connect_timeout {
tenant_conf.walreceiver_connect_timeout = Some(
humantime::parse_duration(&walreceiver_connect_timeout).map_err(ApiError::from_err)?,
humantime::parse_duration(&walreceiver_connect_timeout)
.with_context(bad_duration(
"walreceiver_connect_timeout",
&walreceiver_connect_timeout,
))
.map_err(ApiError::BadRequest)?,
);
}
if let Some(lagging_wal_timeout) = request_data.lagging_wal_timeout {
tenant_conf.lagging_wal_timeout =
Some(humantime::parse_duration(&lagging_wal_timeout).map_err(ApiError::from_err)?);
tenant_conf.lagging_wal_timeout = Some(
humantime::parse_duration(&lagging_wal_timeout)
.with_context(bad_duration("lagging_wal_timeout", &lagging_wal_timeout))
.map_err(ApiError::BadRequest)?,
);
}
if let Some(max_lsn_wal_lag) = request_data.max_lsn_wal_lag {
tenant_conf.max_lsn_wal_lag = Some(max_lsn_wal_lag);
@@ -557,31 +613,40 @@ async fn tenant_create_handler(mut request: Request<Body>) -> Result<Response<Bo
tenant_conf.checkpoint_distance = request_data.checkpoint_distance;
if let Some(checkpoint_timeout) = request_data.checkpoint_timeout {
tenant_conf.checkpoint_timeout =
Some(humantime::parse_duration(&checkpoint_timeout).map_err(ApiError::from_err)?);
tenant_conf.checkpoint_timeout = Some(
humantime::parse_duration(&checkpoint_timeout)
.with_context(bad_duration("checkpoint_timeout", &checkpoint_timeout))
.map_err(ApiError::BadRequest)?,
);
}
tenant_conf.compaction_target_size = request_data.compaction_target_size;
tenant_conf.compaction_threshold = request_data.compaction_threshold;
if let Some(compaction_period) = request_data.compaction_period {
tenant_conf.compaction_period =
Some(humantime::parse_duration(&compaction_period).map_err(ApiError::from_err)?);
tenant_conf.compaction_period = Some(
humantime::parse_duration(&compaction_period)
.with_context(bad_duration("compaction_period", &compaction_period))
.map_err(ApiError::BadRequest)?,
);
}
let target_tenant_id = request_data
.new_tenant_id
.map(ZTenantId::from)
.unwrap_or_else(ZTenantId::generate);
.map(TenantId::from)
.unwrap_or_else(TenantId::generate);
let new_tenant_id = tokio::task::spawn_blocking(move || {
let _enter = info_span!("tenant_create", tenant = ?target_tenant_id).entered();
let conf = get_config(&request);
tenant_mgr::create_tenant(conf, tenant_conf, target_tenant_id, remote_index)
// FIXME: `create_tenant` can fail from both user and internal errors. Replace this
// with better error handling once the type permits it
.map_err(ApiError::InternalServerError)
})
.await
.map_err(ApiError::from_err)??;
.map_err(|e: JoinError| ApiError::InternalServerError(e.into()))??;
Ok(match new_tenant_id {
Some(id) => json_response(StatusCode::CREATED, TenantCreateResponse(id))?,
@@ -596,24 +661,38 @@ async fn tenant_config_handler(mut request: Request<Body>) -> Result<Response<Bo
let mut tenant_conf: TenantConfOpt = Default::default();
if let Some(gc_period) = request_data.gc_period {
tenant_conf.gc_period =
Some(humantime::parse_duration(&gc_period).map_err(ApiError::from_err)?);
tenant_conf.gc_period = Some(
humantime::parse_duration(&gc_period)
.with_context(bad_duration("gc_period", &gc_period))
.map_err(ApiError::BadRequest)?,
);
}
tenant_conf.gc_horizon = request_data.gc_horizon;
tenant_conf.image_creation_threshold = request_data.image_creation_threshold;
if let Some(pitr_interval) = request_data.pitr_interval {
tenant_conf.pitr_interval =
Some(humantime::parse_duration(&pitr_interval).map_err(ApiError::from_err)?);
tenant_conf.pitr_interval = Some(
humantime::parse_duration(&pitr_interval)
.with_context(bad_duration("pitr_interval", &pitr_interval))
.map_err(ApiError::BadRequest)?,
);
}
if let Some(walreceiver_connect_timeout) = request_data.walreceiver_connect_timeout {
tenant_conf.walreceiver_connect_timeout = Some(
humantime::parse_duration(&walreceiver_connect_timeout).map_err(ApiError::from_err)?,
humantime::parse_duration(&walreceiver_connect_timeout)
.with_context(bad_duration(
"walreceiver_connect_timeout",
&walreceiver_connect_timeout,
))
.map_err(ApiError::BadRequest)?,
);
}
if let Some(lagging_wal_timeout) = request_data.lagging_wal_timeout {
tenant_conf.lagging_wal_timeout =
Some(humantime::parse_duration(&lagging_wal_timeout).map_err(ApiError::from_err)?);
tenant_conf.lagging_wal_timeout = Some(
humantime::parse_duration(&lagging_wal_timeout)
.with_context(bad_duration("lagging_wal_timeout", &lagging_wal_timeout))
.map_err(ApiError::BadRequest)?,
);
}
if let Some(max_lsn_wal_lag) = request_data.max_lsn_wal_lag {
tenant_conf.max_lsn_wal_lag = Some(max_lsn_wal_lag);
@@ -621,15 +700,21 @@ async fn tenant_config_handler(mut request: Request<Body>) -> Result<Response<Bo
tenant_conf.checkpoint_distance = request_data.checkpoint_distance;
if let Some(checkpoint_timeout) = request_data.checkpoint_timeout {
tenant_conf.checkpoint_timeout =
Some(humantime::parse_duration(&checkpoint_timeout).map_err(ApiError::from_err)?);
tenant_conf.checkpoint_timeout = Some(
humantime::parse_duration(&checkpoint_timeout)
.with_context(bad_duration("checkpoint_timeout", &checkpoint_timeout))
.map_err(ApiError::BadRequest)?,
);
}
tenant_conf.compaction_target_size = request_data.compaction_target_size;
tenant_conf.compaction_threshold = request_data.compaction_threshold;
if let Some(compaction_period) = request_data.compaction_period {
tenant_conf.compaction_period =
Some(humantime::parse_duration(&compaction_period).map_err(ApiError::from_err)?);
tenant_conf.compaction_period = Some(
humantime::parse_duration(&compaction_period)
.with_context(bad_duration("compaction_period", &compaction_period))
.map_err(ApiError::BadRequest)?,
);
}
tokio::task::spawn_blocking(move || {
@@ -637,9 +722,114 @@ async fn tenant_config_handler(mut request: Request<Body>) -> Result<Response<Bo
let state = get_state(&request);
tenant_mgr::update_tenant_config(state.conf, tenant_conf, tenant_id)
// FIXME: `update_tenant_config` can fail because of both user and internal errors.
// Replace this `map_err` with better error handling once the type permits it
.map_err(ApiError::InternalServerError)
})
.await
.map_err(ApiError::from_err)??;
.map_err(|e: JoinError| ApiError::InternalServerError(e.into()))??;
json_response(StatusCode::OK, ())
}
#[cfg(any(feature = "testing", feature = "failpoints"))]
async fn failpoints_handler(mut request: Request<Body>) -> Result<Response<Body>, ApiError> {
if !fail::has_failpoints() {
return Err(ApiError::BadRequest(anyhow!(
"Cannot manage failpoints because pageserver was compiled without failpoints support"
)));
}
let failpoints: ConfigureFailpointsRequest = json_request(&mut request).await?;
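// The request body is a JSON array of failpoint configurations; each entry carries the
// failpoint `name` and an `actions` string understood by the failpoints crate, plus the
// extra "exit" action handled specially below.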
for fp in failpoints {
info!("cfg failpoint: {} {}", fp.name, fp.actions);
// We recognize one extra "action" that's not natively recognized
// by the failpoints crate: exit, to immediately kill the process
let cfg_result = if fp.actions == "exit" {
fail::cfg_callback(fp.name, || {
info!("Exit requested by failpoint");
std::process::exit(1);
})
} else {
fail::cfg(fp.name, &fp.actions)
};
if let Err(err_msg) = cfg_result {
return Err(ApiError::BadRequest(anyhow!(
"Failed to configure failpoints: {err_msg}"
)));
}
}
json_response(StatusCode::OK, ())
}
// Run GC immediately on given timeline.
// FIXME: This is just for tests. See test_runner/regress/test_gc.py.
// This probably should require special authentication or a global flag to
// enable, I don't think we want to or need to allow regular clients to invoke
// GC.
// @hllinnaka in commits ec44f4b29, 3aca717f3
#[cfg(feature = "testing")]
async fn timeline_gc_handler(mut request: Request<Body>) -> Result<Response<Body>, ApiError> {
let tenant_id: TenantId = parse_request_param(&request, "tenant_id")?;
let timeline_id: TimelineId = parse_request_param(&request, "timeline_id")?;
check_permission(&request, Some(tenant_id))?;
// FIXME: currently this will return a 500 error on bad tenant id; it should be 4XX
let repo = tenant_mgr::get_tenant(tenant_id, false).map_err(ApiError::NotFound)?;
let gc_req: TimelineGcRequest = json_request(&mut request).await?;
let _span_guard =
info_span!("manual_gc", tenant = %tenant_id, timeline = %timeline_id).entered();
let gc_horizon = gc_req.gc_horizon.unwrap_or_else(|| repo.get_gc_horizon());
// Use tenant's pitr setting
let pitr = repo.get_pitr_interval();
let result = repo
.gc_iteration(Some(timeline_id), gc_horizon, pitr, true)
// FIXME: `gc_iteration` can return an error for multiple reasons; we should handle it
// better once the types support it.
.map_err(ApiError::InternalServerError)?;
json_response(StatusCode::OK, result)
}
// Run compaction immediately on given timeline.
// FIXME This is just for tests. Don't expect this to be exposed to
// the users or the api.
// @dhammika in commit a0781f229
#[cfg(feature = "testing")]
async fn timeline_compact_handler(request: Request<Body>) -> Result<Response<Body>, ApiError> {
let tenant_id: TenantId = parse_request_param(&request, "tenant_id")?;
let timeline_id: TimelineId = parse_request_param(&request, "timeline_id")?;
check_permission(&request, Some(tenant_id))?;
let repo = tenant_mgr::get_tenant(tenant_id, true).map_err(ApiError::NotFound)?;
let timeline = repo
.get_timeline(timeline_id)
.with_context(|| format!("No timeline {timeline_id} in repository for tenant {tenant_id}"))
.map_err(ApiError::NotFound)?;
timeline.compact().map_err(ApiError::InternalServerError)?;
json_response(StatusCode::OK, ())
}
// Run checkpoint immediately on given timeline.
#[cfg(feature = "testing")]
async fn timeline_checkpoint_handler(request: Request<Body>) -> Result<Response<Body>, ApiError> {
let tenant_id: TenantId = parse_request_param(&request, "tenant_id")?;
let timeline_id: TimelineId = parse_request_param(&request, "timeline_id")?;
check_permission(&request, Some(tenant_id))?;
let repo = tenant_mgr::get_tenant(tenant_id, true).map_err(ApiError::NotFound)?;
let timeline = repo
.get_timeline(timeline_id)
.with_context(|| format!("No timeline {timeline_id} in repository for tenant {tenant_id}"))
.map_err(ApiError::NotFound)?;
timeline
.checkpoint(CheckpointConfig::Forced)
.map_err(ApiError::InternalServerError)?;
json_response(StatusCode::OK, ())
}
@@ -670,12 +860,35 @@ pub fn make_router(
}))
}
macro_rules! testing_api {
($handler_desc:literal, $handler:path $(,)?) => {{
#[cfg(not(feature = "testing"))]
async fn cfg_disabled(_req: Request<Body>) -> Result<Response<Body>, ApiError> {
Err(ApiError::BadRequest(anyhow!(concat!(
"Cannot ",
$handler_desc,
" because pageserver was compiled without testing APIs",
))))
}
#[cfg(feature = "testing")]
let handler = $handler;
#[cfg(not(feature = "testing"))]
let handler = cfg_disabled;
handler
}};
}
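// Rough illustration of the expansion: with the `testing` feature disabled,
// `testing_api!("run timeline GC", timeline_gc_handler)` evaluates to a stub whose body is
// essentially
//
//     Err(ApiError::BadRequest(anyhow!(
//         "Cannot run timeline GC because pageserver was compiled without testing APIs"
//     )))
//
// while with the feature enabled it is simply `timeline_gc_handler`.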
Ok(router
.data(Arc::new(
State::new(conf, auth, remote_index, remote_storage)
.context("Failed to initialize router state")?,
))
.get("/v1/status", status_handler)
.put(
"/v1/failpoints",
testing_api!("manage failpoints", failpoints_handler),
)
.get("/v1/tenant", tenant_list_handler)
.post("/v1/tenant", tenant_create_handler)
.get("/v1/tenant/:tenant_id", tenant_status)
@@ -688,6 +901,18 @@ pub fn make_router(
"/v1/tenant/:tenant_id/timeline/:timeline_id",
timeline_detail_handler,
)
.put(
"/v1/tenant/:tenant_id/timeline/:timeline_id/do_gc",
testing_api!("run timeline GC", timeline_gc_handler),
)
.put(
"/v1/tenant/:tenant_id/timeline/:timeline_id/compact",
testing_api!("run timeline compaction", timeline_compact_handler),
)
.put(
"/v1/tenant/:tenant_id/timeline/:timeline_id/checkpoint",
testing_api!("run timeline checkpoint", timeline_checkpoint_handler),
)
.delete(
"/v1/tenant/:tenant_id/timeline/:timeline_id",
timeline_delete_handler,


@@ -1,6 +1,6 @@
//!
//! Import data and WAL from a PostgreSQL data directory and WAL segments into
//! a zenith Timeline.
//! a neon Timeline.
//!
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};
@@ -11,16 +11,18 @@ use bytes::Bytes;
use tracing::*;
use walkdir::WalkDir;
use crate::layered_repository::Timeline;
use crate::pgdatadir_mapping::*;
use crate::reltag::{RelTag, SlruKind};
use crate::tenant::Timeline;
use crate::walingest::WalIngest;
use crate::walrecord::DecodedWALRecord;
use postgres_ffi::v14::relfile_utils::*;
use postgres_ffi::v14::waldecoder::*;
use postgres_ffi::v14::xlog_utils::*;
use postgres_ffi::v14::{pg_constants, ControlFileData, DBState_DB_SHUTDOWNED};
use postgres_ffi::pg_constants;
use postgres_ffi::relfile_utils::*;
use postgres_ffi::waldecoder::WalStreamDecoder;
use postgres_ffi::ControlFileData;
use postgres_ffi::DBState_DB_SHUTDOWNED;
use postgres_ffi::Oid;
use postgres_ffi::XLogFileName;
use postgres_ffi::{BLCKSZ, WAL_SEGMENT_SIZE};
use utils::lsn::Lsn;
@@ -236,7 +238,7 @@ fn import_slru<Reader: Read>(
/// Scan PostgreSQL WAL files in given directory and load all records between
/// 'startpoint' and 'endpoint' into the repository.
fn import_wal(walpath: &Path, tline: &Timeline, startpoint: Lsn, endpoint: Lsn) -> Result<()> {
let mut waldecoder = WalStreamDecoder::new(startpoint);
let mut waldecoder = WalStreamDecoder::new(startpoint, tline.pg_version);
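// The decoder now takes the timeline's PostgreSQL major version: WAL record layouts differ
// between major versions, so decoding has to match the version that produced these segments.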
let mut segno = startpoint.segment_number(WAL_SEGMENT_SIZE);
let mut offset = startpoint.segment_offset(WAL_SEGMENT_SIZE);
@@ -354,7 +356,7 @@ pub fn import_wal_from_tar<Reader: Read>(
end_lsn: Lsn,
) -> Result<()> {
// Set up walingest mutable state
let mut waldecoder = WalStreamDecoder::new(start_lsn);
let mut waldecoder = WalStreamDecoder::new(start_lsn, tline.pg_version);
let mut segno = start_lsn.segment_number(WAL_SEGMENT_SIZE);
let mut offset = start_lsn.segment_offset(WAL_SEGMENT_SIZE);
let mut last_lsn = start_lsn;
@@ -439,7 +441,7 @@ fn import_file<Reader: Read>(
len: usize,
) -> Result<Option<ControlFileData>> {
if file_path.starts_with("global") {
let spcnode = pg_constants::GLOBALTABLESPACE_OID;
let spcnode = postgres_ffi::pg_constants::GLOBALTABLESPACE_OID;
let dbnode = 0;
match file_path
@@ -467,7 +469,7 @@ fn import_file<Reader: Read>(
debug!("imported relmap file")
}
"PG_VERSION" => {
debug!("ignored");
debug!("ignored PG_VERSION file");
}
_ => {
import_rel(modification, file_path, spcnode, dbnode, reader, len)?;
@@ -495,7 +497,7 @@ fn import_file<Reader: Read>(
debug!("imported relmap file")
}
"PG_VERSION" => {
debug!("ignored");
debug!("ignored PG_VERSION file");
}
_ => {
import_rel(modification, file_path, spcnode, dbnode, reader, len)?;


@@ -3,7 +3,6 @@ pub mod config;
pub mod http;
pub mod import_datadir;
pub mod keyspace;
pub mod layered_repository;
pub mod metrics;
pub mod page_cache;
pub mod page_service;
@@ -13,10 +12,10 @@ pub mod reltag;
pub mod repository;
pub mod storage_sync;
pub mod task_mgr;
pub mod tenant;
pub mod tenant_config;
pub mod tenant_mgr;
pub mod tenant_tasks;
pub mod timelines;
pub mod virtual_file;
pub mod walingest;
pub mod walreceiver;
@@ -26,17 +25,21 @@ pub mod walredo;
use std::collections::HashMap;
use tracing::info;
use utils::zid::{ZTenantId, ZTimelineId};
use utils::id::{TenantId, TimelineId};
use crate::task_mgr::TaskKind;
/// Current storage format version
///
/// This is embedded in the metadata file, and also in the header of all the
/// layer files. If you make any backwards-incompatible changes to the storage
/// This is embedded in the header of all the layer files.
/// If you make any backwards-incompatible changes to the storage
/// format, bump this!
/// Note that TimelineMetadata uses its own version number to track
/// backwards-compatible changes to the metadata format.
pub const STORAGE_FORMAT_VERSION: u16 = 3;
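/// Default PostgreSQL major version, assumed wherever a caller does not pass an explicit
/// pg_version for a tenant or timeline.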
pub const DEFAULT_PG_VERSION: u32 = 14;
// Magic constants used to identify different kinds of files
pub const IMAGE_FILE_MAGIC: u16 = 0x5A60;
pub const DELTA_FILE_MAGIC: u16 = 0x5A61;
@@ -105,12 +108,12 @@ fn exponential_backoff_duration_seconds(n: u32, base_increment: f64, max_seconds
}
/// A newtype to store arbitrary data grouped by tenant and timeline ids.
/// One could use [`utils::zid::ZTenantTimelineId`] for grouping, but that would
/// One could use [`utils::id::TenantTimelineId`] for grouping, but that would
/// not include the cases where a certain tenant has zero timelines.
/// This is sometimes important: a tenant could be registered during initial load from FS,
/// even if it has no timelines on disk.
#[derive(Debug)]
pub struct TenantTimelineValues<T>(HashMap<ZTenantId, HashMap<ZTimelineId, T>>);
pub struct TenantTimelineValues<T>(HashMap<TenantId, HashMap<TimelineId, T>>);
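// A quick sketch of the "tenant with zero timelines" case this wrapper exists for
// (crate-internal usage; mirrors how the storage sync startup code elsewhere in this change
// registers empty tenants via `entry(tenant_id).or_default()`):
//
//     let mut values = TenantTimelineValues::<String>::new();
//     let tenant_id = TenantId::generate();
//     let _ = values.0.entry(tenant_id).or_default(); // tenant present, zero timelines
//     assert!(values.0[&tenant_id].is_empty());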
impl<T> TenantTimelineValues<T> {
fn new() -> Self {
@@ -181,14 +184,14 @@ mod backoff_defaults_tests {
#[cfg(test)]
mod tests {
use crate::layered_repository::repo_harness::TIMELINE_ID;
use crate::tenant::harness::TIMELINE_ID;
use super::*;
#[test]
fn tenant_timeline_value_mapping() {
let first_tenant = ZTenantId::generate();
let second_tenant = ZTenantId::generate();
let first_tenant = TenantId::generate();
let second_tenant = TenantId::generate();
assert_ne!(first_tenant, second_tenant);
let mut initial = TenantTimelineValues::new();


@@ -5,7 +5,7 @@ use metrics::{
IntCounter, IntCounterVec, IntGauge, IntGaugeVec, UIntGauge, UIntGaugeVec,
};
use once_cell::sync::Lazy;
use utils::zid::{ZTenantId, ZTimelineId};
use utils::id::{TenantId, TimelineId};
/// Prometheus histogram buckets (in seconds) that capture the majority of
/// latencies in the microsecond range but also extend far enough up to distinguish
@@ -327,7 +327,7 @@ pub struct TimelineMetrics {
}
impl TimelineMetrics {
pub fn new(tenant_id: &ZTenantId, timeline_id: &ZTimelineId) -> Self {
pub fn new(tenant_id: &TenantId, timeline_id: &TimelineId) -> Self {
let tenant_id = tenant_id.to_string();
let timeline_id = timeline_id.to_string();
let reconstruct_time_histo = RECONSTRUCT_TIME
@@ -414,6 +414,6 @@ impl Drop for TimelineMetrics {
}
}
pub fn remove_tenant_metrics(tenant_id: &ZTenantId) {
pub fn remove_tenant_metrics(tenant_id: &TenantId) {
let _ = STORAGE_TIME.remove_label_values(&["gc", &tenant_id.to_string(), "-"]);
}


@@ -49,12 +49,12 @@ use anyhow::Context;
use once_cell::sync::OnceCell;
use tracing::error;
use utils::{
id::{TenantId, TimelineId},
lsn::Lsn,
zid::{ZTenantId, ZTimelineId},
};
use crate::layered_repository::writeback_ephemeral_file;
use crate::repository::Key;
use crate::tenant::writeback_ephemeral_file;
static PAGE_CACHE: OnceCell<PageCache> = OnceCell::new();
const TEST_PAGE_CACHE_SIZE: usize = 50;
@@ -109,8 +109,8 @@ enum CacheKey {
#[derive(Debug, PartialEq, Eq, Hash, Clone)]
struct MaterializedPageHashKey {
tenant_id: ZTenantId,
timeline_id: ZTimelineId,
tenant_id: TenantId,
timeline_id: TimelineId,
key: Key,
}
@@ -308,8 +308,8 @@ impl PageCache {
/// returned page.
pub fn lookup_materialized_page(
&self,
tenant_id: ZTenantId,
timeline_id: ZTimelineId,
tenant_id: TenantId,
timeline_id: TimelineId,
key: &Key,
lsn: Lsn,
) -> Option<(Lsn, PageReadGuard)> {
@@ -338,8 +338,8 @@ impl PageCache {
///
pub fn memorize_materialized_page(
&self,
tenant_id: ZTenantId,
timeline_id: ZTimelineId,
tenant_id: TenantId,
timeline_id: TimelineId,
key: Key,
lsn: Lsn,
img: &[u8],


@@ -23,29 +23,29 @@ use tokio_util::io::SyncIoBridge;
use tracing::*;
use utils::{
auth::{self, Claims, JwtAuth, Scope},
id::{TenantId, TimelineId},
lsn::Lsn,
postgres_backend::AuthType,
postgres_backend_async::{self, PostgresBackend},
pq_proto::{BeMessage, FeMessage, RowDescriptor, SINGLE_COL_ROWDESC},
pq_proto::{BeMessage, FeMessage, RowDescriptor},
simple_rcu::RcuReadGuard,
zid::{ZTenantId, ZTimelineId},
};
use crate::basebackup;
use crate::config::{PageServerConf, ProfilingConfig};
use crate::import_datadir::{import_basebackup_from_tar, import_wal_from_tar};
use crate::layered_repository::Timeline;
use crate::metrics::{LIVE_CONNECTIONS_COUNT, SMGR_QUERY_TIME};
use crate::pgdatadir_mapping::LsnForTimestamp;
use crate::profiling::profpoint_start;
use crate::reltag::RelTag;
use crate::task_mgr;
use crate::task_mgr::TaskKind;
use crate::tenant::Timeline;
use crate::tenant_mgr;
use crate::CheckpointConfig;
use postgres_ffi::v14::xlog_utils::to_pg_timestamp;
use postgres_ffi::v14::pg_constants::DEFAULTTABLESPACE_OID;
use postgres_ffi::pg_constants::DEFAULTTABLESPACE_OID;
use postgres_ffi::to_pg_timestamp;
use postgres_ffi::BLCKSZ;
// Wrapped in libpq CopyData
@@ -123,7 +123,7 @@ impl PagestreamFeMessage {
fn parse(mut body: Bytes) -> anyhow::Result<PagestreamFeMessage> {
// TODO these gets can fail
// these correspond to the ZenithMessageTag enum in pagestore_client.h
// these correspond to the NeonMessageTag enum in pagestore_client.h
//
// TODO: consider using protobuf or serde bincode for less error prone
// serialization.
@@ -362,6 +362,39 @@ async fn page_service_conn_main(
}
}
struct PageRequestMetrics {
get_rel_exists: metrics::Histogram,
get_rel_size: metrics::Histogram,
get_page_at_lsn: metrics::Histogram,
get_db_size: metrics::Histogram,
}
impl PageRequestMetrics {
fn new(tenant_id: &TenantId, timeline_id: &TimelineId) -> Self {
let tenant_id = tenant_id.to_string();
let timeline_id = timeline_id.to_string();
let get_rel_exists =
SMGR_QUERY_TIME.with_label_values(&["get_rel_exists", &tenant_id, &timeline_id]);
let get_rel_size =
SMGR_QUERY_TIME.with_label_values(&["get_rel_size", &tenant_id, &timeline_id]);
let get_page_at_lsn =
SMGR_QUERY_TIME.with_label_values(&["get_page_at_lsn", &tenant_id, &timeline_id]);
let get_db_size =
SMGR_QUERY_TIME.with_label_values(&["get_db_size", &tenant_id, &timeline_id]);
Self {
get_rel_exists,
get_rel_size,
get_page_at_lsn,
get_db_size,
}
}
}
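// Looking these handles up once per pagestream connection replaces the previous per-request
// `SMGR_QUERY_TIME.with_label_values(...)` calls in the request loop below.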
#[derive(Debug)]
struct PageServerHandler {
conf: &'static PageServerConf,
@@ -382,8 +415,8 @@ impl PageServerHandler {
async fn handle_pagerequests(
&self,
pgb: &mut PostgresBackend,
tenant_id: ZTenantId,
timeline_id: ZTimelineId,
tenant_id: TenantId,
timeline_id: TimelineId,
) -> anyhow::Result<()> {
// NOTE: pagerequests handler exits when connection is closed,
// so there is no need to reset the association
@@ -396,6 +429,8 @@ impl PageServerHandler {
pgb.write_message(&BeMessage::CopyBothResponse)?;
pgb.flush().await?;
let metrics = PageRequestMetrics::new(&tenant_id, &timeline_id);
loop {
let msg = tokio::select! {
biased;
@@ -417,35 +452,25 @@ impl PageServerHandler {
None => break, // client disconnected
};
trace!("query: {:?}", copy_data_bytes);
trace!("query: {copy_data_bytes:?}");
let zenith_fe_msg = PagestreamFeMessage::parse(copy_data_bytes)?;
let tenant_str = tenant_id.to_string();
let timeline_str = timeline_id.to_string();
let neon_fe_msg = PagestreamFeMessage::parse(copy_data_bytes)?;
let response = match zenith_fe_msg {
let response = match neon_fe_msg {
PagestreamFeMessage::Exists(req) => {
let _timer = SMGR_QUERY_TIME
.with_label_values(&["get_rel_exists", &tenant_str, &timeline_str])
.start_timer();
let _timer = metrics.get_rel_exists.start_timer();
self.handle_get_rel_exists_request(&timeline, &req).await
}
PagestreamFeMessage::Nblocks(req) => {
let _timer = SMGR_QUERY_TIME
.with_label_values(&["get_rel_size", &tenant_str, &timeline_str])
.start_timer();
let _timer = metrics.get_rel_size.start_timer();
self.handle_get_nblocks_request(&timeline, &req).await
}
PagestreamFeMessage::GetPage(req) => {
let _timer = SMGR_QUERY_TIME
.with_label_values(&["get_page_at_lsn", &tenant_str, &timeline_str])
.start_timer();
let _timer = metrics.get_page_at_lsn.start_timer();
self.handle_get_page_at_lsn_request(&timeline, &req).await
}
PagestreamFeMessage::DbSize(req) => {
let _timer = SMGR_QUERY_TIME
.with_label_values(&["get_db_size", &tenant_str, &timeline_str])
.start_timer();
let _timer = metrics.get_db_size.start_timer();
self.handle_db_size_request(&timeline, &req).await
}
};
@@ -469,16 +494,20 @@ impl PageServerHandler {
async fn handle_import_basebackup(
&self,
pgb: &mut PostgresBackend,
tenant_id: ZTenantId,
timeline_id: ZTimelineId,
tenant_id: TenantId,
timeline_id: TimelineId,
base_lsn: Lsn,
_end_lsn: Lsn,
pg_version: u32,
) -> anyhow::Result<()> {
task_mgr::associate_with(Some(tenant_id), Some(timeline_id));
// Create empty timeline
info!("creating new timeline");
let repo = tenant_mgr::get_repository_for_tenant(tenant_id)?;
let timeline = repo.create_empty_timeline(timeline_id, base_lsn)?;
let timeline = tenant_mgr::get_tenant(tenant_id, true)?.create_empty_timeline(
timeline_id,
base_lsn,
pg_version,
)?;
// TODO mark timeline as not ready until it reaches end_lsn.
// We might have some wal to import as well, and we should prevent compute
@@ -532,17 +561,14 @@ impl PageServerHandler {
async fn handle_import_wal(
&self,
pgb: &mut PostgresBackend,
tenant_id: ZTenantId,
timeline_id: ZTimelineId,
tenant_id: TenantId,
timeline_id: TimelineId,
start_lsn: Lsn,
end_lsn: Lsn,
) -> anyhow::Result<()> {
task_mgr::associate_with(Some(tenant_id), Some(timeline_id));
let repo = tenant_mgr::get_repository_for_tenant(tenant_id)?;
let timeline = repo
.get_timeline(timeline_id)
.with_context(|| format!("Timeline {timeline_id} was not found"))?;
let timeline = get_local_timeline(tenant_id, timeline_id)?;
ensure!(timeline.get_last_record_lsn() == start_lsn);
// TODO leave clean state on error. For now you can use detach to clean
@@ -728,8 +754,8 @@ impl PageServerHandler {
async fn handle_basebackup_request(
&self,
pgb: &mut PostgresBackend,
tenant_id: ZTenantId,
timeline_id: ZTimelineId,
tenant_id: TenantId,
timeline_id: TimelineId,
lsn: Option<Lsn>,
prev_lsn: Option<Lsn>,
full_backup: bool,
@@ -770,7 +796,7 @@ impl PageServerHandler {
// When accessing the management API, supply None as an argument.
// When authorizing a tenant, pass the corresponding tenant id.
fn check_permission(&self, tenantid: Option<ZTenantId>) -> Result<()> {
fn check_permission(&self, tenant_id: Option<TenantId>) -> Result<()> {
if self.auth.is_none() {
// auth is set to Trust, nothing to check so just return ok
return Ok(());
@@ -782,7 +808,7 @@ impl PageServerHandler {
.claims
.as_ref()
.expect("claims presence already checked");
auth::check_permission(claims, tenantid)
auth::check_permission(claims, tenant_id)
}
}
@@ -793,7 +819,7 @@ impl postgres_backend_async::Handler for PageServerHandler {
_pgb: &mut PostgresBackend,
jwt_response: &[u8],
) -> anyhow::Result<()> {
// this unwrap is never triggered, because check_auth_jwt only called when auth_type is ZenithJWT
// this unwrap is never triggered, because check_auth_jwt is only called when auth_type is NeonJWT
// which requires auth to be present
let data = self
.auth
@@ -809,7 +835,7 @@ impl postgres_backend_async::Handler for PageServerHandler {
}
info!(
"jwt auth succeeded for scope: {:#?} by tenantid: {:?}",
"jwt auth succeeded for scope: {:#?} by tenant id: {:?}",
data.claims.scope, data.claims.tenant_id,
);
@@ -831,8 +857,8 @@ impl postgres_backend_async::Handler for PageServerHandler {
params.len() == 2,
"invalid param number for pagestream command"
);
let tenant_id = ZTenantId::from_str(params[0])?;
let timeline_id = ZTimelineId::from_str(params[1])?;
let tenant_id = TenantId::from_str(params[0])?;
let timeline_id = TimelineId::from_str(params[1])?;
self.check_permission(Some(tenant_id))?;
@@ -847,8 +873,8 @@ impl postgres_backend_async::Handler for PageServerHandler {
"invalid param number for basebackup command"
);
let tenant_id = ZTenantId::from_str(params[0])?;
let timeline_id = ZTimelineId::from_str(params[1])?;
let tenant_id = TenantId::from_str(params[0])?;
let timeline_id = TimelineId::from_str(params[1])?;
self.check_permission(Some(tenant_id))?;
@@ -873,8 +899,8 @@ impl postgres_backend_async::Handler for PageServerHandler {
"invalid param number for get_last_record_rlsn command"
);
let tenant_id = ZTenantId::from_str(params[0])?;
let timeline_id = ZTimelineId::from_str(params[1])?;
let tenant_id = TenantId::from_str(params[0])?;
let timeline_id = TimelineId::from_str(params[1])?;
self.check_permission(Some(tenant_id))?;
let timeline = get_local_timeline(tenant_id, timeline_id)?;
@@ -901,8 +927,8 @@ impl postgres_backend_async::Handler for PageServerHandler {
"invalid param number for fullbackup command"
);
let tenant_id = ZTenantId::from_str(params[0])?;
let timeline_id = ZTimelineId::from_str(params[1])?;
let tenant_id = TenantId::from_str(params[0])?;
let timeline_id = TimelineId::from_str(params[1])?;
// The caller is responsible for providing correct lsn and prev_lsn.
let lsn = if params.len() > 2 {
@@ -933,19 +959,27 @@ impl postgres_backend_async::Handler for PageServerHandler {
// 1. Get start/end LSN from backup_manifest file
// 2. Run:
// cat my_backup/base.tar | psql -h $PAGESERVER \
// -c "import basebackup $TENANT $TIMELINE $START_LSN $END_LSN"
// -c "import basebackup $TENANT $TIMELINE $START_LSN $END_LSN $PG_VERSION"
let (_, params_raw) = query_string.split_at("import basebackup ".len());
let params = params_raw.split_whitespace().collect::<Vec<_>>();
ensure!(params.len() == 4);
let tenant_id = ZTenantId::from_str(params[0])?;
let timeline_id = ZTimelineId::from_str(params[1])?;
ensure!(params.len() == 5);
let tenant_id = TenantId::from_str(params[0])?;
let timeline_id = TimelineId::from_str(params[1])?;
let base_lsn = Lsn::from_str(params[2])?;
let end_lsn = Lsn::from_str(params[3])?;
let pg_version = u32::from_str(params[4])?;
self.check_permission(Some(tenant_id))?;
match self
.handle_import_basebackup(pgb, tenant_id, timeline_id, base_lsn, end_lsn)
.handle_import_basebackup(
pgb,
tenant_id,
timeline_id,
base_lsn,
end_lsn,
pg_version,
)
.await
{
Ok(()) => pgb.write_message(&BeMessage::CommandComplete(b"SELECT 1"))?,
@@ -962,8 +996,8 @@ impl postgres_backend_async::Handler for PageServerHandler {
let (_, params_raw) = query_string.split_at("import wal ".len());
let params = params_raw.split_whitespace().collect::<Vec<_>>();
ensure!(params.len() == 4);
let tenant_id = ZTenantId::from_str(params[0])?;
let timeline_id = ZTimelineId::from_str(params[1])?;
let tenant_id = TenantId::from_str(params[0])?;
let timeline_id = TimelineId::from_str(params[1])?;
let start_lsn = Lsn::from_str(params[2])?;
let end_lsn = Lsn::from_str(params[3])?;
@@ -983,38 +1017,13 @@ impl postgres_backend_async::Handler for PageServerHandler {
// important because psycopg2 executes "SET datestyle TO 'ISO'"
// on connect
pgb.write_message(&BeMessage::CommandComplete(b"SELECT 1"))?;
} else if query_string.starts_with("failpoints ") {
ensure!(fail::has_failpoints(), "Cannot manage failpoints because pageserver was compiled without failpoints support");
let (_, failpoints) = query_string.split_at("failpoints ".len());
for failpoint in failpoints.split(';') {
if let Some((name, actions)) = failpoint.split_once('=') {
info!("cfg failpoint: {} {}", name, actions);
// We recognize one extra "action" that's not natively recognized
// by the failpoints crate: exit, to immediately kill the process
if actions == "exit" {
fail::cfg_callback(name, || {
info!("Exit requested by failpoint");
std::process::exit(1);
})
.unwrap();
} else {
fail::cfg(name, actions).unwrap();
}
} else {
bail!("Invalid failpoints format");
}
}
pgb.write_message(&BeMessage::CommandComplete(b"SELECT 1"))?;
} else if query_string.starts_with("show ") {
// show <tenant_id>
let (_, params_raw) = query_string.split_at("show ".len());
let params = params_raw.split(' ').collect::<Vec<_>>();
ensure!(params.len() == 1, "invalid param number for config command");
let tenantid = ZTenantId::from_str(params[0])?;
let repo = tenant_mgr::get_repository_for_tenant(tenantid)?;
let tenant_id = TenantId::from_str(params[0])?;
let tenant = tenant_mgr::get_tenant(tenant_id, true)?;
pgb.write_message(&BeMessage::RowDescription(&[
RowDescriptor::int8_col(b"checkpoint_distance"),
RowDescriptor::int8_col(b"checkpoint_timeout"),
@@ -1027,112 +1036,29 @@ impl postgres_backend_async::Handler for PageServerHandler {
RowDescriptor::int8_col(b"pitr_interval"),
]))?
.write_message(&BeMessage::DataRow(&[
Some(repo.get_checkpoint_distance().to_string().as_bytes()),
Some(tenant.get_checkpoint_distance().to_string().as_bytes()),
Some(
repo.get_checkpoint_timeout()
tenant
.get_checkpoint_timeout()
.as_secs()
.to_string()
.as_bytes(),
),
Some(repo.get_compaction_target_size().to_string().as_bytes()),
Some(tenant.get_compaction_target_size().to_string().as_bytes()),
Some(
repo.get_compaction_period()
tenant
.get_compaction_period()
.as_secs()
.to_string()
.as_bytes(),
),
Some(repo.get_compaction_threshold().to_string().as_bytes()),
Some(repo.get_gc_horizon().to_string().as_bytes()),
Some(repo.get_gc_period().as_secs().to_string().as_bytes()),
Some(repo.get_image_creation_threshold().to_string().as_bytes()),
Some(repo.get_pitr_interval().as_secs().to_string().as_bytes()),
Some(tenant.get_compaction_threshold().to_string().as_bytes()),
Some(tenant.get_gc_horizon().to_string().as_bytes()),
Some(tenant.get_gc_period().as_secs().to_string().as_bytes()),
Some(tenant.get_image_creation_threshold().to_string().as_bytes()),
Some(tenant.get_pitr_interval().as_secs().to_string().as_bytes()),
]))?
.write_message(&BeMessage::CommandComplete(b"SELECT 1"))?;
} else if query_string.starts_with("do_gc ") {
// Run GC immediately on given timeline.
// FIXME: This is just for tests. See test_runner/regress/test_gc.py.
// This probably should require special authentication or a global flag to
// enable, I don't think we want to or need to allow regular clients to invoke
// GC.
// do_gc <tenant_id> <timeline_id> <gc_horizon>
let re = Regex::new(r"^do_gc ([[:xdigit:]]+)\s([[:xdigit:]]+)($|\s)([[:digit:]]+)?")
.unwrap();
let caps = re
.captures(query_string)
.with_context(|| format!("invalid do_gc: '{}'", query_string))?;
let tenant_id = ZTenantId::from_str(caps.get(1).unwrap().as_str())?;
let timeline_id = ZTimelineId::from_str(caps.get(2).unwrap().as_str())?;
let repo = tenant_mgr::get_repository_for_tenant(tenant_id)?;
let gc_horizon: u64 = caps
.get(4)
.map(|h| h.as_str().parse())
.unwrap_or_else(|| Ok(repo.get_gc_horizon()))?;
// Use tenant's pitr setting
let pitr = repo.get_pitr_interval();
let result = repo.gc_iteration(Some(timeline_id), gc_horizon, pitr, true)?;
pgb.write_message(&BeMessage::RowDescription(&[
RowDescriptor::int8_col(b"layers_total"),
RowDescriptor::int8_col(b"layers_needed_by_cutoff"),
RowDescriptor::int8_col(b"layers_needed_by_pitr"),
RowDescriptor::int8_col(b"layers_needed_by_branches"),
RowDescriptor::int8_col(b"layers_not_updated"),
RowDescriptor::int8_col(b"layers_removed"),
RowDescriptor::int8_col(b"elapsed"),
]))?
.write_message(&BeMessage::DataRow(&[
Some(result.layers_total.to_string().as_bytes()),
Some(result.layers_needed_by_cutoff.to_string().as_bytes()),
Some(result.layers_needed_by_pitr.to_string().as_bytes()),
Some(result.layers_needed_by_branches.to_string().as_bytes()),
Some(result.layers_not_updated.to_string().as_bytes()),
Some(result.layers_removed.to_string().as_bytes()),
Some(result.elapsed.as_millis().to_string().as_bytes()),
]))?
.write_message(&BeMessage::CommandComplete(b"SELECT 1"))?;
} else if query_string.starts_with("compact ") {
// Run compaction immediately on given timeline.
// FIXME This is just for tests. Don't expect this to be exposed to
// the users or the api.
// compact <tenant_id> <timeline_id>
let re = Regex::new(r"^compact ([[:xdigit:]]+)\s([[:xdigit:]]+)($|\s)?").unwrap();
let caps = re
.captures(query_string)
.with_context(|| format!("Invalid compact: '{}'", query_string))?;
let tenant_id = ZTenantId::from_str(caps.get(1).unwrap().as_str())?;
let timeline_id = ZTimelineId::from_str(caps.get(2).unwrap().as_str())?;
let timeline = get_local_timeline(tenant_id, timeline_id)?;
timeline.compact()?;
pgb.write_message(&SINGLE_COL_ROWDESC)?
.write_message(&BeMessage::CommandComplete(b"SELECT 1"))?;
} else if query_string.starts_with("checkpoint ") {
// Run checkpoint immediately on given timeline.
// checkpoint <tenant_id> <timeline_id>
let re = Regex::new(r"^checkpoint ([[:xdigit:]]+)\s([[:xdigit:]]+)($|\s)?").unwrap();
let caps = re
.captures(query_string)
.with_context(|| format!("invalid checkpoint command: '{}'", query_string))?;
let tenant_id = ZTenantId::from_str(caps.get(1).unwrap().as_str())?;
let timeline_id = ZTimelineId::from_str(caps.get(2).unwrap().as_str())?;
let timeline = get_local_timeline(tenant_id, timeline_id)?;
// Checkpoint the timeline and also compact it (due to `CheckpointConfig::Forced`).
timeline.checkpoint(CheckpointConfig::Forced)?;
pgb.write_message(&SINGLE_COL_ROWDESC)?
.write_message(&BeMessage::CommandComplete(b"SELECT 1"))?;
} else if query_string.starts_with("get_lsn_by_timestamp ") {
// Locate the LSN of the last transaction with a timestamp less than or equal to the specified one
// TODO lazy static
@@ -1142,8 +1068,8 @@ impl postgres_backend_async::Handler for PageServerHandler {
.captures(query_string)
.with_context(|| format!("invalid get_lsn_by_timestamp: '{}'", query_string))?;
let tenant_id = ZTenantId::from_str(caps.get(1).unwrap().as_str())?;
let timeline_id = ZTimelineId::from_str(caps.get(2).unwrap().as_str())?;
let tenant_id = TenantId::from_str(caps.get(1).unwrap().as_str())?;
let timeline_id = TimelineId::from_str(caps.get(2).unwrap().as_str())?;
let timeline = get_local_timeline(tenant_id, timeline_id)?;
let timestamp = humantime::parse_rfc3339(caps.get(3).unwrap().as_str())?;
@@ -1168,13 +1094,8 @@ impl postgres_backend_async::Handler for PageServerHandler {
}
}
fn get_local_timeline(tenant_id: ZTenantId, timeline_id: ZTimelineId) -> Result<Arc<Timeline>> {
tenant_mgr::get_repository_for_tenant(tenant_id)
.and_then(|repo| {
repo.get_timeline(timeline_id)
.context("No timeline in tenant's repository")
})
.with_context(|| format!("Could not get timeline {timeline_id} in tenant {tenant_id}"))
fn get_local_timeline(tenant_id: TenantId, timeline_id: TimelineId) -> Result<Arc<Timeline>> {
tenant_mgr::get_tenant(tenant_id, true).and_then(|tenant| tenant.get_timeline(timeline_id))
}
///


@@ -7,13 +7,13 @@
//! Clarify that)
//!
use crate::keyspace::{KeySpace, KeySpaceAccum};
use crate::layered_repository::Timeline;
use crate::reltag::{RelTag, SlruKind};
use crate::repository::*;
use crate::walrecord::ZenithWalRecord;
use crate::tenant::Timeline;
use crate::walrecord::NeonWalRecord;
use anyhow::{bail, ensure, Result};
use bytes::{Buf, Bytes};
use postgres_ffi::v14::pg_constants;
use postgres_ffi::relfile_utils::{FSM_FORKNUM, VISIBILITYMAP_FORKNUM};
use postgres_ffi::BLCKSZ;
use postgres_ffi::{Oid, TimestampTz, TransactionId};
use serde::{Deserialize, Serialize};
@@ -125,8 +125,7 @@ impl Timeline {
return Ok(nblocks);
}
if (tag.forknum == pg_constants::FSM_FORKNUM
|| tag.forknum == pg_constants::VISIBILITYMAP_FORKNUM)
if (tag.forknum == FSM_FORKNUM || tag.forknum == VISIBILITYMAP_FORKNUM)
&& !self.get_rel_exists(tag, lsn, latest)?
{
// FIXME: Postgres sometimes calls smgrcreate() to create
@@ -570,7 +569,7 @@ impl<'a> DatadirModification<'a> {
&mut self,
rel: RelTag,
blknum: BlockNumber,
rec: ZenithWalRecord,
rec: NeonWalRecord,
) -> Result<()> {
ensure!(rel.relnode != 0, "invalid relnode");
self.put(rel_block_to_key(rel, blknum), Value::WalRecord(rec));
@@ -583,7 +582,7 @@ impl<'a> DatadirModification<'a> {
kind: SlruKind,
segno: u32,
blknum: BlockNumber,
rec: ZenithWalRecord,
rec: NeonWalRecord,
) -> Result<()> {
self.put(
slru_block_to_key(kind, segno, blknum),
@@ -1090,6 +1089,7 @@ static ZERO_PAGE: Bytes = Bytes::from_static(&[0u8; BLCKSZ as usize]);
// 03 misc
// controlfile
// checkpoint
// pg_version
//
// Below is a full list of the keyspace allocation:
//
@@ -1128,7 +1128,6 @@ static ZERO_PAGE: Bytes = Bytes::from_static(&[0u8; BLCKSZ as usize]);
//
// Checkpoint:
// 03 00000000 00000000 00000000 00 00000001
//-- Section 01: relation data and metadata
const DBDIR_KEY: Key = Key {
@@ -1398,16 +1397,13 @@ fn is_slru_block_key(key: Key) -> bool {
&& key.field6 != 0xffffffff // and not SlruSegSize
}
//
//-- Tests that should work the same with any Repository/Timeline implementation.
//
#[cfg(test)]
pub fn create_test_timeline(
repo: &crate::layered_repository::Repository,
timeline_id: utils::zid::ZTimelineId,
tenant: &crate::tenant::Tenant,
timeline_id: utils::id::TimelineId,
pg_version: u32,
) -> Result<std::sync::Arc<Timeline>> {
let tline = repo.create_empty_timeline(timeline_id, Lsn(8))?;
let tline = tenant.create_empty_timeline(timeline_id, Lsn(8), pg_version)?;
let mut m = tline.begin_modification(Lsn(8));
m.init_empty()?;
m.commit()?;


@@ -2,8 +2,8 @@ use serde::{Deserialize, Serialize};
use std::cmp::Ordering;
use std::fmt;
use postgres_ffi::v14::pg_constants;
use postgres_ffi::v14::relfile_utils::forknumber_to_name;
use postgres_ffi::pg_constants::GLOBALTABLESPACE_OID;
use postgres_ffi::relfile_utils::forknumber_to_name;
use postgres_ffi::Oid;
///
@@ -78,7 +78,7 @@ impl fmt::Display for RelTag {
impl RelTag {
pub fn to_segfile_name(&self, segno: u32) -> String {
let mut name = if self.spcnode == pg_constants::GLOBALTABLESPACE_OID {
let mut name = if self.spcnode == GLOBALTABLESPACE_OID {
"global/".to_string()
} else {
format!("base/{}/", self.dbnode)


@@ -1,4 +1,4 @@
use crate::walrecord::ZenithWalRecord;
use crate::walrecord::NeonWalRecord;
use anyhow::{bail, Result};
use byteorder::{ByteOrder, BE};
use bytes::Bytes;
@@ -24,6 +24,19 @@ pub struct Key {
pub const KEY_SIZE: usize = 18;
impl Key {
/// 'field2' is used to store the tablespace id for relations and small enum numbers for other relish.
/// As long as Neon does not support tablespaces (because it has no access to the local file system),
/// we can assume that only some predefined tablespace OIDs are used, which fit in a u16.
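/// The packed 128-bit layout produced below, from high bits to low (field widths as implied
/// by the shifts and by KEY_SIZE = 18 bytes = 1 + 4 + 4 + 4 + 1 + 4):
///   bits 120..124: field1 (low 4 bits)
///   bits 104..120: field2 (low 16 bits)
///   bits  72..104: field3
///   bits  40..72:  field4
///   bits  32..40:  field5
///   bits   0..32:  field6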
pub fn to_i128(&self) -> i128 {
assert!(self.field2 < 0xFFFF || self.field2 == 0xFFFFFFFF || self.field2 == 0x22222222);
(((self.field1 & 0xf) as i128) << 120)
| (((self.field2 & 0xFFFF) as i128) << 104)
| ((self.field3 as i128) << 72)
| ((self.field4 as i128) << 40)
| ((self.field5 as i128) << 32)
| self.field6 as i128
}
pub fn next(&self) -> Key {
self.add(1)
}
@@ -157,7 +170,7 @@ pub enum Value {
/// replayed get the full value. Replaying the WAL record
/// might need a previous version of the value (if will_init()
/// returns false), or it may be replayed stand-alone (true).
WalRecord(ZenithWalRecord),
WalRecord(NeonWalRecord),
}
impl Value {
@@ -176,7 +189,7 @@ impl Value {
///
/// Result of performing GC
///
#[derive(Default)]
#[derive(Default, Serialize)]
pub struct GcResult {
pub layers_total: u64,
pub layers_needed_by_cutoff: u64,
@@ -185,9 +198,18 @@ pub struct GcResult {
pub layers_not_updated: u64,
pub layers_removed: u64, // # of layer files removed because they have been made obsolete by newer ondisk files.
#[serde(serialize_with = "serialize_duration_as_millis")]
pub elapsed: Duration,
}
// helper function for `GcResult`, serializing a `Duration` as an integer number of milliseconds
fn serialize_duration_as_millis<S>(d: &Duration, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
d.as_millis().serialize(serializer)
}
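// With the attribute above, `elapsed` serializes as integer milliseconds, so the JSON the
// HTTP GC handler returns looks roughly like (illustrative values only):
//   {"layers_total":10,"layers_needed_by_cutoff":4,"layers_needed_by_pitr":0,
//    "layers_needed_by_branches":1,"layers_not_updated":2,"layers_removed":3,"elapsed":125}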
impl AddAssign for GcResult {
fn add_assign(&mut self, other: Self) {
self.layers_total += other.layers_total;


@@ -46,10 +46,10 @@
//! Some time later, during pageserver checkpoints, in-memory data is flushed onto disk along with its metadata.
//! If the storage sync loop was successfully started before, pageserver schedules the layer files and the updated metadata file for upload, every time a layer is flushed to disk.
//! The uploads are disabled, if no remote storage configuration is provided (no sync loop is started this way either).
//! See [`crate::layered_repository`] for the upload calls and the adjacent logic.
//! See [`crate::tenant`] for the upload calls and the adjacent logic.
//!
//! Synchronization logic is able to communicate back with updated timeline sync states, [`crate::repository::TimelineSyncStatusUpdate`],
//! submitted via [`crate::tenant_mgr::apply_timeline_sync_status_updates`] function. Tenant manager applies corresponding timeline updates in pageserver's in-memory state.
//! Synchronization logic is able to communicate back with updated timeline sync states, submitted via [`crate::tenant_mgr::attach_local_tenants`] function.
//! Tenant manager applies corresponding timeline updates in pageserver's in-memory state.
//! Such submissions happen in two cases:
//! * once after the sync loop startup, to signal pageserver which timelines will be synchronized in the near future
//! * after every loop step, in case a timeline needs to be reloaded or evicted from pageserver's memory
@@ -68,7 +68,7 @@
//! Pageserver maintains a remote layout similar to the local file structure: all layer files are uploaded with the same names under the same directory structure.
//! Yet instead of keeping the `metadata` file remotely, we wrap it with more data in [`IndexPart`], containing the list of remote files.
//! This file gets read to populate the cache, if the remote timeline data is missing from it and gets updated after every successful download.
//! This way, we optimize S3 storage access by not running the `S3 list` command that could be expencive and slow: knowing both [`ZTenantId`] and [`ZTimelineId`],
//! This way, we optimize S3 storage access by not running the `S3 list` command that could be expensive and slow: knowing both [`TenantId`] and [`TimelineId`],
//! we can always reconstruct the path to the timeline, use this to get the same path on the remote storage and retrieve its shard contents, if needed, same as any layer files.
//!
//! By default, pageserver reads the remote storage index data only for timelines located locally, to synchronize those, if needed.
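//! For orientation, a rough sketch of what such an index file carries (the names here are an
//! illustration only, not the actual `IndexPart` definition):
//!
//!     struct IndexPartSketch {
//!         timeline_layers: Vec<String>, // layer file names present in remote storage
//!         metadata_bytes: Vec<u8>,      // serialized TimelineMetadata for the timeline
//!     }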
@@ -169,13 +169,8 @@ use self::{
upload::{upload_index_part, upload_timeline_layers, UploadedTimeline},
};
use crate::{
config::PageServerConf,
exponential_backoff,
layered_repository::metadata::{metadata_path, TimelineMetadata},
storage_sync::index::RemoteIndex,
task_mgr,
task_mgr::TaskKind,
task_mgr::BACKGROUND_RUNTIME,
config::PageServerConf, exponential_backoff, storage_sync::index::RemoteIndex, task_mgr,
task_mgr::TaskKind, task_mgr::BACKGROUND_RUNTIME, tenant::metadata::TimelineMetadata,
tenant_mgr::attach_local_tenants,
};
use crate::{
@@ -183,7 +178,7 @@ use crate::{
TenantTimelineValues,
};
use utils::zid::{ZTenantId, ZTenantTimelineId, ZTimelineId};
use utils::id::{TenantId, TenantTimelineId, TimelineId};
use self::download::download_index_parts;
pub use self::download::gather_tenant_timelines_index_parts;
@@ -227,7 +222,7 @@ pub struct SyncStartupData {
struct SyncQueue {
max_timelines_per_batch: NonZeroUsize,
queue: Mutex<VecDeque<(ZTenantTimelineId, SyncTask)>>,
queue: Mutex<VecDeque<(TenantTimelineId, SyncTask)>>,
condvar: Condvar,
}
@@ -241,7 +236,7 @@ impl SyncQueue {
}
/// Queue a new task
fn push(&self, sync_id: ZTenantTimelineId, new_task: SyncTask) {
fn push(&self, sync_id: TenantTimelineId, new_task: SyncTask) {
let mut q = self.queue.lock().unwrap();
q.push_back((sync_id, new_task));
@@ -254,7 +249,7 @@ impl SyncQueue {
/// A timeline has to take care not to delete certain layers from the remote storage before the corresponding uploads happen.
/// Other than that, due to the "immutable" nature of the layers, the order of their deletion/uploading/downloading does not matter.
/// Hence, we merge the layers together into a single task per timeline and run those concurrently (with the deletion happening only after successful uploading).
fn next_task_batch(&self) -> (HashMap<ZTenantTimelineId, SyncTaskBatch>, usize) {
fn next_task_batch(&self) -> (HashMap<TenantTimelineId, SyncTaskBatch>, usize) {
// Wait for the first task in blocking fashion
let mut q = self.queue.lock().unwrap();
while q.is_empty() {
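A rough sketch of the merging idea from the comment above, not the crate's actual `next_task_batch`: everything popped from the queue is grouped into one batch per timeline, with a simplified cap on how many distinct timelines enter a single batch. A tuple key and `String` tasks stand in for `TenantTimelineId` and `SyncTask`, and the blocking wait on the condvar is omitted.

use std::collections::{HashMap, VecDeque};

type TimelineKey = (u128, u128); // stand-in for TenantTimelineId

#[derive(Debug, Default)]
struct Batch {
    tasks: Vec<String>, // stand-in for the merged per-timeline task batch
}

fn next_task_batch(
    queue: &mut VecDeque<(TimelineKey, String)>,
    max_timelines_per_batch: usize,
) -> HashMap<TimelineKey, Batch> {
    let mut batches: HashMap<TimelineKey, Batch> = HashMap::new();
    while let Some((key, task)) = queue.pop_front() {
        if !batches.contains_key(&key) && batches.len() >= max_timelines_per_batch {
            // This timeline would exceed the cap: leave it for a later batch.
            queue.push_front((key, task));
            break;
        }
        // Tasks for the same timeline are merged into a single batch entry.
        batches.entry(key).or_default().tasks.push(task);
    }
    batches
}

fn main() {
    let mut queue = VecDeque::from([
        ((1, 1), "upload layer_a".to_owned()),
        ((1, 1), "download layer_b".to_owned()),
        ((2, 7), "upload layer_c".to_owned()),
    ]);
    let batches = next_task_batch(&mut queue, 10);
    assert_eq!(batches.len(), 2);
    assert_eq!(batches[&(1, 1)].tasks.len(), 2);
}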
@@ -488,8 +483,8 @@ struct LayersDeletion {
///
/// Ensure that the loop is started otherwise the task is never processed.
pub fn schedule_layer_upload(
tenant_id: ZTenantId,
timeline_id: ZTimelineId,
tenant_id: TenantId,
timeline_id: TimelineId,
layers_to_upload: HashSet<PathBuf>,
metadata: Option<TimelineMetadata>,
) {
@@ -501,7 +496,7 @@ pub fn schedule_layer_upload(
}
};
sync_queue.push(
ZTenantTimelineId {
TenantTimelineId {
tenant_id,
timeline_id,
},
@@ -519,8 +514,8 @@ pub fn schedule_layer_upload(
///
/// Ensure that the loop is started otherwise the task is never processed.
pub fn schedule_layer_delete(
tenant_id: ZTenantId,
timeline_id: ZTimelineId,
tenant_id: TenantId,
timeline_id: TimelineId,
layers_to_delete: HashSet<PathBuf>,
) {
let sync_queue = match SYNC_QUEUE.get() {
@@ -531,7 +526,7 @@ pub fn schedule_layer_delete(
}
};
sync_queue.push(
ZTenantTimelineId {
TenantTimelineId {
tenant_id,
timeline_id,
},
@@ -551,7 +546,7 @@ pub fn schedule_layer_delete(
/// On any failure, the task gets retried, omitting already downloaded layers.
///
/// Ensure that the loop is started otherwise the task is never processed.
pub fn schedule_layer_download(tenant_id: ZTenantId, timeline_id: ZTimelineId) {
pub fn schedule_layer_download(tenant_id: TenantId, timeline_id: TimelineId) {
debug!("Scheduling layer download for tenant {tenant_id}, timeline {timeline_id}");
let sync_queue = match SYNC_QUEUE.get() {
Some(queue) => queue,
@@ -561,7 +556,7 @@ pub fn schedule_layer_download(tenant_id: ZTenantId, timeline_id: ZTimelineId) {
}
};
sync_queue.push(
ZTenantTimelineId {
TenantTimelineId {
tenant_id,
timeline_id,
},
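A minimal sketch, not the crate's actual definitions, of the pattern shared by the `schedule_layer_*` functions above: scheduling goes through a process-global queue that only exists once the sync loop has started, and is a logged no-op otherwise. `std::sync::OnceLock` stands in for whatever once-initialized cell backs `SYNC_QUEUE` in the module, and a `String` stands in for `SyncTask`.

use std::collections::VecDeque;
use std::sync::{Mutex, OnceLock};

static SYNC_QUEUE: OnceLock<Mutex<VecDeque<String>>> = OnceLock::new();

fn schedule(task: String) {
    match SYNC_QUEUE.get() {
        // The loop has been started: enqueue the task for the next batch.
        Some(queue) => queue.lock().unwrap().push_back(task),
        // The loop was never started: the task is lost, hence the
        // "ensure that the loop is started" notes in the doc comments.
        None => eprintln!("sync queue not initialized, dropping task: {task}"),
    }
}

fn main() {
    schedule("download".to_owned()); // dropped: loop not started yet
    SYNC_QUEUE.set(Mutex::new(VecDeque::new())).unwrap();
    schedule("download".to_owned()); // queued
}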
@@ -601,10 +596,11 @@ pub fn spawn_storage_sync_task(
for (tenant_id, timeline_data) in local_timeline_files.0 {
if timeline_data.is_empty() {
info!("got empty tenant {}", tenant_id);
let _ = empty_tenants.0.entry(tenant_id).or_default();
} else {
for (timeline_id, timeline_data) in timeline_data {
let id = ZTenantTimelineId::new(tenant_id, timeline_id);
let id = TenantTimelineId::new(tenant_id, timeline_id);
keys_for_index_part_downloads.insert(id);
timelines_to_sync.insert(id, timeline_data);
}
@@ -714,17 +710,17 @@ async fn storage_sync_loop(
};
if tenant_entry.has_in_progress_downloads() {
info!("Tenant {tenant_id} has pending timeline downloads, skipping repository registration");
info!("Tenant {tenant_id} has pending timeline downloads, skipping tenant registration");
continue;
} else {
info!(
"Tenant {tenant_id} download completed. Picking to register in repository"
"Tenant {tenant_id} download completed. Picking to register in tenant"
);
// Here we assume that if the tenant has no in-progress downloads, that
// means the last completed timeline download triggered the
// sync status update. So we look at the index for available timelines
// and register them all at once in a repository for download
// to be submitted in a single operation to repository
// and register them all at once in a tenant for download
// to be submitted in a single operation to tenant
// so it can apply them at once to internal timeline map.
timelines_to_attach.0.insert(
tenant_id,
@@ -737,9 +733,7 @@ async fn storage_sync_loop(
}
drop(index_accessor);
// Batch timeline download registration to ensure that the external registration code won't block any already running tasks.
if let Err(e) = attach_local_tenants(conf, &index, timelines_to_attach) {
error!("Failed to attach new timelines: {e:?}");
};
attach_local_tenants(conf, &index, timelines_to_attach);
}
}
ControlFlow::Break(()) => {
@@ -768,9 +762,9 @@ async fn process_batches(
max_sync_errors: NonZeroU32,
storage: GenericRemoteStorage,
index: &RemoteIndex,
batched_tasks: HashMap<ZTenantTimelineId, SyncTaskBatch>,
batched_tasks: HashMap<TenantTimelineId, SyncTaskBatch>,
sync_queue: &SyncQueue,
) -> HashSet<ZTenantId> {
) -> HashSet<TenantId> {
let mut sync_results = batched_tasks
.into_iter()
.map(|(sync_id, batch)| {
@@ -810,7 +804,7 @@ async fn process_sync_task_batch(
conf: &'static PageServerConf,
(storage, index, sync_queue): (GenericRemoteStorage, RemoteIndex, &SyncQueue),
max_sync_errors: NonZeroU32,
sync_id: ZTenantTimelineId,
sync_id: TenantTimelineId,
batch: SyncTaskBatch,
) -> DownloadStatus {
let sync_start = Instant::now();
@@ -951,7 +945,7 @@ async fn download_timeline_data(
conf: &'static PageServerConf,
(storage, index, sync_queue): (&GenericRemoteStorage, &RemoteIndex, &SyncQueue),
current_remote_timeline: Option<&RemoteTimeline>,
sync_id: ZTenantTimelineId,
sync_id: TenantTimelineId,
new_download_data: SyncData<LayersDownload>,
sync_start: Instant,
task_name: &str,
@@ -1001,7 +995,7 @@ async fn download_timeline_data(
async fn update_local_metadata(
conf: &'static PageServerConf,
sync_id: ZTenantTimelineId,
sync_id: TenantTimelineId,
remote_timeline: Option<&RemoteTimeline>,
) -> anyhow::Result<()> {
let remote_metadata = match remote_timeline {
@@ -1013,7 +1007,7 @@ async fn update_local_metadata(
};
let remote_lsn = remote_metadata.disk_consistent_lsn();
let local_metadata_path = metadata_path(conf, sync_id.timeline_id, sync_id.tenant_id);
let local_metadata_path = conf.metadata_path(sync_id.timeline_id, sync_id.tenant_id);
let local_lsn = if local_metadata_path.exists() {
let local_metadata = read_metadata_file(&local_metadata_path)
.await
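A small sketch of the decision this function appears to make, judging from the surrounding log message; the comparison on `disk_consistent_lsn` is an assumption here, and `u64` stands in for `Lsn`.

// Only rewrite the local `metadata` file when the remote copy is strictly
// ahead, or when there is no local metadata at all.
fn needs_local_metadata_update(local_lsn: Option<u64>, remote_lsn: u64) -> bool {
    match local_lsn {
        None => true,
        Some(local) => remote_lsn > local,
    }
}

fn main() {
    assert!(needs_local_metadata_update(None, 10));
    assert!(needs_local_metadata_update(Some(5), 10));
    assert!(!needs_local_metadata_update(Some(10), 10));
}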
@@ -1033,18 +1027,12 @@ async fn update_local_metadata(
info!("Updating local timeline metadata from remote timeline: local disk_consistent_lsn={local_lsn:?}, remote disk_consistent_lsn={remote_lsn}");
// clone because spawn_blocking requires static lifetime
let cloned_metadata = remote_metadata.to_owned();
let ZTenantTimelineId {
let TenantTimelineId {
tenant_id,
timeline_id,
} = sync_id;
tokio::task::spawn_blocking(move || {
crate::layered_repository::save_metadata(
conf,
timeline_id,
tenant_id,
&cloned_metadata,
true,
)
crate::tenant::save_metadata(conf, timeline_id, tenant_id, &cloned_metadata, true)
})
.await
.with_context(|| {
@@ -1069,7 +1057,7 @@ async fn update_local_metadata(
async fn delete_timeline_data(
conf: &'static PageServerConf,
(storage, index, sync_queue): (&GenericRemoteStorage, &RemoteIndex, &SyncQueue),
sync_id: ZTenantTimelineId,
sync_id: TenantTimelineId,
mut new_delete_data: SyncData<LayersDeletion>,
sync_start: Instant,
task_name: &str,
@@ -1112,7 +1100,7 @@ async fn upload_timeline_data(
conf: &'static PageServerConf,
(storage, index, sync_queue): (&GenericRemoteStorage, &RemoteIndex, &SyncQueue),
current_remote_timeline: Option<&RemoteTimeline>,
sync_id: ZTenantTimelineId,
sync_id: TenantTimelineId,
new_upload_data: SyncData<LayersUpload>,
sync_start: Instant,
task_name: &str,
@@ -1171,7 +1159,7 @@ async fn update_remote_data(
conf: &'static PageServerConf,
storage: &GenericRemoteStorage,
index: &RemoteIndex,
sync_id: ZTenantTimelineId,
sync_id: TenantTimelineId,
update: RemoteDataUpdate<'_>,
) -> anyhow::Result<()> {
let updated_remote_timeline = {
@@ -1269,7 +1257,7 @@ async fn validate_task_retries(
fn schedule_first_sync_tasks(
index: &mut RemoteTimelineIndex,
sync_queue: &SyncQueue,
local_timeline_files: HashMap<ZTenantTimelineId, (TimelineMetadata, HashSet<PathBuf>)>,
local_timeline_files: HashMap<TenantTimelineId, (TimelineMetadata, HashSet<PathBuf>)>,
) -> TenantTimelineValues<LocalTimelineInitStatus> {
let mut local_timeline_init_statuses = TenantTimelineValues::new();
@@ -1311,6 +1299,10 @@ fn schedule_first_sync_tasks(
None => {
// TODO (rodionov) does this mean that we've crashed during tenant creation?
// is it safe to upload this checkpoint? could it be half broken?
warn!(
"marking {} as locally complete, while it doesnt exist in remote index",
sync_id
);
new_sync_tasks.push_back((
sync_id,
SyncTask::upload(LayersUpload {
@@ -1339,12 +1331,14 @@ fn schedule_first_sync_tasks(
/// The bool in the return value stands for awaits_download
fn compare_local_and_remote_timeline(
new_sync_tasks: &mut VecDeque<(ZTenantTimelineId, SyncTask)>,
sync_id: ZTenantTimelineId,
new_sync_tasks: &mut VecDeque<(TenantTimelineId, SyncTask)>,
sync_id: TenantTimelineId,
local_metadata: TimelineMetadata,
local_files: HashSet<PathBuf>,
remote_entry: &RemoteTimeline,
) -> (LocalTimelineInitStatus, bool) {
let _entered = info_span!("compare_local_and_remote_timeline", sync_id = %sync_id).entered();
let remote_files = remote_entry.stored_files();
let number_of_layers_to_download = remote_files.difference(&local_files).count();
@@ -1355,10 +1349,12 @@ fn compare_local_and_remote_timeline(
layers_to_skip: local_files.clone(),
}),
));
info!("NeedsSync");
(LocalTimelineInitStatus::NeedsSync, true)
// we do not need to manipulate the remote consistent lsn here
// because it will be updated when the sync completes
} else {
info!("LocallyComplete");
(
LocalTimelineInitStatus::LocallyComplete(local_metadata.clone()),
false,
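The comparison above boils down to a set difference between remote and local layer file names; a tiny standalone illustration:

use std::collections::HashSet;
use std::path::PathBuf;

// A timeline needs a sync if any remote layer file is missing locally;
// otherwise it is locally complete.
fn needs_sync(local_files: &HashSet<PathBuf>, remote_files: &HashSet<PathBuf>) -> bool {
    remote_files.difference(local_files).count() > 0
}

fn main() {
    let local: HashSet<PathBuf> = [PathBuf::from("a"), PathBuf::from("b")].into_iter().collect();
    let mut remote = local.clone();
    remote.insert(PathBuf::from("c"));
    assert!(needs_sync(&local, &remote));
    assert!(!needs_sync(&remote, &remote));
}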
@@ -1385,7 +1381,7 @@ fn compare_local_and_remote_timeline(
}
fn register_sync_status(
sync_id: ZTenantTimelineId,
sync_id: TenantTimelineId,
sync_start: Instant,
sync_name: &str,
sync_status: Option<bool>,
@@ -1411,13 +1407,13 @@ fn register_sync_status(
mod test_utils {
use utils::lsn::Lsn;
use crate::layered_repository::repo_harness::RepoHarness;
use crate::tenant::harness::TenantHarness;
use super::*;
pub(super) async fn create_local_timeline(
harness: &RepoHarness<'_>,
timeline_id: ZTimelineId,
harness: &TenantHarness<'_>,
timeline_id: TimelineId,
filenames: &[&str],
metadata: TimelineMetadata,
) -> anyhow::Result<LayersUpload> {
@@ -1432,7 +1428,7 @@ mod test_utils {
}
fs::write(
metadata_path(harness.conf, timeline_id, harness.tenant_id),
harness.conf.metadata_path(timeline_id, harness.tenant_id),
metadata.to_bytes()?,
)
.await?;
@@ -1449,21 +1445,31 @@ mod test_utils {
}
pub(super) fn dummy_metadata(disk_consistent_lsn: Lsn) -> TimelineMetadata {
TimelineMetadata::new(disk_consistent_lsn, None, None, Lsn(0), Lsn(0), Lsn(0))
TimelineMetadata::new(
disk_consistent_lsn,
None,
None,
Lsn(0),
Lsn(0),
Lsn(0),
// Any version will do
// but it should be consistent with the one in the tests
crate::DEFAULT_PG_VERSION,
)
}
}
#[cfg(test)]
mod tests {
use super::test_utils::dummy_metadata;
use crate::layered_repository::repo_harness::TIMELINE_ID;
use crate::tenant::harness::TIMELINE_ID;
use hex_literal::hex;
use utils::lsn::Lsn;
use super::*;
const TEST_SYNC_ID: ZTenantTimelineId = ZTenantTimelineId {
tenant_id: ZTenantId::from_array(hex!("11223344556677881122334455667788")),
const TEST_SYNC_ID: TenantTimelineId = TenantTimelineId {
tenant_id: TenantId::from_array(hex!("11223344556677881122334455667788")),
timeline_id: TIMELINE_ID,
};
@@ -1472,12 +1478,12 @@ mod tests {
let sync_queue = SyncQueue::new(NonZeroUsize::new(100).unwrap());
assert_eq!(sync_queue.len(), 0);
let sync_id_2 = ZTenantTimelineId {
tenant_id: ZTenantId::from_array(hex!("22223344556677881122334455667788")),
let sync_id_2 = TenantTimelineId {
tenant_id: TenantId::from_array(hex!("22223344556677881122334455667788")),
timeline_id: TIMELINE_ID,
};
let sync_id_3 = ZTenantTimelineId {
tenant_id: ZTenantId::from_array(hex!("33223344556677881122334455667788")),
let sync_id_3 = TenantTimelineId {
tenant_id: TenantId::from_array(hex!("33223344556677881122334455667788")),
timeline_id: TIMELINE_ID,
};
assert!(sync_id_2 != TEST_SYNC_ID);
@@ -1599,8 +1605,8 @@ mod tests {
layers_to_skip: HashSet::from([PathBuf::from("sk4")]),
};
let sync_id_2 = ZTenantTimelineId {
tenant_id: ZTenantId::from_array(hex!("22223344556677881122334455667788")),
let sync_id_2 = TenantTimelineId {
tenant_id: TenantId::from_array(hex!("22223344556677881122334455667788")),
timeline_id: TIMELINE_ID,
};
assert!(sync_id_2 != TEST_SYNC_ID);


@@ -8,7 +8,7 @@ use tracing::{debug, error, info};
use crate::storage_sync::{SyncQueue, SyncTask};
use remote_storage::GenericRemoteStorage;
use utils::zid::ZTenantTimelineId;
use utils::id::TenantTimelineId;
use super::{LayersDeletion, SyncData};
@@ -17,7 +17,7 @@ use super::{LayersDeletion, SyncData};
pub(super) async fn delete_timeline_layers(
storage: &GenericRemoteStorage,
sync_queue: &SyncQueue,
sync_id: ZTenantTimelineId,
sync_id: TenantTimelineId,
mut delete_data: SyncData<LayersDeletion>,
) -> bool {
if !delete_data.data.deletion_registered {
@@ -112,8 +112,8 @@ mod tests {
use utils::lsn::Lsn;
use crate::{
layered_repository::repo_harness::{RepoHarness, TIMELINE_ID},
storage_sync::test_utils::{create_local_timeline, dummy_metadata},
tenant::harness::{TenantHarness, TIMELINE_ID},
};
use remote_storage::{LocalFs, RemoteStorage};
@@ -121,9 +121,9 @@ mod tests {
#[tokio::test]
async fn delete_timeline_negative() -> anyhow::Result<()> {
let harness = RepoHarness::create("delete_timeline_negative")?;
let harness = TenantHarness::create("delete_timeline_negative")?;
let sync_queue = SyncQueue::new(NonZeroUsize::new(100).unwrap());
let sync_id = ZTenantTimelineId::new(harness.tenant_id, TIMELINE_ID);
let sync_id = TenantTimelineId::new(harness.tenant_id, TIMELINE_ID);
let storage = GenericRemoteStorage::new(LocalFs::new(
tempdir()?.path().to_path_buf(),
harness.conf.workdir.clone(),
@@ -154,10 +154,10 @@ mod tests {
#[tokio::test]
async fn delete_timeline() -> anyhow::Result<()> {
let harness = RepoHarness::create("delete_timeline")?;
let harness = TenantHarness::create("delete_timeline")?;
let sync_queue = SyncQueue::new(NonZeroUsize::new(100).unwrap());
let sync_id = ZTenantTimelineId::new(harness.tenant_id, TIMELINE_ID);
let sync_id = TenantTimelineId::new(harness.tenant_id, TIMELINE_ID);
let layer_files = ["a", "b", "c", "d"];
let storage = GenericRemoteStorage::new(LocalFs::new(
tempdir()?.path().to_path_buf(),


@@ -9,18 +9,18 @@ use std::{
use anyhow::Context;
use futures::stream::{FuturesUnordered, StreamExt};
use remote_storage::{path_with_suffix_extension, DownloadError, GenericRemoteStorage};
use remote_storage::{DownloadError, GenericRemoteStorage};
use tokio::{
fs,
io::{self, AsyncWriteExt},
};
use tracing::{debug, error, info, warn};
use crate::{
config::PageServerConf, layered_repository::metadata::metadata_path, storage_sync::SyncTask,
TEMP_FILE_SUFFIX,
use crate::{config::PageServerConf, storage_sync::SyncTask, TEMP_FILE_SUFFIX};
use utils::{
crashsafe_dir::path_with_suffix_extension,
id::{TenantId, TenantTimelineId, TimelineId},
};
use utils::zid::{ZTenantId, ZTenantTimelineId, ZTimelineId};
use super::{
index::{IndexPart, RemoteTimeline},
@@ -33,14 +33,14 @@ use super::{
// When data is received successfully without errors, the Present variant is used.
pub enum TenantIndexParts {
Poisoned {
present: HashMap<ZTimelineId, IndexPart>,
missing: HashSet<ZTimelineId>,
present: HashMap<TimelineId, IndexPart>,
missing: HashSet<TimelineId>,
},
Present(HashMap<ZTimelineId, IndexPart>),
Present(HashMap<TimelineId, IndexPart>),
}
impl TenantIndexParts {
fn add_poisoned(&mut self, timeline_id: ZTimelineId) {
fn add_poisoned(&mut self, timeline_id: TimelineId) {
match self {
TenantIndexParts::Poisoned { missing, .. } => {
missing.insert(timeline_id);
@@ -64,9 +64,9 @@ impl Default for TenantIndexParts {
pub async fn download_index_parts(
conf: &'static PageServerConf,
storage: &GenericRemoteStorage,
keys: HashSet<ZTenantTimelineId>,
) -> HashMap<ZTenantId, TenantIndexParts> {
let mut index_parts: HashMap<ZTenantId, TenantIndexParts> = HashMap::new();
keys: HashSet<TenantTimelineId>,
) -> HashMap<TenantId, TenantIndexParts> {
let mut index_parts: HashMap<TenantId, TenantIndexParts> = HashMap::new();
let mut part_downloads = keys
.into_iter()
@@ -112,8 +112,8 @@ pub async fn download_index_parts(
pub async fn gather_tenant_timelines_index_parts(
conf: &'static PageServerConf,
storage: &GenericRemoteStorage,
tenant_id: ZTenantId,
) -> anyhow::Result<HashMap<ZTimelineId, IndexPart>> {
tenant_id: TenantId,
) -> anyhow::Result<HashMap<TimelineId, IndexPart>> {
let tenant_path = conf.timelines_path(&tenant_id);
let timeline_sync_ids = get_timeline_sync_ids(storage, &tenant_path, tenant_id)
.await
@@ -135,9 +135,10 @@ pub async fn gather_tenant_timelines_index_parts(
async fn download_index_part(
conf: &'static PageServerConf,
storage: &GenericRemoteStorage,
sync_id: ZTenantTimelineId,
sync_id: TenantTimelineId,
) -> Result<IndexPart, DownloadError> {
let index_part_path = metadata_path(conf, sync_id.timeline_id, sync_id.tenant_id)
let index_part_path = conf
.metadata_path(sync_id.timeline_id, sync_id.tenant_id)
.with_file_name(IndexPart::FILE_NAME);
let mut index_part_download = storage
.download_storage_object(None, &index_part_path)
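A sketch of the path derivation used here, assuming the index file is named `index_part.json` and sits next to the timeline's `metadata` file; both names are assumptions of this sketch rather than quotes from the crate.

use std::path::{Path, PathBuf};

// Swap only the file name, keeping the timeline directory identical.
fn index_part_path(metadata_path: &Path) -> PathBuf {
    metadata_path.with_file_name("index_part.json")
}

fn main() {
    let p = index_part_path(Path::new("tenants/1122/timelines/aabb/metadata"));
    assert_eq!(p, PathBuf::from("tenants/1122/timelines/aabb/index_part.json"));
}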
@@ -197,7 +198,7 @@ pub(super) async fn download_timeline_layers<'a>(
storage: &'a GenericRemoteStorage,
sync_queue: &'a SyncQueue,
remote_timeline: Option<&'a RemoteTimeline>,
sync_id: ZTenantTimelineId,
sync_id: TenantTimelineId,
mut download_data: SyncData<LayersDownload>,
) -> DownloadedTimeline {
let remote_timeline = match remote_timeline {
@@ -335,7 +336,7 @@ pub(super) async fn download_timeline_layers<'a>(
}
// fsync timeline directory which is a parent directory for downloaded files
let ZTenantTimelineId {
let TenantTimelineId {
tenant_id,
timeline_id,
} = &sync_id;
@@ -366,8 +367,8 @@ pub(super) async fn download_timeline_layers<'a>(
async fn get_timeline_sync_ids(
storage: &GenericRemoteStorage,
tenant_path: &Path,
tenant_id: ZTenantId,
) -> anyhow::Result<HashSet<ZTenantTimelineId>> {
tenant_id: TenantId,
) -> anyhow::Result<HashSet<TenantTimelineId>> {
let tenant_storage_path = storage.remote_object_id(tenant_path).with_context(|| {
format!(
"Failed to get tenant storage path for local path '{}'",
@@ -395,11 +396,11 @@ async fn get_timeline_sync_ids(
anyhow::anyhow!("failed to get timeline id for remote tenant {tenant_id}")
})?;
let timeline_id: ZTimelineId = object_name.parse().with_context(|| {
let timeline_id: TimelineId = object_name.parse().with_context(|| {
format!("failed to parse object name into timeline id '{object_name}'")
})?;
sync_ids.insert(ZTenantTimelineId {
sync_ids.insert(TenantTimelineId {
tenant_id,
timeline_id,
});
@@ -425,21 +426,21 @@ mod tests {
use utils::lsn::Lsn;
use crate::{
layered_repository::repo_harness::{RepoHarness, TIMELINE_ID},
storage_sync::{
index::RelativePath,
test_utils::{create_local_timeline, dummy_metadata},
},
tenant::harness::{TenantHarness, TIMELINE_ID},
};
use super::*;
#[tokio::test]
async fn download_timeline() -> anyhow::Result<()> {
let harness = RepoHarness::create("download_timeline")?;
let harness = TenantHarness::create("download_timeline")?;
let sync_queue = SyncQueue::new(NonZeroUsize::new(100).unwrap());
let sync_id = ZTenantTimelineId::new(harness.tenant_id, TIMELINE_ID);
let sync_id = TenantTimelineId::new(harness.tenant_id, TIMELINE_ID);
let layer_files = ["a", "b", "layer_to_skip", "layer_to_keep_locally"];
let storage = GenericRemoteStorage::new(LocalFs::new(
tempdir()?.path().to_owned(),
@@ -537,9 +538,9 @@ mod tests {
#[tokio::test]
async fn download_timeline_negatives() -> anyhow::Result<()> {
let harness = RepoHarness::create("download_timeline_negatives")?;
let harness = TenantHarness::create("download_timeline_negatives")?;
let sync_queue = SyncQueue::new(NonZeroUsize::new(100).unwrap());
let sync_id = ZTenantTimelineId::new(harness.tenant_id, TIMELINE_ID);
let sync_id = TenantTimelineId::new(harness.tenant_id, TIMELINE_ID);
let storage = GenericRemoteStorage::new(LocalFs::new(
tempdir()?.path().to_owned(),
harness.conf.workdir.clone(),
@@ -596,8 +597,8 @@ mod tests {
#[tokio::test]
async fn test_download_index_part() -> anyhow::Result<()> {
let harness = RepoHarness::create("test_download_index_part")?;
let sync_id = ZTenantTimelineId::new(harness.tenant_id, TIMELINE_ID);
let harness = TenantHarness::create("test_download_index_part")?;
let sync_id = TenantTimelineId::new(harness.tenant_id, TIMELINE_ID);
let storage = GenericRemoteStorage::new(LocalFs::new(
tempdir()?.path().to_owned(),
@@ -620,9 +621,10 @@ mod tests {
metadata.to_bytes()?,
);
let local_index_part_path =
metadata_path(harness.conf, sync_id.timeline_id, sync_id.tenant_id)
.with_file_name(IndexPart::FILE_NAME);
let local_index_part_path = harness
.conf
.metadata_path(sync_id.timeline_id, sync_id.tenant_id)
.with_file_name(IndexPart::FILE_NAME);
let index_part_remote_id = local_storage.remote_object_id(&local_index_part_path)?;
let index_part_local_path = PathBuf::from(index_part_remote_id.to_string());
fs::create_dir_all(index_part_local_path.parent().unwrap()).await?;


@@ -15,10 +15,10 @@ use serde_with::{serde_as, DisplayFromStr};
use tokio::sync::RwLock;
use tracing::log::warn;
use crate::{config::PageServerConf, layered_repository::metadata::TimelineMetadata};
use crate::{config::PageServerConf, tenant::metadata::TimelineMetadata};
use utils::{
id::{TenantId, TenantTimelineId, TimelineId},
lsn::Lsn,
zid::{ZTenantId, ZTenantTimelineId, ZTimelineId},
};
use super::download::TenantIndexParts;
@@ -49,7 +49,7 @@ impl RelativePath {
}
#[derive(Debug, Clone, Default)]
pub struct TenantEntry(HashMap<ZTimelineId, RemoteTimeline>);
pub struct TenantEntry(HashMap<TimelineId, RemoteTimeline>);
impl TenantEntry {
pub fn has_in_progress_downloads(&self) -> bool {
@@ -59,7 +59,7 @@ impl TenantEntry {
}
impl Deref for TenantEntry {
type Target = HashMap<ZTimelineId, RemoteTimeline>;
type Target = HashMap<TimelineId, RemoteTimeline>;
fn deref(&self) -> &Self::Target {
&self.0
@@ -72,8 +72,8 @@ impl DerefMut for TenantEntry {
}
}
impl From<HashMap<ZTimelineId, RemoteTimeline>> for TenantEntry {
fn from(inner: HashMap<ZTimelineId, RemoteTimeline>) -> Self {
impl From<HashMap<TimelineId, RemoteTimeline>> for TenantEntry {
fn from(inner: HashMap<TimelineId, RemoteTimeline>) -> Self {
Self(inner)
}
}
@@ -81,7 +81,7 @@ impl From<HashMap<ZTimelineId, RemoteTimeline>> for TenantEntry {
/// An index to track tenant files that exist on the remote storage.
#[derive(Debug, Clone, Default)]
pub struct RemoteTimelineIndex {
entries: HashMap<ZTenantId, TenantEntry>,
entries: HashMap<TenantId, TenantEntry>,
}
/// A wrapper to synchronize the access to the index, should be created and used before dealing with any [`RemoteTimelineIndex`].
@@ -91,9 +91,9 @@ pub struct RemoteIndex(Arc<RwLock<RemoteTimelineIndex>>);
impl RemoteIndex {
pub fn from_parts(
conf: &'static PageServerConf,
index_parts: HashMap<ZTenantId, TenantIndexParts>,
index_parts: HashMap<TenantId, TenantIndexParts>,
) -> anyhow::Result<Self> {
let mut entries: HashMap<ZTenantId, TenantEntry> = HashMap::new();
let mut entries: HashMap<TenantId, TenantEntry> = HashMap::new();
for (tenant_id, index_parts) in index_parts {
match index_parts {
@@ -136,30 +136,30 @@ impl Clone for RemoteIndex {
impl RemoteTimelineIndex {
pub fn timeline_entry(
&self,
ZTenantTimelineId {
TenantTimelineId {
tenant_id,
timeline_id,
}: &ZTenantTimelineId,
}: &TenantTimelineId,
) -> Option<&RemoteTimeline> {
self.entries.get(tenant_id)?.get(timeline_id)
}
pub fn timeline_entry_mut(
&mut self,
ZTenantTimelineId {
TenantTimelineId {
tenant_id,
timeline_id,
}: &ZTenantTimelineId,
}: &TenantTimelineId,
) -> Option<&mut RemoteTimeline> {
self.entries.get_mut(tenant_id)?.get_mut(timeline_id)
}
pub fn add_timeline_entry(
&mut self,
ZTenantTimelineId {
TenantTimelineId {
tenant_id,
timeline_id,
}: ZTenantTimelineId,
}: TenantTimelineId,
entry: RemoteTimeline,
) {
self.entries
@@ -170,10 +170,10 @@ impl RemoteTimelineIndex {
pub fn remove_timeline_entry(
&mut self,
ZTenantTimelineId {
TenantTimelineId {
tenant_id,
timeline_id,
}: ZTenantTimelineId,
}: TenantTimelineId,
) -> Option<RemoteTimeline> {
self.entries
.entry(tenant_id)
@@ -181,25 +181,25 @@ impl RemoteTimelineIndex {
.remove(&timeline_id)
}
pub fn tenant_entry(&self, tenant_id: &ZTenantId) -> Option<&TenantEntry> {
pub fn tenant_entry(&self, tenant_id: &TenantId) -> Option<&TenantEntry> {
self.entries.get(tenant_id)
}
pub fn tenant_entry_mut(&mut self, tenant_id: &ZTenantId) -> Option<&mut TenantEntry> {
pub fn tenant_entry_mut(&mut self, tenant_id: &TenantId) -> Option<&mut TenantEntry> {
self.entries.get_mut(tenant_id)
}
pub fn add_tenant_entry(&mut self, tenant_id: ZTenantId) -> &mut TenantEntry {
pub fn add_tenant_entry(&mut self, tenant_id: TenantId) -> &mut TenantEntry {
self.entries.entry(tenant_id).or_default()
}
pub fn remove_tenant_entry(&mut self, tenant_id: &ZTenantId) -> Option<TenantEntry> {
pub fn remove_tenant_entry(&mut self, tenant_id: &TenantId) -> Option<TenantEntry> {
self.entries.remove(tenant_id)
}
pub fn set_awaits_download(
&mut self,
id: &ZTenantTimelineId,
id: &TenantTimelineId,
awaits_download: bool,
) -> anyhow::Result<()> {
self.timeline_entry_mut(id)
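A standalone sketch of the two-level shape behind `RemoteTimelineIndex` and its `timeline_entry` lookup, with `String` ids and a `String` payload standing in for the real types:

use std::collections::HashMap;

#[derive(Default)]
struct Index {
    // tenant id -> (timeline id -> remote timeline data)
    entries: HashMap<String, HashMap<String, String>>,
}

impl Index {
    // A missing tenant or a missing timeline both collapse to `None`
    // via the `?` operator, mirroring `timeline_entry` above.
    fn timeline_entry(&self, tenant_id: &str, timeline_id: &str) -> Option<&String> {
        self.entries.get(tenant_id)?.get(timeline_id)
    }
}

fn main() {
    let mut index = Index::default();
    index
        .entries
        .entry("tenant-a".to_owned())
        .or_default()
        .insert("timeline-1".to_owned(), "remote timeline data".to_owned());
    assert!(index.timeline_entry("tenant-a", "timeline-1").is_some());
    assert!(index.timeline_entry("tenant-a", "timeline-2").is_none());
    assert!(index.timeline_entry("tenant-b", "timeline-1").is_none());
}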
@@ -340,14 +340,22 @@ mod tests {
use std::collections::BTreeSet;
use super::*;
use crate::layered_repository::repo_harness::{RepoHarness, TIMELINE_ID};
use crate::tenant::harness::{TenantHarness, TIMELINE_ID};
use crate::DEFAULT_PG_VERSION;
#[test]
fn index_part_conversion() {
let harness = RepoHarness::create("index_part_conversion").unwrap();
let harness = TenantHarness::create("index_part_conversion").unwrap();
let timeline_path = harness.timeline_path(&TIMELINE_ID);
let metadata =
TimelineMetadata::new(Lsn(5).align(), Some(Lsn(4)), None, Lsn(3), Lsn(2), Lsn(1));
let metadata = TimelineMetadata::new(
Lsn(5).align(),
Some(Lsn(4)),
None,
Lsn(3),
Lsn(2),
Lsn(1),
DEFAULT_PG_VERSION,
);
let remote_timeline = RemoteTimeline {
timeline_layers: HashSet::from([
timeline_path.join("layer_1"),
@@ -462,10 +470,17 @@ mod tests {
#[test]
fn index_part_conversion_negatives() {
let harness = RepoHarness::create("index_part_conversion_negatives").unwrap();
let harness = TenantHarness::create("index_part_conversion_negatives").unwrap();
let timeline_path = harness.timeline_path(&TIMELINE_ID);
let metadata =
TimelineMetadata::new(Lsn(5).align(), Some(Lsn(4)), None, Lsn(3), Lsn(2), Lsn(1));
let metadata = TimelineMetadata::new(
Lsn(5).align(),
Some(Lsn(4)),
None,
Lsn(3),
Lsn(2),
Lsn(1),
DEFAULT_PG_VERSION,
);
let conversion_result = IndexPart::from_remote_timeline(
&timeline_path,

Some files were not shown because too many files have changed in this diff.