Compare commits

...

64 Commits

Author SHA1 Message Date
Heikki Linnakangas
7eefad691c Improve the scaling, add Play/Stop buttons 2023-03-22 19:40:54 +02:00
Heikki Linnakangas
fe59a063ea WIP: Collect and draw layer trace 2023-03-22 19:40:54 +02:00
Heikki Linnakangas
ae8e5b3a8e Add test from PR #3673 2023-03-22 19:40:54 +02:00
Kirill Bulatov
8bd565e09e Ensure branches with no layers have their remote storage counterpart created eventually (#3857)
Discovered during writing a test for
https://github.com/neondatabase/neon/pull/3843
2023-03-22 17:42:31 +02:00
Joonas Koivunen
6033dfdf4a Re-access layers before threshold eviction (#3867)
To avoid re-downloading evicted files on restart, re-compute logical
size and partitioning before each threshold based eviction run.

Cc: #3802

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-03-22 16:26:27 +02:00
mikecaat
14a40c9ca6 Fix minor things for the docker-compose file (#3862)
* Add the REPOSITORY env to build args to avoid the following error when
executing without the credentials for the repository.

```
ERROR: Service 'compute' failed to build: Head
"https://369495373322.dkr.ecr.eu-central-1.amazonaws.com/v2/compute-node-v15/manifests/2221":
no basic auth credentials
```

* update the tag version in the documentation to support storage broker
2023-03-22 08:10:53 +00:00
Shany Pozin
0f7de84785 Allow calling detach on ignored tenant (#3834)
## Describe your changes
Added a query param to detach API
Allow to remove local state of a tenant even if its not in the memory
(following ignore API)
## Issue ticket number and link
#3828
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: Kirill Bulatov <kirill@neon.tech>
2023-03-22 07:17:00 +00:00
Kirill Bulatov
dd22c87100 Remove older layer metadata format support code (#3854)
The PR enforces current newest `index_part.json` format in the type
system (version `1`), not allowing any previous forms of it, that were
used in the past.
Similarly, the code to mitigate the
https://github.com/neondatabase/neon/issues/3024 issue is now also
removed.

Current code does not produce old formats and extra files in the
index_part.json, in the future we will be able to use
https://github.com/neondatabase/aversion or other approach to make
version transitions more explicit.

See https://neondb.slack.com/archives/C033RQ5SPDH/p1679134185248119 for
the justification on the breaking changes.
2023-03-21 23:33:28 +02:00
Heikki Linnakangas
6fdd9c10d1 Read storage auth token from spec file.
We read the pageserver connection string from the spec file, so let's
read the auth token from the same place.

We've been talking about pre-launching compute nodes that are not
associated with any particular tenant at startup, so that the spec
file is delivered to the compute node later. We cannot change the env
variables after the process has been launched.

We still pass the token to 'postgres' binary in the NEON_AUTH_TOKEN
env variable, but compute_ctl is now responsible for setting it.
2023-03-21 20:12:09 +02:00
Dmitry Rodionov
4158e24e60 rfc: delete pageserver data from s3 (#3792)
[Rendered](https://github.com/neondatabase/neon/blob/main/docs/rfcs/022-pageserver-delete-from-s3.md)

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-03-21 20:03:27 +02:00
Shany Pozin
809acb5fa9 Move neon-image-depot to a larger runner (#3860)
## Describe your changes
https://neondb.slack.com/archives/C039YKBRZB4/p1679413279637059
## Issue ticket number and link

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-21 19:32:36 +02:00
Heikki Linnakangas
299db9d028 Simplify and clean up the $NEON_AUTH_TOKEN stuff in compute
- Remove the neon.safekeeper_token_env GUC. It was used to set the
  name of an environment variable, which was then used in pageserver
  and safekeeper connection strings to in place of the
  password. Instead, always look up the environment variable called
  NEON_AUTH_TOKEN. That's what neon.safekeeper_token_env was always
  set to in practice, and I don't see the need for the extra level of
  indirection or configurability.

- Instead of substituting $NEON_AUTH_TOKEN in the connection strings,
  pass $NEON_AUTH_TOKEN "out-of-band" as the password, when we connect
  to the pageserver or safekeepers. That's simpler.

- Also use the password from $NEON_AUTH_TOKEN in compute_ctl, when it
  connects to the pageserver to get the "base backup".
2023-03-21 00:15:04 +02:00
Heikki Linnakangas
5a786fab4f Remove duplicated global variables in neon extension.
Walproposer used to live in the backend, while pagestore_smgr was an
extension. But now that both are part of the neon extension,
walproposer can access the same 'neon_tenant' and 'neon_timeline'
variables as the pageserver_smgr code.
2023-03-21 00:15:04 +02:00
Arseny Sher
699f200811 Send error context chain to the client when Copy stream errors. 2023-03-21 01:22:02 +04:00
Christian Schwarz
881356c417 add metrics to detect eviction-induced thrashing (#3837)
This patch adds two metrics that will enable us to detect *thrashing* of
layers, i.e., repetitions of `eviction, on-demand-download, eviction,
... ` for a given layer.

The first metric counts all layer evictions per timeline. It requires no
further explanation. The second metric counts the layer evictions where
the layer was resident for less than a given threshold.

We can alert on increments to the second metric. The first metric will
serve as a baseline, and further, it's generally interesting, outside of
thrashing.

The second metric's threshold is configurable in PageServerConf and
defaults to 24h. The threshold value is reproduced as a label in the
metric because the counter's value is semantically tied to that
threshold. Since changes to the config and hence the label value are
infrequent, this will have low storage overhead in the metrics storage.

The data source to determine the time that the layer was resident is the
file's `mtime`. Using `mtime` is more of a crutch. It would be better if
Pageserver did its own persistent bookkeeping of residence change events
instead of relying on the filesystem. We had some discussion about this:
https://github.com/neondatabase/neon/pull/3809#issuecomment-1470448900

My position is that `mtime` is good enough for now. It can theoretically
jump forward if someone copies files without resetting `mtime`. But that
shouldn't happen in practice. Note that moving files back and forth
doesn't change `mtime`, nor does `chown` or `chmod`. Lastly, `rsync -a`,
which is typically used for filesystem-level backup / restore, correctly
syncs `mtime`.

I've added a label that identifies the data source to keep options open
for a future, better data source than `mtime`. Since this value will
stay the same for the time being, it's not a problem for metrics
storage.

refs https://github.com/neondatabase/neon/issues/3728
2023-03-20 16:11:36 +01:00
Heikki Linnakangas
fea4b5f551 Switch to EdDSA algorithm for the storage JWT authentication tokens.
The control plane currently only supports EdDSA. We need to either teach
the storage to use EdDSA, or the control plane to use RSA. EdDSA is more
modern, so let's use that.

We could support both, but it would require a little more code and tests,
and we don't really need the flexibility since we control both sides.
2023-03-20 16:28:01 +02:00
Heikki Linnakangas
77107607f3 Allow JWT key generation to fail if authentication is not enabled.
This allows you to run without the 'openssl' binary as long as you
don't enable authentication. This becomes more important with the next
commit, which switches the JWT algorithm to EdDSA. LibreSSL does not
support EdDSA, and LibreSSL comes with macOS, so the next commit makes
it much more likely for the key generation to fail for macOS users.

To allow running without a keypair, don't generate the authentication
token in the 'neon_local init' step. Instead, generate a new token on
every request that needs one, using the private key.
2023-03-20 16:28:01 +02:00
Heikki Linnakangas
1da963b2f9 Remove some unused code in control plane. 2023-03-20 16:28:01 +02:00
Heikki Linnakangas
1ddb9249aa Reduce the # of histogram buckets in metrics. (#3850)
Shrinks the total number of metrics collected for each timeline by
about 50%.

See https://github.com/neondatabase/neon/issues/2848. This doesn't fully
solve the problem, we still collect a lot of metrics even with this, but
this gives us a lot of headroom.
2023-03-20 15:49:16 +02:00
Joonas Koivunen
0c1228c37a feat: store initial timeline in env fixture (#3839)
minor change, but will allow more use in future for the default
tenants.

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-03-20 11:57:27 +02:00
Christian Schwarz
3c15874c48 allow specifying eviction_policy in TenantCreateRequest
This was on oversight from 175a577ad4.

Nothing uses this AFAIK, but, let's fix it anyways.

Noticed while working on https://github.com/neondatabase/neon/issues/3728
2023-03-20 10:43:53 +01:00
Shany Pozin
93f3f4ab5f Return NotFound in mgmt API requests when tenant is not present in the pageserver (#3818)
## Describe your changes
Add Error enum for tenant state response to allow better error handling
in mgmt api
## Issue ticket number and link
#2238
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-19 10:44:42 +02:00
sharnoff
f6e2e0042d Fix + re-enable VM cgroup creation + running (#3820)
Re-enable cgroup shenanigans in VMs, with some special care taken to
make sure that our version of cgroup-tools supports cgroup v2 (debian
bullseye does not, and probably won't because it requires a breaking
change in libcgroup).

This involves manually building libcgroup  / cgroup-tools from source,
then copying the output into the final build stage.

We originally considered pulling the package from debian's testing repo
(which is up-to-date), but decided against it. Refer to the PR for more
details.

Prior work, for reference:

* 2153d2e0 - Run compute_ctl in a cgroup in VMs
* 1360361f - Fix missing VM cgconfig.conf
* 8dae8799 - Disable VM cgroup shenanigans
2023-03-16 17:09:45 -07:00
Christian Schwarz
b917270c67 remove unused TenantConfig::update function 2023-03-16 16:25:35 +01:00
Arthur Petukhovsky
b067378d0d Measure cross-AZ traffic in safekeepers (#3806)
Create `safekeeper_pg_io_bytes_total` metric to track total amount of
bytes written/read in a postgres connections to safekeepers. This metric
has the following labels:
- `client_az` – availability zone of the connection initiator, or
`"unknown"`
- `sk_az` – availability zone of the safekeeper, or `"unknown"`
- `app_name` – `application_name` of the postgres client
- `dir` – data direction, either `"read"` or `"write"`
- `same_az` – `"true"`, `"false"` or `"unknown"`. Can be derived from
`client_az` and `sk_az`, exists purely for convenience.

This is implemented by passing availability zone in the connection
string, like this: `-c tenant_id=AAA timeline_id=BBB
availability-zone=AZ-1`.

Update ansible deployment scripts to add availability_zone argument
to safekeeper and pageserver in systemd service files.
2023-03-16 17:24:01 +03:00
Joonas Koivunen
768c8d9972 test: allow gc to get unlucky (#3826)
this failure case was probably introduced by b220ba6, because earlier
the gc would always have run fast enough for restart every 1s. however,
test got added later, so we have just been lucky.

fixes #3824 by allowing this error to happen.
2023-03-16 11:03:21 +02:00
Rahul Patil
f1d960d2c2 Add new pageserver to eu-central-1 (#3829)
## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-15 16:58:28 +01:00
Arthur Petukhovsky
cd17802b1f Add pg_clients test for serverless driver (#3827)
Fixes #3819
2023-03-15 16:18:38 +03:00
Heikki Linnakangas
10a5d36af8 Separate mgmt and libpq authentication configs in pageserver. (#3773)
This makes it possible to enable authentication only for the mgmt HTTP
API or the compute API. The HTTP API doesn't need to be directly
accessible from compute nodes, and it can be secured through network
policies. This also allows rolling out authentication in a piecemeal
fashion.
2023-03-15 13:52:29 +02:00
Arseny Sher
a7ab53c80c Forward framed read buf contents to compute before proxy pass.
Otherwise they get lost. Normally buffer is empty before proxy pass, but this is
not the case with pipeline mode of out npm driver; fixes connection hangup
introduced by b80fe41af3 for it.

fixes https://github.com/neondatabase/neon/issues/3822
2023-03-15 14:32:41 +03:00
Heikki Linnakangas
2672fd09d8 Make test independent of the order of config lines. 2023-03-14 20:10:34 +02:00
Heikki Linnakangas
4a92799f24 Fix check for trailing garbage in basebackup import.
There was a warning for trailing garbage after end-of-tar archive, but
it didn't always work. The reason is that we created a StreamReader
over the original copyin-stream, but performed the check for garbage
on the copyin-stream. There could be some garbage bytes buffered in
the StreamReader, which were not caught by the warning.

I considered turning the the warning into a fatal error, aborting the
import, but I wasn't sure if we handle aborting the import properly.
Do we clean up the timeline directory on error? If we don't, we should
make that more robust, but that's a different story.

Also, normally a valid tar archive ends with two 512-byte blocks of zeros.
The tokio_tar crate stops at the first all-zeros block. Read and check
the second all-zeros block, and error out if it's not there, or contains
something unexpected.
2023-03-14 19:45:33 +02:00
Konstantin Knizhnik
5396273541 Avoid holes between generated image layers (#3771)
## Describe your changes

When we perform partitioning of the whole key space, we take in account
actual ranges of relation present in the database. So if we have
relation with relid=1 and size 100 and relation with relid=2 with size
200 then result of KeySpace::partition may contain partitions
<100000000..100000099> and <200000000..200000199>. Generated image
layers will contain the same boundaries.
But when GC is checking image coverage to find out of old layer is fully
covered by newer image layer and so can be deleted, it takes in account
only full key range. I.e. if there is delta layer <100000000..300000000>
then it never be garbage collected because image layers
<100000000..100000099> and <200000000..200000199> are not completely
covering it.This is how it looks in practice:


000000067F000032AC00000A300000000000-000000067F000032AC00000A330000000000__000000000F761828

000000067F000032AC00000A31000000001F-000000067F000032AC00000A620000000005__0000000001696070-000000000442A551

000000067F000032AC00000A3300FFFFFFFF-000000067F000032AC00000A650100000000__000000000F761828

So there are two image layers covering delta layer but ... there is a
hole: A330000000000...A3300FFFFFFFF and as a result delta layer is not
collected.

## Issue ticket number and link

This PR is deeply related with #3673 because it is addressing the same
problem: old layers are not utilized by GC.
The test test_gc_old_layers.py in #3673 can be used to see effect of
this patch.

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-03-14 16:15:07 +02:00
Joonas Koivunen
c23c8946a3 chore: clippies introduced with rust 1.68 (#3781)
- handle automatically fixable future clippies
- tune run-clippy.sh to remove macos specifics which we no longer have

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-03-14 15:29:02 +02:00
Joonas Koivunen
15b692ccc9 test: more strict finding of WARN, ERROR lines (#3798)
this prevents flakyness when `WARN|ERROR` appears in some other part of
the line, for example in a random filename.
2023-03-14 15:27:52 +02:00
Alexander Bayandin
3d869cbcde Replace flake8 and isort with ruff (#3810)
- Introduce ruff (https://beta.ruff.rs/) to replace flake8 and isort
- Update mypy and black
2023-03-14 13:25:44 +00:00
Lassi Pölönen
68ae020b37 Use RollingUpdate strategy also for legacy proxy (#3814)
## Describe your changes
We have previously changed the neon-proxy to use RollingUpdate. This
should be enabled in legacy proxy too in order to avoid breaking
connections for the clients and allow for example backups to run even
during deployment. (https://github.com/neondatabase/neon/pull/3683)

## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3333
2023-03-14 13:23:46 +00:00
Joonas Koivunen
d6bb8caad4 refactor: correct return value for not found L0's on LayerMap::replace (#3805)
in prev implementation, an `ok_or_else(...)?` is used to cause a
"precondition error" on LayerMap::replace, however we only see this
particular error if an L0 for which replace fails is not in the layermap
because it is not in `l0_delta_layers`. changes or fixes this to be
Replacement::NotFound instead, making it more clear that an error would
only be raised for actual preconditions, like trying to replace layer
with completly unrelated layer.
2023-03-14 13:18:26 +02:00
Alexander Bayandin
319402fc74 postgres_ffi: restore POSTGRES_INSTALL_DIR support (#3811)
Fix path construction to `pg_config`: `pg_install_dir_versioned` 
already includes `pg_version`
2023-03-14 10:25:48 +00:00
Dmitry Ivanov
2e4bf7cee4 [proxy] Immediately log all compute node connection errors. 2023-03-14 01:45:57 +03:00
Dmitry Ivanov
15ed6af5f2 Add descriptions to proxy's python tests. 2023-03-14 01:32:37 +03:00
Dmitry Rodionov
50476a7cc7 test: update to match current interfaces 2023-03-13 17:50:10 +02:00
andres
d7ab69f303 add test for getting branchpoints from an inactive timeline 2023-03-13 17:50:10 +02:00
sharnoff
582620274a Enable file cache handling by vm-informant (#3794)
Enables the VM informant's file cache integration.

See also: https://github.com/neondatabase/autoscaling/pull/47
2023-03-13 07:16:39 -07:00
Vadim Kharitonov
daeaa767c4 Add neondatabase/release team as a default reviewers for storage
releases
2023-03-13 13:40:15 +01:00
Konstantin Knizhnik
f0573f5991 Remove block cursor cache (#3740)
## Describe your changes

Do not pin current block in BlockCursor

## Issue ticket number and link

See #3712 

There are places (see get_reconstruct_data) in our code when thread is
holding read layers lock and then try to read file and so lock page
cache slot. So we have edge in dependency graph layers->page cache slot.
At the same time (as Christian noticed) we can lock page cache slot in
BlockCursor and then try obtain shard lock on layers.
So there is backward edge in dependency graph page cache slot>layers
which forms loop and may cause deadlock.

There are three possible fixes of the problem:
1. Perform compaction under `layers` shared lock. See PR #3732. It fixes
the problem but make it not possible to append any data to pageserver
until compaction is completed.
2. Do not hold `layers` lock while accessing layers (not sure if it is
possible to do because it definitely introduce some new race
conditions).
3. Do not pin current pages in BockCursor (this PR).

My experiments shows that this cache in BlockCursor is not so useful:
the number of hits/misses for cursor cache on pgbench workload (-i -s
10/-c 10 -T 100/-c 10 -S -T 100):
```
hits: 163011
misses: 1023602
```
So number of cache misses is 10x times  larger.
And results for read-only pgbench are mostly the same:
```
with cache: 14581
w/out cache: 14429
```
 
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-13 14:07:35 +02:00
Nikita Kalyanov
07dcf679de set content type explicitly (#3799)
I moved management API v2 to ogen and the generated code seems to be
more strict about content type. Let's set it properly as it is json
after all

## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-13 14:00:01 +02:00
Heikki Linnakangas
e0ee138a8b Add a test for tokio-postgres client to the driver test suite.
It is fully supported. To enable TLS, though, it requires some extra
glue code, and a dependency to a TLS library.
2023-03-13 12:14:37 +02:00
Arthur Petukhovsky
d9a1329834 Make postgres_backend use generic IO type (#3789)
- Support measuring inbound and outbound traffic in MeasuredStream
- Start using MeasuredStream in safekeepers code
2023-03-13 12:18:10 +03:00
Joonas Koivunen
8699342249 Ondemand rx bytes and layer count (#3777)
Adds two new *global* metrics:
- pageserver_remote_ondemand_downloaded_layers_total
- pageserver_remote_ondemand_downloaded_bytes_total

An existing test is repurposed once more to check that we do get some
reasonable counts. These are to replace guessing from the nic RX bytes
metric how much was on-demand downloaded.

First part of #3745: This does not add the "(un)?avoidable" metric,
which I plan to add as a new metric, which will be a subset of the
counts of the metrics added here.
2023-03-13 09:26:49 +02:00
Joonas Koivunen
ce8fbbd910 Fix allowed error again (#3790)
Fixes #3360 again, this time checking all other "Error processing HTTP
request" messages and aligning the regex with the two others.
2023-03-10 17:44:12 +00:00
Vadim Kharitonov
1401021b21 Be able to get number of CPUs (#3774)
After enabling autoscaling, we faced the issue that customers are not
able to get the number of CPUs they use at this moment. Therefore I've
added these two options:

1. Postgresql function to allow customers to call it whenever they want
2. `compute_ctl` endpoint to show these number in console
2023-03-10 19:00:20 +02:00
Stas Kelvich
252b3685a2 Use unsafe-postgres feature to build pgx extension
Recently added `unsafe-postgres` feature allows to build pgx extensions
against postgres forks that decided to change their ABI name (like us).
With that we can build extensions without forking them and using stock
pgx. As this feature is new few manual version bumps were required.
2023-03-10 17:40:45 +02:00
Heikki Linnakangas
34d3385b2e Add unit tests for JWT encoding and decoding. 2023-03-10 16:09:32 +02:00
Heikki Linnakangas
b00530df2a Add section in internal docs on the JWT payload.
Just copied from the code comments. Could be improved, but this is a
start.
2023-03-10 16:09:32 +02:00
Heikki Linnakangas
bebf76c461 Accept RS384 and RS512 JWT tokens.
Previously, we only accepted RS256. Seems like a pointless limitation,
and when I was testing it with RS512 tokens, it took me a while to
understand why it wasn't working.
2023-03-10 16:09:32 +02:00
Vadim Kharitonov
2ceef91da1 Compile pg_tiktoken extension 2023-03-10 14:59:26 +01:00
Anastasia Lubennikova
b7fddfa70d Add branch_id field to proxy_io_bytes_per_client metric.
Since we allow switching endpoints between different branches, it is important to use composite key.
Otherwise, we may try to calculate delta between metric values for two different branches.
2023-03-10 15:00:48 +02:00
Heikki Linnakangas
d1537a49fa Fix escaping in postgresql.conf that we generate at compute startup
If there are any config options that contain single quotes or backslashes,
they need to be escaped
2023-03-10 14:59:21 +02:00
Heikki Linnakangas
856d01ff68 Add newline at end of postgresql.conf 2023-03-10 14:59:21 +02:00
Heikki Linnakangas
42ec79fb0d Make expected test output nicer to read.
By using Rust raw string literal.
2023-03-10 14:59:21 +02:00
Rory de Zoete
3c4f5af1b9 Try depot.dev for image building (#3768)
To see if it is faster. Run side-by-side for a while so we can gather
enough data.
2023-03-10 11:11:39 +01:00
Arseny Sher
290884ea3b Fix too many arguments in read_network clippy complain. 2023-03-10 10:50:03 +03:00
Arseny Sher
965837df53 Log connection ids in safekeeper instead of thread ids.
Fixes build on macOS (which doesn't have nix gettid) after 0d8ced8534.
2023-03-10 10:50:03 +03:00
163 changed files with 4989 additions and 1750 deletions

View File

@@ -91,6 +91,15 @@
tags:
- pageserver
# used in `pageserver.service` template
- name: learn current availability_zone
shell:
cmd: "curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone"
register: ec2_availability_zone
- set_fact:
ec2_availability_zone={{ ec2_availability_zone.stdout }}
- name: upload systemd service definition
ansible.builtin.template:
src: systemd/pageserver.service
@@ -118,7 +127,7 @@
cmd: |
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/pageservers/$INSTANCE_ID | jq '.version = {{ current_version }}' > /tmp/new_version
curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" -X POST -d@/tmp/new_version {{ console_mgmt_base_url }}/management/api/v2/pageservers
curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" -H "Content-Type: application/json" -X POST -d@/tmp/new_version {{ console_mgmt_base_url }}/management/api/v2/pageservers
tags:
- pageserver
@@ -153,6 +162,15 @@
tags:
- safekeeper
# used in `safekeeper.service` template
- name: learn current availability_zone
shell:
cmd: "curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone"
register: ec2_availability_zone
- set_fact:
ec2_availability_zone={{ ec2_availability_zone.stdout }}
# in the future safekeepers should discover pageservers byself
# but currently use first pageserver that was discovered
- name: set first pageserver var for safekeepers
@@ -188,6 +206,6 @@
cmd: |
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/safekeepers/$INSTANCE_ID | jq '.version = {{ current_version }}' > /tmp/new_version
curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" -X POST -d@/tmp/new_version {{ console_mgmt_base_url }}/management/api/v2/safekeepers
curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" -H "Content-Type: application/json" -X POST -d@/tmp/new_version {{ console_mgmt_base_url }}/management/api/v2/safekeepers
tags:
- safekeeper

View File

@@ -27,6 +27,8 @@ storage:
ansible_host: i-0cd8d316ecbb715be
pageserver-1.eu-central-1.aws.neon.tech:
ansible_host: i-090044ed3d383fef0
pageserver-2.eu-central-1.aws.neon.tech:
ansible_host: i-033584edf3f4b6742
safekeepers:
hosts:

View File

@@ -26,7 +26,7 @@ EOF
if ! curl -sf -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/pageservers/${INSTANCE_ID} -o /dev/null; then
# not registered, so register it now
ID=$(curl -sf -X POST -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/pageservers -d@/tmp/payload | jq -r '.id')
ID=$(curl -sf -X POST -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" -H "Content-Type: application/json" {{ console_mgmt_base_url }}/management/api/v2/pageservers -d@/tmp/payload | jq -r '.id')
# init pageserver
sudo -u pageserver /usr/local/bin/pageserver -c "id=${ID}" -c "pg_distrib_dir='/usr/local'" --init -D /storage/pageserver/data

View File

@@ -25,7 +25,7 @@ EOF
if ! curl -sf -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/safekeepers/${INSTANCE_ID} -o /dev/null; then
# not registered, so register it now
ID=$(curl -sf -X POST -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/safekeepers -d@/tmp/payload | jq -r '.id')
ID=$(curl -sf -X POST -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" -H "Content-Type: application/json" {{ console_mgmt_base_url }}/management/api/v2/safekeepers -d@/tmp/payload | jq -r '.id')
# init safekeeper
sudo -u safekeeper /usr/local/bin/safekeeper --id ${ID} --init -D /storage/safekeeper/data
fi

View File

@@ -6,7 +6,7 @@ After=network.target auditd.service
Type=simple
User=pageserver
Environment=RUST_BACKTRACE=1 NEON_REPO_DIR=/storage/pageserver LD_LIBRARY_PATH=/usr/local/v14/lib SENTRY_DSN={{ SENTRY_URL_PAGESERVER }} SENTRY_ENVIRONMENT={{ sentry_environment }}
ExecStart=/usr/local/bin/pageserver -c "pg_distrib_dir='/usr/local'" -c "listen_pg_addr='0.0.0.0:6400'" -c "listen_http_addr='0.0.0.0:9898'" -c "broker_endpoint='{{ broker_endpoint }}'" -D /storage/pageserver/data
ExecStart=/usr/local/bin/pageserver -c "pg_distrib_dir='/usr/local'" -c "listen_pg_addr='0.0.0.0:6400'" -c "listen_http_addr='0.0.0.0:9898'" -c "broker_endpoint='{{ broker_endpoint }}'" -c "availability_zone='{{ ec2_availability_zone }}'" -D /storage/pageserver/data
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
KillSignal=SIGINT

View File

@@ -6,7 +6,7 @@ After=network.target auditd.service
Type=simple
User=safekeeper
Environment=RUST_BACKTRACE=1 NEON_REPO_DIR=/storage/safekeeper/data LD_LIBRARY_PATH=/usr/local/v14/lib SENTRY_DSN={{ SENTRY_URL_SAFEKEEPER }} SENTRY_ENVIRONMENT={{ sentry_environment }}
ExecStart=/usr/local/bin/safekeeper -l {{ inventory_hostname }}{{ hostname_suffix }}:6500 --listen-http {{ inventory_hostname }}{{ hostname_suffix }}:7676 -D /storage/safekeeper/data --broker-endpoint={{ broker_endpoint }} --remote-storage='{bucket_name="{{bucket_name}}", bucket_region="{{bucket_region}}", prefix_in_bucket="{{ safekeeper_s3_prefix }}"}'
ExecStart=/usr/local/bin/safekeeper -l {{ inventory_hostname }}{{ hostname_suffix }}:6500 --listen-http {{ inventory_hostname }}{{ hostname_suffix }}:7676 -D /storage/safekeeper/data --broker-endpoint={{ broker_endpoint }} --remote-storage='{bucket_name="{{bucket_name}}", bucket_region="{{bucket_region}}", prefix_in_bucket="{{ safekeeper_s3_prefix }}"}' --availability-zone={{ ec2_availability_zone }}
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
KillSignal=SIGINT

View File

@@ -1,6 +1,22 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
deploymentStrategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 100%
maxUnavailable: 50%
# Delay the kill signal by 7 days (7 * 24 * 60 * 60)
# The pod(s) will stay in Terminating, keeps the existing connections
# but doesn't receive new ones
containerLifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 604800"]
terminationGracePeriodSeconds: 604800
image:
repository: neondatabase/neon

View File

@@ -74,15 +74,12 @@ jobs:
- name: Install Python deps
run: ./scripts/pysync
- name: Run isort to ensure code format
run: poetry run isort --diff --check .
- name: Run ruff to ensure code format
run: poetry run ruff .
- name: Run black to ensure code format
run: poetry run black --diff --check .
- name: Run flake8 to ensure code format
run: poetry run flake8 .
- name: Run mypy to check types
run: poetry run mypy .
@@ -551,6 +548,48 @@ jobs:
- name: Cleanup ECR folder
run: rm -rf ~/.ecr
neon-image-depot:
# For testing this will run side-by-side for a few merges.
# This action is not really optimized yet, but gets the job done
runs-on: [ self-hosted, gen3, large ]
needs: [ tag ]
container: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/base:pinned
permissions:
contents: read
id-token: write
steps:
- name: Checkout
uses: actions/checkout@v3
with:
submodules: true
fetch-depth: 0
- name: Setup go
uses: actions/setup-go@v3
with:
go-version: '1.19'
- name: Set up Depot CLI
uses: depot/setup-action@v1
- name: Install Crane & ECR helper
run: go install github.com/awslabs/amazon-ecr-credential-helper/ecr-login/cli/docker-credential-ecr-login@69c85dc22db6511932bbf119e1a0cc5c90c69a7f # v0.6.0
- name: Configure ECR login
run: |
mkdir /github/home/.docker/
echo "{\"credsStore\":\"ecr-login\"}" > /github/home/.docker/config.json
- name: Build and push
uses: depot/build-push-action@v1
with:
# if no depot.json file is at the root of your repo, you must specify the project id
project: nrdv0s4kcs
push: true
tags: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/neon:depot-${{needs.tag.outputs.build-tag}}
compute-tools-image:
runs-on: [ self-hosted, gen3, large ]
needs: [ tag ]

View File

@@ -31,3 +31,4 @@ jobs:
head: releases/${{ steps.date.outputs.date }}
base: release
title: Release ${{ steps.date.outputs.date }}
team_reviewers: release

3
Cargo.lock generated
View File

@@ -851,6 +851,7 @@ dependencies = [
"futures",
"hyper",
"notify",
"num_cpus",
"opentelemetry",
"postgres",
"regex",
@@ -3333,7 +3334,6 @@ dependencies = [
"humantime",
"hyper",
"metrics",
"nix",
"once_cell",
"parking_lot",
"postgres",
@@ -4532,6 +4532,7 @@ dependencies = [
"metrics",
"nix",
"once_cell",
"pin-project-lite",
"rand",
"routerify",
"sentry",

View File

@@ -64,6 +64,7 @@ md5 = "0.7.0"
memoffset = "0.8"
nix = "0.26"
notify = "5.0.0"
num_cpus = "1.15"
num-traits = "0.2.15"
once_cell = "1.13"
opentelemetry = "0.18.0"

View File

@@ -39,7 +39,7 @@ ARG CACHEPOT_BUCKET=neon-github-dev
COPY --from=pg-build /home/nonroot/pg_install/v14/include/postgresql/server pg_install/v14/include/postgresql/server
COPY --from=pg-build /home/nonroot/pg_install/v15/include/postgresql/server pg_install/v15/include/postgresql/server
COPY . .
COPY --chown=nonroot . .
# Show build caching stats to check if it was used in the end.
# Has to be the part of the same RUN since cachepot daemon is killed in the end of this RUN, losing the compilation stats.

View File

@@ -323,7 +323,7 @@ RUN curl -sSO https://static.rust-lang.org/rustup/dist/$(uname -m)-unknown-linux
chmod +x rustup-init && \
./rustup-init -y --no-modify-path --profile minimal --default-toolchain stable && \
rm rustup-init && \
cargo install --git https://github.com/vadim2404/pgx --branch neon_abi_v0.6.1 --locked cargo-pgx && \
cargo install --locked --version 0.7.3 cargo-pgx && \
/bin/bash -c 'cargo pgx init --pg${PG_VERSION:1}=/usr/local/pgsql/bin/pg_config'
USER root
@@ -337,11 +337,11 @@ USER root
FROM rust-extensions-build AS pg-jsonschema-pg-build
RUN git clone --depth=1 --single-branch --branch neon_abi_v0.1.4 https://github.com/vadim2404/pg_jsonschema/ && \
cd pg_jsonschema && \
# there is no release tag yet, but we need it due to the superuser fix in the control file
RUN wget https://github.com/supabase/pg_jsonschema/archive/caeab60d70b2fd3ae421ec66466a3abbb37b7ee6.tar.gz -O pg_jsonschema.tar.gz && \
mkdir pg_jsonschema-src && cd pg_jsonschema-src && tar xvzf ../pg_jsonschema.tar.gz --strip-components=1 -C . && \
sed -i 's/pgx = "0.7.1"/pgx = { version = "0.7.3", features = [ "unsafe-postgres" ] }/g' Cargo.toml && \
cargo pgx install --release && \
# it's needed to enable extension because it uses untrusted C language
sed -i 's/superuser = false/superuser = true/g' /usr/local/pgsql/share/extension/pg_jsonschema.control && \
echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_jsonschema.control
#########################################################################################
@@ -353,13 +353,32 @@ RUN git clone --depth=1 --single-branch --branch neon_abi_v0.1.4 https://github.
FROM rust-extensions-build AS pg-graphql-pg-build
RUN git clone --depth=1 --single-branch --branch neon_abi_v1.1.0 https://github.com/vadim2404/pg_graphql && \
cd pg_graphql && \
# Currently pgx version bump to >= 0.7.2 causes "call to unsafe function" compliation errors in
# pgx-contrib-spiext. There is a branch that removes that dependency, so use it. It is on the
# same 1.1 version we've used before.
RUN git clone -b remove-pgx-contrib-spiext --single-branch https://github.com/yrashk/pg_graphql && \
cd pg_graphql && \
sed -i 's/pgx = "~0.7.1"/pgx = { version = "0.7.3", features = [ "unsafe-postgres" ] }/g' Cargo.toml && \
sed -i 's/pgx-tests = "~0.7.1"/pgx-tests = "0.7.3"/g' Cargo.toml && \
cargo pgx install --release && \
# it's needed to enable extension because it uses untrusted C language
sed -i 's/superuser = false/superuser = true/g' /usr/local/pgsql/share/extension/pg_graphql.control && \
echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_graphql.control
#########################################################################################
#
# Layer "pg-tiktoken-build"
# Compile "pg_tiktoken" extension
#
#########################################################################################
FROM rust-extensions-build AS pg-tiktoken-pg-build
RUN git clone --depth=1 --single-branch https://github.com/kelvich/pg_tiktoken && \
cd pg_tiktoken && \
cargo pgx install --release && \
echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_tiktoken.control
#########################################################################################
#
# Layer "neon-pg-ext-build"
@@ -377,6 +396,7 @@ COPY --from=vector-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pgjwt-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-jsonschema-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-graphql-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-tiktoken-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=hypopg-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-hashids-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=rum-pg-build /usr/local/pgsql/ /usr/local/pgsql/
@@ -389,6 +409,10 @@ COPY pgxn/ pgxn/
RUN make -j $(getconf _NPROCESSORS_ONLN) \
PG_CONFIG=/usr/local/pgsql/bin/pg_config \
-C pgxn/neon \
-s install && \
make -j $(getconf _NPROCESSORS_ONLN) \
PG_CONFIG=/usr/local/pgsql/bin/pg_config \
-C pgxn/neon_utils \
-s install
#########################################################################################

View File

@@ -1,25 +1,70 @@
# Note: this file *mostly* just builds on Dockerfile.compute-node
ARG SRC_IMAGE
ARG VM_INFORMANT_VERSION=v0.1.6
ARG VM_INFORMANT_VERSION=v0.1.14
# on libcgroup update, make sure to check bootstrap.sh for changes
ARG LIBCGROUP_VERSION=v2.0.3
# Pull VM informant and set up inittab
# Pull VM informant, to copy from later
FROM neondatabase/vm-informant:$VM_INFORMANT_VERSION as informant
# Build cgroup-tools
#
# At time of writing (2023-03-14), debian bullseye has a version of cgroup-tools (technically
# libcgroup) that doesn't support cgroup v2 (version 0.41-11). Unfortunately, the vm-informant
# requires cgroup v2, so we'll build cgroup-tools ourselves.
FROM debian:bullseye-slim as libcgroup-builder
ARG LIBCGROUP_VERSION
RUN set -exu \
&& apt update \
&& apt install --no-install-recommends -y \
git \
ca-certificates \
automake \
cmake \
make \
gcc \
byacc \
flex \
libtool \
libpam0g-dev \
&& git clone --depth 1 -b $LIBCGROUP_VERSION https://github.com/libcgroup/libcgroup \
&& INSTALL_DIR="/libcgroup-install" \
&& mkdir -p "$INSTALL_DIR/bin" "$INSTALL_DIR/include" \
&& cd libcgroup \
# extracted from bootstrap.sh, with modified flags:
&& (test -d m4 || mkdir m4) \
&& autoreconf -fi \
&& rm -rf autom4te.cache \
&& CFLAGS="-O3" ./configure --prefix="$INSTALL_DIR" --sysconfdir=/etc --localstatedir=/var --enable-opaque-hierarchy="name=systemd" \
# actually build the thing...
&& make install
# Combine, starting from non-VM compute node image.
FROM $SRC_IMAGE as base
# Temporarily set user back to root so we can run adduser, set inittab
USER root
RUN adduser vm-informant --disabled-password --no-create-home
RUN set -e \
&& rm -f /etc/inittab \
&& touch /etc/inittab
RUN set -e \
&& echo "::respawn:su vm-informant -c '/usr/local/bin/vm-informant --auto-restart'" >> /etc/inittab
&& echo "::sysinit:cgconfigparser -l /etc/cgconfig.conf -s 1664" >> /etc/inittab \
&& CONNSTR="dbname=neondb user=cloud_admin sslmode=disable" \
&& ARGS="--auto-restart --cgroup=neon-postgres --pgconnstr=\"$CONNSTR\"" \
&& echo "::respawn:su vm-informant -c '/usr/local/bin/vm-informant $ARGS'" >> /etc/inittab
# Combine, starting from non-VM compute node image.
FROM $SRC_IMAGE as base
# Temporarily set user back to root so we can run adduser
USER root
RUN adduser vm-informant --disabled-password --no-create-home
USER postgres
COPY --from=informant /etc/inittab /etc/inittab
ADD vm-cgconfig.conf /etc/cgconfig.conf
COPY --from=informant /usr/bin/vm-informant /usr/local/bin/vm-informant
COPY --from=libcgroup-builder /libcgroup-install/bin/* /usr/bin/
COPY --from=libcgroup-builder /libcgroup-install/lib/* /usr/lib/
COPY --from=libcgroup-builder /libcgroup-install/sbin/* /usr/sbin/
ENTRYPOINT ["/usr/sbin/cgexec", "-g", "*:neon-postgres", "/usr/local/bin/compute_ctl"]

View File

@@ -133,6 +133,11 @@ neon-pg-ext-%: postgres-%
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/neon-test-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_test_utils/Makefile install
+@echo "Compiling neon_utils $*"
mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-utils-$*
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/neon-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_utils/Makefile install
.PHONY: neon-pg-ext-clean-%
neon-pg-ext-clean-%:
@@ -145,6 +150,9 @@ neon-pg-ext-clean-%:
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/neon-test-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_test_utils/Makefile clean
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/neon-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_utils/Makefile clean
.PHONY: neon-pg-ext
neon-pg-ext: \

View File

@@ -46,11 +46,14 @@ postgresql-libs cmake postgresql protobuf
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
#### Installing dependencies on OSX (12.3.1)
#### Installing dependencies on macOS (12.3.1)
1. Install XCode and dependencies
```
xcode-select --install
brew install protobuf openssl flex bison
# add openssl to PATH, required for ed25519 keys generation in neon_local
echo 'export PATH="$(brew --prefix openssl)/bin:$PATH"' >> ~/.zshrc
```
2. [Install Rust](https://www.rust-lang.org/tools/install)

View File

@@ -11,6 +11,7 @@ clap.workspace = true
futures.workspace = true
hyper = { workspace = true, features = ["full"] }
notify.workspace = true
num_cpus.workspace = true
opentelemetry.workspace = true
postgres.workspace = true
regex.workspace = true

View File

@@ -133,6 +133,7 @@ fn main() -> Result<()> {
.settings
.find("neon.pageserver_connstring")
.expect("pageserver connstr should be provided");
let storage_auth_token = spec.storage_auth_token.clone();
let tenant = spec
.cluster
.settings
@@ -153,6 +154,7 @@ fn main() -> Result<()> {
tenant,
timeline,
pageserver_connstr,
storage_auth_token,
metrics: ComputeMetrics::default(),
state: RwLock::new(ComputeState::new()),
};

View File

@@ -18,6 +18,7 @@ use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;
use std::process::{Command, Stdio};
use std::str::FromStr;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;
@@ -44,6 +45,7 @@ pub struct ComputeNode {
pub tenant: String,
pub timeline: String,
pub pageserver_connstr: String,
pub storage_auth_token: Option<String>,
pub metrics: ComputeMetrics,
/// Volatile part of the `ComputeNode` so should be used under `RwLock`
/// to allow HTTP API server to serve status requests, while configuration
@@ -126,7 +128,18 @@ impl ComputeNode {
fn get_basebackup(&self, lsn: &str) -> Result<()> {
let start_time = Utc::now();
let mut client = Client::connect(&self.pageserver_connstr, NoTls)?;
let mut config = postgres::Config::from_str(&self.pageserver_connstr)?;
// Use the storage auth token from the config file, if given.
// Note: this overrides any password set in the connection string.
if let Some(storage_auth_token) = &self.storage_auth_token {
info!("Got storage auth token from spec file");
config.password(storage_auth_token);
} else {
info!("Storage auth token not set");
}
let mut client = config.connect(NoTls)?;
let basebackup_cmd = match lsn {
"0/0" => format!("basebackup {} {}", &self.tenant, &self.timeline), // First start of the compute
_ => format!("basebackup {} {} {}", &self.tenant, &self.timeline, lsn),
@@ -163,6 +176,11 @@ impl ComputeNode {
let sync_handle = Command::new(&self.pgbin)
.args(["--sync-safekeepers"])
.env("PGDATA", &self.pgdata) // we cannot use -D in this mode
.envs(if let Some(storage_auth_token) = &self.storage_auth_token {
vec![("NEON_AUTH_TOKEN", storage_auth_token)]
} else {
vec![]
})
.stdout(Stdio::piped())
.spawn()
.expect("postgres --sync-safekeepers failed to start");
@@ -240,6 +258,11 @@ impl ComputeNode {
// Run postgres as a child process.
let mut pg = Command::new(&self.pgbin)
.args(["-D", &self.pgdata])
.envs(if let Some(storage_auth_token) = &self.storage_auth_token {
vec![("NEON_AUTH_TOKEN", storage_auth_token)]
} else {
vec![]
})
.spawn()
.expect("cannot start postgres process");

View File

@@ -7,6 +7,7 @@ use crate::compute::ComputeNode;
use anyhow::Result;
use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Method, Request, Response, Server, StatusCode};
use num_cpus;
use serde_json;
use tracing::{error, info};
use tracing_utils::http::OtelName;
@@ -49,6 +50,17 @@ async fn routes(req: Request<Body>, compute: &Arc<ComputeNode>) -> Response<Body
}
}
(&Method::GET, "/info") => {
let num_cpus = num_cpus::get_physical();
info!("serving /info GET request. num_cpus: {}", num_cpus);
Response::new(Body::from(
serde_json::json!({
"num_cpus": num_cpus,
})
.to_string(),
))
}
// Return the `404 Not Found` for any other routes.
_ => {
let mut not_found = Response::new(Body::from("404 Not Found"));

View File

@@ -53,6 +53,21 @@ paths:
schema:
$ref: "#/components/schemas/ComputeInsights"
/info:
get:
tags:
- "info"
summary: Get info about the compute Pod/VM
description: ""
operationId: getInfo
responses:
"200":
description: Info
content:
application/json:
schema:
$ref: "#/components/schemas/Info"
/check_writability:
post:
tags:
@@ -96,6 +111,15 @@ components:
total_startup_ms:
type: integer
Info:
type: object
description: Information about VM/Pod
required:
- num_cpus
properties:
num_cpus:
type: integer
ComputeState:
type: object
required:

View File

@@ -47,12 +47,23 @@ pub struct GenericOption {
/// declare a `trait` on it.
pub type GenericOptions = Option<Vec<GenericOption>>;
/// Escape a string for including it in a SQL literal
fn escape_literal(s: &str) -> String {
s.replace('\'', "''").replace('\\', "\\\\")
}
/// Escape a string so that it can be used in postgresql.conf.
/// Same as escape_literal, currently.
fn escape_conf_value(s: &str) -> String {
s.replace('\'', "''").replace('\\', "\\\\")
}
impl GenericOption {
/// Represent `GenericOption` as SQL statement parameter.
pub fn to_pg_option(&self) -> String {
if let Some(val) = &self.value {
match self.vartype.as_ref() {
"string" => format!("{} '{}'", self.name, val),
"string" => format!("{} '{}'", self.name, escape_literal(val)),
_ => format!("{} {}", self.name, val),
}
} else {
@@ -73,7 +84,7 @@ impl GenericOption {
};
match self.vartype.as_ref() {
"string" => format!("{} = '{}'", name, val),
"string" => format!("{} = '{}'", name, escape_conf_value(val)),
_ => format!("{} = {}", name, val),
}
} else {
@@ -109,6 +120,7 @@ impl PgOptionsSerialize for GenericOptions {
.map(|op| op.to_pg_setting())
.collect::<Vec<String>>()
.join("\n")
+ "\n" // newline after last setting
} else {
"".to_string()
}

View File

@@ -24,6 +24,8 @@ pub struct ComputeSpec {
pub cluster: Cluster,
pub delta_operations: Option<Vec<DeltaOp>>,
pub storage_auth_token: Option<String>,
pub startup_tracing_context: Option<HashMap<String, String>>,
}

View File

@@ -178,6 +178,11 @@
"name": "neon.pageserver_connstring",
"value": "host=127.0.0.1 port=6400",
"vartype": "string"
},
{
"name": "test.escaping",
"value": "here's a backslash \\ and a quote ' and a double-quote \" hooray",
"vartype": "string"
}
]
},

View File

@@ -28,7 +28,30 @@ mod pg_helpers_tests {
assert_eq!(
spec.cluster.settings.as_pg_settings(),
"fsync = off\nwal_level = replica\nhot_standby = on\nneon.safekeepers = '127.0.0.1:6502,127.0.0.1:6503,127.0.0.1:6501'\nwal_log_hints = on\nlog_connections = on\nshared_buffers = 32768\nport = 55432\nmax_connections = 100\nmax_wal_senders = 10\nlisten_addresses = '0.0.0.0'\nwal_sender_timeout = 0\npassword_encryption = md5\nmaintenance_work_mem = 65536\nmax_parallel_workers = 8\nmax_worker_processes = 8\nneon.tenant_id = 'b0554b632bd4d547a63b86c3630317e8'\nmax_replication_slots = 10\nneon.timeline_id = '2414a61ffc94e428f14b5758fe308e13'\nshared_preload_libraries = 'neon'\nsynchronous_standby_names = 'walproposer'\nneon.pageserver_connstring = 'host=127.0.0.1 port=6400'"
r#"fsync = off
wal_level = replica
hot_standby = on
neon.safekeepers = '127.0.0.1:6502,127.0.0.1:6503,127.0.0.1:6501'
wal_log_hints = on
log_connections = on
shared_buffers = 32768
port = 55432
max_connections = 100
max_wal_senders = 10
listen_addresses = '0.0.0.0'
wal_sender_timeout = 0
password_encryption = md5
maintenance_work_mem = 65536
max_parallel_workers = 8
max_worker_processes = 8
neon.tenant_id = 'b0554b632bd4d547a63b86c3630317e8'
max_replication_slots = 10
neon.timeline_id = '2414a61ffc94e428f14b5758fe308e13'
shared_preload_libraries = 'neon'
synchronous_standby_names = 'walproposer'
neon.pageserver_connstring = 'host=127.0.0.1 port=6400'
test.escaping = 'here''s a backslash \\ and a quote '' and a double-quote " hooray'
"#
);
}

View File

@@ -2,7 +2,8 @@
[pageserver]
listen_pg_addr = '127.0.0.1:64000'
listen_http_addr = '127.0.0.1:9898'
auth_type = 'Trust'
pg_auth_type = 'Trust'
http_auth_type = 'Trust'
[[safekeepers]]
id = 1

View File

@@ -3,7 +3,8 @@
[pageserver]
listen_pg_addr = '127.0.0.1:64000'
listen_http_addr = '127.0.0.1:9898'
auth_type = 'Trust'
pg_auth_type = 'Trust'
http_auth_type = 'Trust'
[[safekeepers]]
id = 1

View File

@@ -53,14 +53,15 @@ listen_addr = '{DEFAULT_BROKER_ADDR}'
id = {DEFAULT_PAGESERVER_ID}
listen_pg_addr = '{DEFAULT_PAGESERVER_PG_ADDR}'
listen_http_addr = '{DEFAULT_PAGESERVER_HTTP_ADDR}'
auth_type = '{pageserver_auth_type}'
pg_auth_type = '{trust_auth}'
http_auth_type = '{trust_auth}'
[[safekeepers]]
id = {DEFAULT_SAFEKEEPER_ID}
pg_port = {DEFAULT_SAFEKEEPER_PG_PORT}
http_port = {DEFAULT_SAFEKEEPER_HTTP_PORT}
"#,
pageserver_auth_type = AuthType::Trust,
trust_auth = AuthType::Trust,
)
}
@@ -627,7 +628,7 @@ fn handle_pg(pg_match: &ArgMatches, env: &local_env::LocalEnv) -> Result<()> {
let node = cplane.nodes.get(&(tenant_id, node_name.to_string()));
let auth_token = if matches!(env.pageserver.auth_type, AuthType::NeonJWT) {
let auth_token = if matches!(env.pageserver.pg_auth_type, AuthType::NeonJWT) {
let claims = Claims::new(Some(tenant_id), Scope::Tenant);
Some(env.generate_auth_token(&claims)?)

View File

@@ -11,7 +11,6 @@ use std::sync::Arc;
use std::time::Duration;
use anyhow::{Context, Result};
use postgres_backend::AuthType;
use utils::{
id::{TenantId, TimelineId},
lsn::Lsn,
@@ -97,7 +96,7 @@ impl ComputeControlPlane {
});
node.create_pgdata()?;
node.setup_pg_conf(self.env.pageserver.auth_type)?;
node.setup_pg_conf()?;
self.nodes
.insert((tenant_id, node.name.clone()), Arc::clone(&node));
@@ -278,7 +277,7 @@ impl PostgresNode {
// Write postgresql.conf with default configuration
// and PG_VERSION file to the data directory of a new node.
fn setup_pg_conf(&self, auth_type: AuthType) -> Result<()> {
fn setup_pg_conf(&self) -> Result<()> {
let mut conf = PostgresConf::new();
conf.append("max_wal_senders", "10");
conf.append("wal_log_hints", "off");
@@ -302,29 +301,12 @@ impl PostgresNode {
let config = &self.pageserver.pg_connection_config;
let (host, port) = (config.host(), config.port());
// Set up authentication
//
// $NEON_AUTH_TOKEN will be replaced with value from environment
// variable during compute pg startup. It is done this way because
// otherwise user will be able to retrieve the value using SHOW
// command or pg_settings
let password = if let AuthType::NeonJWT = auth_type {
"$NEON_AUTH_TOKEN"
} else {
""
};
// NOTE avoiding spaces in connection string, because it is less error prone if we forward it somewhere.
// Also note that not all parameters are supported here. Because in compute we substitute $NEON_AUTH_TOKEN
// We parse this string and build it back with token from env var, and for simplicity rebuild
// uses only needed variables namely host, port, user, password.
format!("postgresql://no_user:{password}@{host}:{port}")
// NOTE: avoid spaces in connection string, because it is less error prone if we forward it somewhere.
format!("postgresql://no_user@{host}:{port}")
};
conf.append("shared_preload_libraries", "neon");
conf.append_line("");
conf.append("neon.pageserver_connstring", &pageserver_connstr);
if let AuthType::NeonJWT = auth_type {
conf.append("neon.safekeeper_token_env", "$NEON_AUTH_TOKEN");
}
conf.append("neon.tenant_id", &self.tenant_id.to_string());
conf.append("neon.timeline_id", &self.timeline_id.to_string());
if let Some(lsn) = self.lsn {
@@ -447,6 +429,8 @@ impl PostgresNode {
"DYLD_LIBRARY_PATH",
self.env.pg_lib_dir(self.pg_version)?.to_str().unwrap(),
);
// Pass authentication token used for the connections to pageserver and safekeepers
if let Some(token) = auth_token {
cmd.env("NEON_AUTH_TOKEN", token);
}

View File

@@ -18,7 +18,7 @@ use std::net::SocketAddr;
use std::path::{Path, PathBuf};
use std::process::{Command, Stdio};
use utils::{
auth::{encode_from_key_file, Claims, Scope},
auth::{encode_from_key_file, Claims},
id::{NodeId, TenantId, TenantTimelineId, TimelineId},
};
@@ -110,15 +110,14 @@ impl NeonBroker {
pub struct PageServerConf {
// node id
pub id: NodeId,
// Pageserver connection settings
pub listen_pg_addr: String,
pub listen_http_addr: String,
// used to determine which auth type is used
pub auth_type: AuthType,
// jwt auth token used for communication with pageserver
pub auth_token: String,
// auth type used for the PG and HTTP ports
pub pg_auth_type: AuthType,
pub http_auth_type: AuthType,
}
impl Default for PageServerConf {
@@ -127,8 +126,8 @@ impl Default for PageServerConf {
id: NodeId(0),
listen_pg_addr: String::new(),
listen_http_addr: String::new(),
auth_type: AuthType::Trust,
auth_token: String::new(),
pg_auth_type: AuthType::Trust,
http_auth_type: AuthType::Trust,
}
}
}
@@ -401,48 +400,33 @@ impl LocalEnv {
fs::create_dir(base_path)?;
// generate keys for jwt
// openssl genrsa -out private_key.pem 2048
let private_key_path;
// Generate keypair for JWT.
//
// The keypair is only needed if authentication is enabled in any of the
// components. For convenience, we generate the keypair even if authentication
// is not enabled, so that you can easily enable it after the initialization
// step. However, if the key generation fails, we treat it as non-fatal if
// authentication was not enabled.
if self.private_key_path == PathBuf::new() {
private_key_path = base_path.join("auth_private_key.pem");
let keygen_output = Command::new("openssl")
.arg("genrsa")
.args(["-out", private_key_path.to_str().unwrap()])
.arg("2048")
.stdout(Stdio::null())
.output()
.context("failed to generate auth private key")?;
if !keygen_output.status.success() {
bail!(
"openssl failed: '{}'",
String::from_utf8_lossy(&keygen_output.stderr)
);
}
self.private_key_path = PathBuf::from("auth_private_key.pem");
let public_key_path = base_path.join("auth_public_key.pem");
// openssl rsa -in private_key.pem -pubout -outform PEM -out public_key.pem
let keygen_output = Command::new("openssl")
.arg("rsa")
.args(["-in", private_key_path.to_str().unwrap()])
.arg("-pubout")
.args(["-outform", "PEM"])
.args(["-out", public_key_path.to_str().unwrap()])
.stdout(Stdio::null())
.output()
.context("failed to generate auth private key")?;
if !keygen_output.status.success() {
bail!(
"openssl failed: '{}'",
String::from_utf8_lossy(&keygen_output.stderr)
);
match generate_auth_keys(
base_path.join("auth_private_key.pem").as_path(),
base_path.join("auth_public_key.pem").as_path(),
) {
Ok(()) => {
self.private_key_path = PathBuf::from("auth_private_key.pem");
}
Err(e) => {
if !self.auth_keys_needed() {
eprintln!("Could not generate keypair for JWT authentication: {e}");
eprintln!("Continuing anyway because authentication was not enabled");
self.private_key_path = PathBuf::from("auth_private_key.pem");
} else {
return Err(e);
}
}
}
}
self.pageserver.auth_token =
self.generate_auth_token(&Claims::new(None, Scope::PageServerApi))?;
fs::create_dir_all(self.pg_data_dirs_path())?;
for safekeeper in &self.safekeepers {
@@ -451,6 +435,12 @@ impl LocalEnv {
self.persist_config(base_path)
}
fn auth_keys_needed(&self) -> bool {
self.pageserver.pg_auth_type == AuthType::NeonJWT
|| self.pageserver.http_auth_type == AuthType::NeonJWT
|| self.safekeepers.iter().any(|sk| sk.auth_enabled)
}
}
fn base_path() -> PathBuf {
@@ -460,6 +450,43 @@ fn base_path() -> PathBuf {
}
}
/// Generate a public/private key pair for JWT authentication
fn generate_auth_keys(private_key_path: &Path, public_key_path: &Path) -> anyhow::Result<()> {
// Generate the key pair
//
// openssl genpkey -algorithm ed25519 -out auth_private_key.pem
let keygen_output = Command::new("openssl")
.arg("genpkey")
.args(["-algorithm", "ed25519"])
.args(["-out", private_key_path.to_str().unwrap()])
.stdout(Stdio::null())
.output()
.context("failed to generate auth private key")?;
if !keygen_output.status.success() {
bail!(
"openssl failed: '{}'",
String::from_utf8_lossy(&keygen_output.stderr)
);
}
// Extract the public key from the private key file
//
// openssl pkey -in auth_private_key.pem -pubout -out auth_public_key.pem
let keygen_output = Command::new("openssl")
.arg("pkey")
.args(["-in", private_key_path.to_str().unwrap()])
.arg("-pubout")
.args(["-out", public_key_path.to_str().unwrap()])
.output()
.context("failed to extract public key from private key")?;
if !keygen_output.status.success() {
bail!(
"openssl failed: '{}'",
String::from_utf8_lossy(&keygen_output.stderr)
);
}
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;

View File

@@ -82,15 +82,8 @@ impl PageServerNode {
let (host, port) = parse_host_port(&env.pageserver.listen_pg_addr)
.expect("Unable to parse listen_pg_addr");
let port = port.unwrap_or(5432);
let password = if env.pageserver.auth_type == AuthType::NeonJWT {
Some(env.pageserver.auth_token.clone())
} else {
None
};
Self {
pg_connection_config: PgConnectionConfig::new_host_port(host, port)
.set_password(password),
pg_connection_config: PgConnectionConfig::new_host_port(host, port),
env: env.clone(),
http_client: Client::new(),
http_base_url: format!("http://{}/v1", env.pageserver.listen_http_addr),
@@ -106,25 +99,32 @@ impl PageServerNode {
self.env.pg_distrib_dir_raw().display()
);
let authg_type_param = format!("auth_type='{}'", self.env.pageserver.auth_type);
let http_auth_type_param =
format!("http_auth_type='{}'", self.env.pageserver.http_auth_type);
let listen_http_addr_param = format!(
"listen_http_addr='{}'",
self.env.pageserver.listen_http_addr
);
let pg_auth_type_param = format!("pg_auth_type='{}'", self.env.pageserver.pg_auth_type);
let listen_pg_addr_param =
format!("listen_pg_addr='{}'", self.env.pageserver.listen_pg_addr);
let broker_endpoint_param = format!("broker_endpoint='{}'", self.env.broker.client_url());
let mut overrides = vec![
id,
pg_distrib_dir_param,
authg_type_param,
http_auth_type_param,
pg_auth_type_param,
listen_http_addr_param,
listen_pg_addr_param,
broker_endpoint_param,
];
if self.env.pageserver.auth_type != AuthType::Trust {
if self.env.pageserver.http_auth_type != AuthType::Trust
|| self.env.pageserver.pg_auth_type != AuthType::Trust
{
overrides.push("auth_validation_public_key_path='auth_public_key.pem'".to_owned());
}
overrides
@@ -247,7 +247,10 @@ impl PageServerNode {
}
fn pageserver_env_variables(&self) -> anyhow::Result<Vec<(String, String)>> {
Ok(if self.env.pageserver.auth_type != AuthType::Trust {
// FIXME: why is this tied to pageserver's auth type? Whether or not the safekeeper
// needs a token, and how to generate that token, seems independent to whether
// the pageserver requires a token in incoming requests.
Ok(if self.env.pageserver.http_auth_type != AuthType::Trust {
// Generate a token to connect from the pageserver to a safekeeper
let token = self
.env
@@ -270,27 +273,30 @@ impl PageServerNode {
background_process::stop_process(immediate, "pageserver", &self.pid_file())
}
pub fn page_server_psql(&self, sql: &str) -> Vec<postgres::SimpleQueryMessage> {
let mut client = self.pg_connection_config.connect_no_tls().unwrap();
println!("Pageserver query: '{sql}'");
client.simple_query(sql).unwrap()
}
pub fn page_server_psql_client(&self) -> result::Result<postgres::Client, postgres::Error> {
self.pg_connection_config.connect_no_tls()
}
fn http_request<U: IntoUrl>(&self, method: Method, url: U) -> RequestBuilder {
let mut builder = self.http_client.request(method, url);
if self.env.pageserver.auth_type == AuthType::NeonJWT {
builder = builder.bearer_auth(&self.env.pageserver.auth_token)
pub fn page_server_psql_client(&self) -> anyhow::Result<postgres::Client> {
let mut config = self.pg_connection_config.clone();
if self.env.pageserver.pg_auth_type == AuthType::NeonJWT {
let token = self
.env
.generate_auth_token(&Claims::new(None, Scope::PageServerApi))?;
config = config.set_password(Some(token));
}
builder
Ok(config.connect_no_tls()?)
}
fn http_request<U: IntoUrl>(&self, method: Method, url: U) -> anyhow::Result<RequestBuilder> {
let mut builder = self.http_client.request(method, url);
if self.env.pageserver.http_auth_type == AuthType::NeonJWT {
let token = self
.env
.generate_auth_token(&Claims::new(None, Scope::PageServerApi))?;
builder = builder.bearer_auth(token)
}
Ok(builder)
}
pub fn check_status(&self) -> Result<()> {
self.http_request(Method::GET, format!("{}/status", self.http_base_url))
self.http_request(Method::GET, format!("{}/status", self.http_base_url))?
.send()?
.error_from_body()?;
Ok(())
@@ -298,7 +304,7 @@ impl PageServerNode {
pub fn tenant_list(&self) -> Result<Vec<TenantInfo>> {
Ok(self
.http_request(Method::GET, format!("{}/tenant", self.http_base_url))
.http_request(Method::GET, format!("{}/tenant", self.http_base_url))?
.send()?
.error_from_body()?
.json()?)
@@ -352,11 +358,16 @@ impl PageServerNode {
.map(|x| x.parse::<bool>())
.transpose()
.context("Failed to parse 'trace_read_requests' as bool")?,
eviction_policy: settings
.get("eviction_policy")
.map(|x| serde_json::from_str(x))
.transpose()
.context("Failed to parse 'eviction_policy' json")?,
};
if !settings.is_empty() {
bail!("Unrecognized tenant settings: {settings:?}")
}
self.http_request(Method::POST, format!("{}/tenant", self.http_base_url))
self.http_request(Method::POST, format!("{}/tenant", self.http_base_url))?
.json(&request)
.send()?
.error_from_body()?
@@ -373,7 +384,7 @@ impl PageServerNode {
}
pub fn tenant_config(&self, tenant_id: TenantId, settings: HashMap<&str, &str>) -> Result<()> {
self.http_request(Method::PUT, format!("{}/tenant/config", self.http_base_url))
self.http_request(Method::PUT, format!("{}/tenant/config", self.http_base_url))?
.json(&TenantConfigRequest {
tenant_id,
checkpoint_distance: settings
@@ -436,7 +447,7 @@ impl PageServerNode {
.http_request(
Method::GET,
format!("{}/tenant/{}/timeline", self.http_base_url, tenant_id),
)
)?
.send()?
.error_from_body()?
.json()?;
@@ -455,7 +466,7 @@ impl PageServerNode {
self.http_request(
Method::POST,
format!("{}/tenant/{}/timeline", self.http_base_url, tenant_id),
)
)?
.json(&TimelineCreateRequest {
new_timeline_id,
ancestor_start_lsn,
@@ -492,7 +503,7 @@ impl PageServerNode {
pg_wal: Option<(Lsn, PathBuf)>,
pg_version: u32,
) -> anyhow::Result<()> {
let mut client = self.pg_connection_config.connect_no_tls().unwrap();
let mut client = self.page_server_psql_client()?;
// Init base reader
let (start_lsn, base_tarfile_path) = base;

View File

@@ -1,7 +1,6 @@
use std::io::Write;
use std::path::PathBuf;
use std::process::Child;
use std::sync::Arc;
use std::{io, result};
use anyhow::Context;
@@ -11,7 +10,6 @@ use reqwest::{IntoUrl, Method};
use thiserror::Error;
use utils::{http::error::HttpErrorBody, id::NodeId};
use crate::pageserver::PageServerNode;
use crate::{
background_process,
local_env::{LocalEnv, SafekeeperConf},
@@ -65,14 +63,10 @@ pub struct SafekeeperNode {
pub env: LocalEnv,
pub http_client: Client,
pub http_base_url: String,
pub pageserver: Arc<PageServerNode>,
}
impl SafekeeperNode {
pub fn from_env(env: &LocalEnv, conf: &SafekeeperConf) -> SafekeeperNode {
let pageserver = Arc::new(PageServerNode::from_env(env));
SafekeeperNode {
id: conf.id,
conf: conf.clone(),
@@ -80,7 +74,6 @@ impl SafekeeperNode {
env: env.clone(),
http_client: Client::new(),
http_base_url: format!("http://127.0.0.1:{}/v1", conf.http_port),
pageserver,
}
}
@@ -115,6 +108,10 @@ impl SafekeeperNode {
let datadir = self.datadir_path();
let id_string = id.to_string();
// TODO: add availability_zone to the config.
// Right now we just specify any value here and use it to check metrics in tests.
let availability_zone = format!("sk-{}", id_string);
let mut args = vec![
"-D",
datadir.to_str().with_context(|| {
@@ -126,6 +123,8 @@ impl SafekeeperNode {
&listen_pg,
"--listen-http",
&listen_http,
"--availability-zone",
&availability_zone,
];
if !self.conf.sync {
args.push("--no-sync");

View File

@@ -160,6 +160,7 @@ services:
build:
context: ./compute_wrapper/
args:
- REPOSITORY=${REPOSITORY:-neondatabase}
- COMPUTE_IMAGE=compute-node-v${PG_VERSION:-14}
- TAG=${TAG:-latest}
- http_proxy=$http_proxy

View File

@@ -29,12 +29,54 @@ These components should not have access to the private key and may only get toke
The key pair is generated once for an installation of compute/pageserver/safekeeper, e.g. by `neon_local init`.
There is currently no way to rotate the key without bringing down all components.
### Best practices
See [RFC 8725: JSON Web Token Best Current Practices](https://www.rfc-editor.org/rfc/rfc8725)
### Token format
The JWT tokens in Neon use "EdDSA" as the algorithm (defined in [RFC8037](https://www.rfc-editor.org/rfc/rfc8037)).
Example:
Header:
```
{
"alg": "EdDSA",
"typ": "JWT"
}
```
Payload:
```
{
"scope": "tenant", # "tenant", "pageserverapi", or "safekeeperdata"
"tenant_id": "5204921ff44f09de8094a1390a6a50f6",
}
```
Meanings of scope:
"tenant": Provides access to all data for a specific tenant
"pageserverapi": Provides blanket access to all tenants on the pageserver plus pageserver-wide APIs.
Should only be used e.g. for status check/tenant creation/list.
"safekeeperdata": Provides blanket access to all data on the safekeeper plus safekeeper-wide APIs.
Should only be used e.g. for status check.
Currently also used for connection from any pageserver to any safekeeper.
### CLI
CLI generates a key pair during call to `neon_local init` with the following commands:
```bash
openssl genrsa -out auth_private_key.pem 2048
openssl rsa -in auth_private_key.pem -pubout -outform PEM -out auth_public_key.pem
openssl genpkey -algorithm ed25519 -out auth_private_key.pem
openssl pkey -in auth_private_key.pem -pubout -out auth_public_key.pem
```
Configuration files for all components point to `public_key.pem` for JWT validation.
@@ -64,20 +106,22 @@ Their authentication is just plain PostgreSQL authentication and out of scope fo
There is no administrative API except those provided by PostgreSQL.
#### Outgoing connections
Compute connects to Pageserver for getting pages.
The connection string is configured by the `neon.pageserver_connstring` PostgreSQL GUC, e.g. `postgresql://no_user:$NEON_AUTH_TOKEN@localhost:15028`.
The environment variable inside the connection string is substituted with
the JWT token.
Compute connects to Pageserver for getting pages. The connection string is
configured by the `neon.pageserver_connstring` PostgreSQL GUC,
e.g. `postgresql://no_user@localhost:15028`. If the `$NEON_AUTH_TOKEN`
environment variable is set, it is used as the password for the connection. (The
pageserver uses JWT tokens for authentication, so the password is really a
token.)
Compute connects to Safekeepers to write and commit data.
The token is the same for all safekeepers.
It's stored in an environment variable, whose name is configured
by the `neon.safekeeper_token_env` PostgreSQL GUC.
If the GUC is unset, no token is passed.
Compute connects to Safekeepers to write and commit data. The list of safekeeper
addresses is given in the `neon.safekeepers` GUC. The connections to the
safekeepers take the password from the `$NEON_AUTH_TOKEN` environment
variable, if set.
Note that both tokens can be (and typically are) the same;
the scope is the tenant and the token is usually passed through the
`$NEON_AUTH_TOKEN` environment variable.
The `compute_ctl` binary that runs before the PostgreSQL server, and launches
PostgreSQL, also makes a connection to the pageserver. It uses it to fetch the
initial "base backup" dump, to initialize the PostgreSQL data directory. It also
uses `$NEON_AUTH_TOKEN` as the password for the connection.
### Pageserver
#### Overview
@@ -102,10 +146,12 @@ Each compute should present a token valid for the timeline's tenant.
Pageserver also has HTTP API: some parts are per-tenant,
some parts are server-wide, these are different scopes.
The `auth_type` configuration variable in Pageserver's config may have
either of three values:
Authentication can be enabled separately for the HTTP mgmt API, and
for the libpq connections from compute. The `http_auth_type` and
`pg_auth_type` configuration variables in Pageserver's config may
have one of these values:
* `Trust` removes all authentication. The outdated `MD5` value does likewise
* `Trust` removes all authentication.
* `NeonJWT` enables JWT validation.
Tokens are validated using the public key which lies in a PEM file
specified in the `auth_validation_public_key_path` config.

View File

@@ -37,9 +37,9 @@ You can specify version of neon cluster using following environment values.
- PG_VERSION: postgres version for compute (default is 14)
- TAG: the tag version of [docker image](https://registry.hub.docker.com/r/neondatabase/neon/tags) (default is latest), which is tagged in [CI test](/.github/workflows/build_and_test.yml)
```
$ cd docker-compose/docker-compose.yml
$ cd docker-compose/
$ docker-compose down # remove the conainers if exists
$ PG_VERSION=15 TAG=2221 docker-compose up --build -d # You can specify the postgres and image version
$ PG_VERSION=15 TAG=2937 docker-compose up --build -d # You can specify the postgres and image version
Creating network "dockercompose_default" with the default driver
Creating docker-compose_storage_broker_1 ... done
(...omit...)

View File

@@ -0,0 +1,269 @@
# Deleting pageserver part of tenants data from s3
Created on 08.03.23
## Motivation
Currently we dont delete pageserver part of the data from s3 when project is deleted. (The same is true for safekeepers, but this outside of the scope of this RFC).
This RFC aims to spin a discussion to come to a robust deletion solution that wont put us in into a corner for features like postponed deletion (when we keep data for user to be able to restore a project if it was deleted by accident)
## Summary
TLDR; There are two options, one based on control plane issuing actual delete requests to s3 and the other one that keeps s3 stuff bound to pageserver. Each one has its pros and cons.
The decision is to stick with pageserver centric approach. For motivation see [Decision](#decision).
## Components
pageserver, control-plane
## Requirements
Deletion should successfully finish (eventually) without leaving dangling files in presense of:
- component restarts
- component outage
- pageserver loss
## Proposed implementation
Before the options are discussed, note that deletion can be quite long process. For deletion from s3 the obvious choice is [DeleteObjects](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) API call. It allows to batch deletion of up to 1k objects in one API call. So deletion operation linearly depends on number of layer files.
Another design limitation is that there is no cheap `mv` operation available for s3. `mv` from `aws s3 mv` uses `copy(src, dst) + delete(src)`. So `mv`-like operation is not feasible as a building block because it actually amplifies the problem with both duration and resulting cost of the operation.
The case when there are multiple pageservers handling the same tenants is largely out of scope of the RFC. We still consider case with migration from one PS to another, but do not consider case when tenant exists on multiple pageservers for extended period of time. The case with multiple pageservers can be reduced to case with one pageservers by calling detach on all pageservers except the last one, for it actual delete needs to be called.
For simplicity lets look into deleting tenants. Differences in deletion process between tenants and timelines are mentioned in paragraph ["Differences between tenants and timelines"](#differences-between-tenants-and-timelines)
### 1. Pageserver owns deletion machinery
#### The sequence
TLDR; With this approach control plane needs to call delete on a tenant and poll for progress. As much as possible is handled on pageserver. Lets see the sequence.
Happy path:
```mermaid
sequenceDiagram
autonumber
participant CP as Control Plane
participant PS as Pageserver
participant S3
CP->>PS: Delete tenant
PS->>S3: Create deleted mark file at <br> /tenant/meta/deleted
PS->>PS: Create deleted mark file locally
PS->>CP: Accepted
PS->>PS: delete local files other than deleted mark
loop Delete layers for each timeline
PS->>S3: delete(..)
CP->>PS: Finished?
PS->>CP: False
end
PS->>S3: Delete mark file
PS->>PS: Delete local mark file
loop Poll for status
CP->>PS: Finished?
PS->>CP: True or False
end
```
Why two mark files?
Remote one is needed for cases when pageserver is lost during deletion so other pageserver can learn the deletion from s3 during attach.
Why local mark file is needed?
If we dont have one, we have two choices, delete local data before deleting the remote part or do that after.
If we delete local data before remote then during restart pageserver wont pick up remote tenant at all because nothing is available locally (pageserver looks for remote conuterparts of locally available tenants).
If we delete local data after remote then at the end of the sequence when remote mark file is deleted if pageserver restart happens then the state is the same to situation when pageserver just missing data on remote without knowing the fact that this data is intended to be deleted. In this case the current behavior is upload everything local-only to remote.
Thus we need local record of tenant being deleted as well.
##### Handle pageserver crashes
Lets explore sequences with various crash points.
Pageserver crashes before `deleted` mark file is persisted in s3:
```mermaid
sequenceDiagram
autonumber
participant CP as Control Plane
participant PS as Pageserver
participant S3
CP->>PS: Delete tenant
note over PS: Crash point 1.
CP->>PS: Retry delete request
PS->>S3: Create deleted mark file at <br> /tenant/meta/deleted
PS->>PS: Create deleted mark file locally
PS->>CP: Accepted
PS->>PS: delete local files other than deleted mark
loop Delete layers for each timeline
PS->>S3: delete(..)
CP->>PS: Finished?
PS->>CP: False
end
PS->>S3: Delete mark file
PS->>PS: Delete local mark file
CP->>PS: Finished?
PS->>CP: True
```
Pageserver crashed when deleted mark was about to be persisted in s3, before Control Plane gets a response:
```mermaid
sequenceDiagram
autonumber
participant CP as Control Plane
participant PS as Pageserver
participant S3
CP->>PS: Delete tenant
PS->>S3: Create deleted mark file at <br> /tenant/meta/deleted
note over PS: Crash point 2.
note over PS: During startup we reconcile <br> with remote and see <br> whether the remote mark exists
alt Remote mark exists
PS->>PS: create local mark if its missing
PS->>PS: delete local files other than deleted mark
loop Delete layers for each timeline
PS->>S3: delete(..)
end
note over CP: Eventually console should <br> retry delete request
CP->>PS: Retry delete tenant
PS->>CP: Not modified
else Mark is missing
note over PS: Continue to operate the tenant as if deletion didnt happen
note over CP: Eventually console should <br> retry delete request
CP->>PS: Retry delete tenant
PS->>S3: Create deleted mark file at <br> /tenant/meta/deleted
PS->>CP: Delete tenant
end
PS->>PS: Continue with layer file deletions
loop Delete layers for each timeline
PS->>S3: delete(..)
CP->>PS: Finished?
PS->>CP: False
end
PS->>S3: Delete mark file
PS->>PS: Delete local mark file
CP->>PS: Finished?
PS->>CP: True
```
Similar sequence applies when both local and remote marks were persisted but Control Plane still didnt receive a response.
If pageserver crashes after both mark files were deleted then it will reply to control plane status poll request with 404 which should be treated by control plane as success.
The same applies if pageserver crashes in the end, when remote mark is deleted but before local one gets deleted. In this case on restart pageserver moves forward with deletion of local mark and Control Plane will receive 404.
##### Differences between tenants and timelines
For timeline the sequence is the same with the following differences:
- remote delete mark file can be replaced with a boolean "deleted" flag in index_part.json
- local deletion mark is not needed, because whole tenant is kept locally so situation described in motivation for local mark is impossible
##### Handle pageserver loss
If pageseserver is lost then the deleted tenant should be attached to different pageserver and delete request needs to be retried against new pageserver. Then attach logic is shared with one described for pageserver restarts (local deletion mark wont be available so needs to be created).
##### Restrictions for tenant that is in progress of being deleted
I propose to add another state to tenant/timeline - PendingDelete. This state shouldnt allow executing any operations aside from polling the deletion status.
#### Summary
Pros:
- Storage is not dependent on control plane. Storage can be restarted even if control plane is not working.
- Allows for easier dogfooding, console can use Neon backed database as primary operational data store. If storage depends on control plane and control plane depends on storage we're stuck.
- No need to share inner s3 workings with control plane. Pageserver presents api contract and S3 paths are not part of this contract.
- No need to pass list of alive timelines to attach call. This will be solved by pageserver observing deleted flag. See
Cons:
- Logic is a tricky, needs good testing
- Anything else?
### 2. Control plane owns deletion machinery
In this case the only action performed on pageserver is removal of local files.
Everything else is done by control plane. The steps are as follows:
1. Control plane marks tenant as "delete pending" in its database
2. It lists the s3 for all the files and repeatedly calls delete until nothing is left behind
3. When no files are left marks deletion as completed
In case of restart it selects all tenants marked as "delete pending" and continues the deletion.
For tenants it is simple. For timelines there are caveats.
Assume that the same workflow is used for timelines.
If a tenant gets relocated during timeline deletion the attach call with its current logic will pick up deleted timeline in its half deleted state.
Available options:
- require list of alive timelines to be passed to attach call
- use the same schema with flag in index_part.json (again part of the caveats around pageserver restart applies). In this case nothing stops pageserver from implementing deletion inside if we already have these deletion marks.
With first option the following problem becomes apparent:
Who is the source of truth regarding timeline liveness?
Imagine:
PS1 fails.
PS2 gets assigned the tenant.
New branch gets created
PS1 starts up (is it possible or we just recycle it?)
PS1 is unaware of the new branch. It can either fall back to s3 ls, or ask control plane.
So here comes the dependency of storage on control plane. During restart storage needs to know which timelines are valid for operation. If there is nothing on s3 that can answer that question storage neeeds to ask control plane.
### Summary
Cons:
- Potential thundering herd-like problem during storage restart (requests to control plane)
- Potential increase in storage startup time (additional request to control plane)
- Storage startup starts to depend on console
- Erroneous attach call can attach tenant in half deleted state
Pros:
- Easier to reason about if you dont have to account for pageserver restarts
### Extra notes
There was a concern that having deletion code in pageserver is a littlebit scary, but we need to have this code somewhere. So to me it is equally scary to have that in whatever place it ends up at.
Delayed deletion can be done with both approaches. As discussed with Anna (@stepashka) this is only relevant for tenants (projects) not for timelines. For first approach detach can be called immediately and deletion can be done later with attach + delete. With second approach control plane needs to start the deletion whenever necessary.
## Decision
After discussion in comments I see that we settled on two options (though a bit different from ones described in rfc). First one is the same - pageserver owns as much as possible. The second option is that pageserver owns markers thing, but actual deletion happens in control plane by repeatedly calling ls + delete.
To my mind the only benefit of the latter approach is possible code reuse between safekeepers and pageservers. Otherwise poking around integrating s3 library into control plane, configuring shared knowledge abouth paths in s3 - are the downsides. Another downside of relying on control plane is the testing process. Control plane resides in different repository so it is quite hard to test pageserver related changes there. e2e test suite there doesnt support shutting down pageservers, which are separate docker containers there instead of just processes.
With pageserver owning everything we still give the retry logic to control plane but its easier to duplicate if needed compared to sharing inner s3 workings. We will have needed tests for retry logic in neon repo.
So the decision is to proceed with pageserver centric approach.

View File

@@ -129,13 +129,12 @@ Run `poetry shell` to activate the virtual environment.
Alternatively, use `poetry run` to run a single command in the venv, e.g. `poetry run pytest`.
### Obligatory checks
We force code formatting via `black`, `isort` and type hints via `mypy`.
We force code formatting via `black`, `ruff`, and type hints via `mypy`.
Run the following commands in the repository's root (next to `pyproject.toml`):
```bash
poetry run isort . # Imports are reformatted
poetry run black . # All code is reformatted
poetry run flake8 . # Python linter
poetry run ruff . # Python linter
poetry run mypy . # Ensure there are no typing errors
```

View File

@@ -115,6 +115,11 @@ pub struct TenantCreateRequest {
pub lagging_wal_timeout: Option<String>,
pub max_lsn_wal_lag: Option<NonZeroU64>,
pub trace_read_requests: Option<bool>,
// We defer the parsing of the eviction_policy field to the request handler.
// Otherwise we'd have to move the types for eviction policy into this package.
// We might do that once the eviction feature has stabilizied.
// For now, this field is not even documented in the openapi_spec.yml.
pub eviction_policy: Option<serde_json::Value>,
}
#[serde_as]
@@ -341,7 +346,7 @@ pub enum InMemoryLayerInfo {
pub enum HistoricLayerInfo {
Delta {
layer_file_name: String,
layer_file_size: Option<u64>,
layer_file_size: u64,
#[serde_as(as = "DisplayFromStr")]
lsn_start: Lsn,
@@ -352,7 +357,7 @@ pub enum HistoricLayerInfo {
},
Image {
layer_file_name: String,
layer_file_size: Option<u64>,
layer_file_size: u64,
#[serde_as(as = "DisplayFromStr")]
lsn_start: Lsn,

View File

@@ -59,14 +59,14 @@ pub fn is_expected_io_error(e: &io::Error) -> bool {
}
#[async_trait::async_trait]
pub trait Handler {
pub trait Handler<IO> {
/// Handle single query.
/// postgres_backend will issue ReadyForQuery after calling this (this
/// might be not what we want after CopyData streaming, but currently we don't
/// care). It will also flush out the output buffer.
async fn process_query(
&mut self,
pgb: &mut PostgresBackend,
pgb: &mut PostgresBackend<IO>,
query_string: &str,
) -> Result<(), QueryError>;
@@ -77,7 +77,7 @@ pub trait Handler {
/// to override whole init logic in implementations.
fn startup(
&mut self,
_pgb: &mut PostgresBackend,
_pgb: &mut PostgresBackend<IO>,
_sm: &FeStartupPacket,
) -> Result<(), QueryError> {
Ok(())
@@ -86,7 +86,7 @@ pub trait Handler {
/// Check auth jwt
fn check_auth_jwt(
&mut self,
_pgb: &mut PostgresBackend,
_pgb: &mut PostgresBackend<IO>,
_jwt_response: &[u8],
) -> Result<(), QueryError> {
Err(QueryError::Other(anyhow::anyhow!("JWT auth failed")))
@@ -115,12 +115,12 @@ pub enum ProcessMsgResult {
}
/// Either plain TCP stream or encrypted one, implementing AsyncRead + AsyncWrite.
pub enum MaybeTlsStream {
Unencrypted(tokio::net::TcpStream),
Tls(Box<tokio_rustls::server::TlsStream<tokio::net::TcpStream>>),
pub enum MaybeTlsStream<IO> {
Unencrypted(IO),
Tls(Box<tokio_rustls::server::TlsStream<IO>>),
}
impl AsyncWrite for MaybeTlsStream {
impl<IO: AsyncRead + AsyncWrite + Unpin> AsyncWrite for MaybeTlsStream<IO> {
fn poll_write(
self: Pin<&mut Self>,
cx: &mut std::task::Context<'_>,
@@ -147,7 +147,7 @@ impl AsyncWrite for MaybeTlsStream {
}
}
}
impl AsyncRead for MaybeTlsStream {
impl<IO: AsyncRead + AsyncWrite + Unpin> AsyncRead for MaybeTlsStream<IO> {
fn poll_read(
self: Pin<&mut Self>,
cx: &mut std::task::Context<'_>,
@@ -192,13 +192,13 @@ impl fmt::Display for AuthType {
/// PostgresBackend after call to `split`. In principle we could always store a
/// pair of splitted handles, but that would force to to pay splitting price
/// (Arc and kinda mutex inside polling) for all uses (e.g. pageserver).
enum MaybeWriteOnly {
Full(Framed<MaybeTlsStream>),
WriteOnly(FramedWriter<MaybeTlsStream>),
enum MaybeWriteOnly<IO> {
Full(Framed<MaybeTlsStream<IO>>),
WriteOnly(FramedWriter<MaybeTlsStream<IO>>),
Broken, // temporary value palmed off during the split
}
impl MaybeWriteOnly {
impl<IO: AsyncRead + AsyncWrite + Unpin> MaybeWriteOnly<IO> {
async fn read_startup_message(&mut self) -> Result<Option<FeStartupPacket>, ConnectionError> {
match self {
MaybeWriteOnly::Full(framed) => framed.read_startup_message().await,
@@ -244,8 +244,8 @@ impl MaybeWriteOnly {
}
}
pub struct PostgresBackend {
framed: MaybeWriteOnly,
pub struct PostgresBackend<IO> {
framed: MaybeWriteOnly<IO>,
pub state: ProtoState,
@@ -255,6 +255,8 @@ pub struct PostgresBackend {
pub tls_config: Option<Arc<rustls::ServerConfig>>,
}
pub type PostgresBackendTCP = PostgresBackend<tokio::net::TcpStream>;
pub fn query_from_cstring(query_string: Bytes) -> Vec<u8> {
let mut query_string = query_string.to_vec();
if let Some(ch) = query_string.last() {
@@ -271,7 +273,7 @@ fn cstr_to_str(bytes: &[u8]) -> anyhow::Result<&str> {
std::str::from_utf8(without_null).map_err(|e| e.into())
}
impl PostgresBackend {
impl PostgresBackend<tokio::net::TcpStream> {
pub fn new(
socket: tokio::net::TcpStream,
auth_type: AuthType,
@@ -288,6 +290,25 @@ impl PostgresBackend {
peer_addr,
})
}
}
impl<IO: AsyncRead + AsyncWrite + Unpin> PostgresBackend<IO> {
pub fn new_from_io(
socket: IO,
peer_addr: SocketAddr,
auth_type: AuthType,
tls_config: Option<Arc<rustls::ServerConfig>>,
) -> io::Result<Self> {
let stream = MaybeTlsStream::Unencrypted(socket);
Ok(Self {
framed: MaybeWriteOnly::Full(Framed::new(stream)),
state: ProtoState::Initialization,
auth_type,
tls_config,
peer_addr,
})
}
pub fn get_peer_addr(&self) -> &SocketAddr {
&self.peer_addr
@@ -346,14 +367,14 @@ impl PostgresBackend {
/// to it in CopyData messages, and writes them to the connection
///
/// The caller is responsible for sending CopyOutResponse and CopyDone messages.
pub fn copyout_writer(&mut self) -> CopyDataWriter {
pub fn copyout_writer(&mut self) -> CopyDataWriter<IO> {
CopyDataWriter { pgb: self }
}
/// Wrapper for run_message_loop() that shuts down socket when we are done
pub async fn run<F, S>(
mut self,
handler: &mut impl Handler,
handler: &mut impl Handler<IO>,
shutdown_watcher: F,
) -> Result<(), QueryError>
where
@@ -369,7 +390,7 @@ impl PostgresBackend {
async fn run_message_loop<F, S>(
&mut self,
handler: &mut impl Handler,
handler: &mut impl Handler<IO>,
shutdown_watcher: F,
) -> Result<(), QueryError>
where
@@ -426,9 +447,9 @@ impl PostgresBackend {
/// Try to upgrade MaybeTlsStream into actual TLS one, performing handshake.
async fn tls_upgrade(
src: MaybeTlsStream,
src: MaybeTlsStream<IO>,
tls_config: Arc<rustls::ServerConfig>,
) -> anyhow::Result<MaybeTlsStream> {
) -> anyhow::Result<MaybeTlsStream<IO>> {
match src {
MaybeTlsStream::Unencrypted(s) => {
let acceptor = TlsAcceptor::from(tls_config);
@@ -466,7 +487,7 @@ impl PostgresBackend {
/// Split off owned read part from which messages can be read in different
/// task/thread.
pub fn split(&mut self) -> anyhow::Result<PostgresBackendReader> {
pub fn split(&mut self) -> anyhow::Result<PostgresBackendReader<IO>> {
// temporary replace stream with fake to cook split one, Indiana Jones style
match std::mem::replace(&mut self.framed, MaybeWriteOnly::Broken) {
MaybeWriteOnly::Full(framed) => {
@@ -482,7 +503,7 @@ impl PostgresBackend {
}
/// Join read part back.
pub fn unsplit(&mut self, reader: PostgresBackendReader) -> anyhow::Result<()> {
pub fn unsplit(&mut self, reader: PostgresBackendReader<IO>) -> anyhow::Result<()> {
// temporary replace stream with fake to cook joined one, Indiana Jones style
match std::mem::replace(&mut self.framed, MaybeWriteOnly::Broken) {
MaybeWriteOnly::Full(_) => {
@@ -499,7 +520,7 @@ impl PostgresBackend {
/// Perform handshake with the client, transitioning to Established.
/// In case of EOF during handshake logs this, sets state to Closed and returns Ok(()).
async fn handshake(&mut self, handler: &mut impl Handler) -> Result<(), QueryError> {
async fn handshake(&mut self, handler: &mut impl Handler<IO>) -> Result<(), QueryError> {
while self.state < ProtoState::Authentication {
match self.framed.read_startup_message().await? {
Some(msg) => {
@@ -565,7 +586,7 @@ impl PostgresBackend {
/// actual startup packet.
async fn process_startup_message(
&mut self,
handler: &mut impl Handler,
handler: &mut impl Handler<IO>,
msg: FeStartupPacket,
) -> Result<(), QueryError> {
assert!(self.state < ProtoState::Authentication);
@@ -629,7 +650,7 @@ impl PostgresBackend {
async fn process_message(
&mut self,
handler: &mut impl Handler,
handler: &mut impl Handler<IO>,
msg: FeMessage,
unnamed_query_string: &mut Bytes,
) -> Result<ProcessMsgResult, QueryError> {
@@ -746,7 +767,7 @@ impl PostgresBackend {
let err_to_send_and_errcode = match &end {
ServerInitiated(_) => Some((end.to_string(), SQLSTATE_SUCCESSFUL_COMPLETION)),
Other(_) => Some((end.to_string(), SQLSTATE_INTERNAL_ERROR)),
Other(_) => Some((format!("{end:#}"), SQLSTATE_INTERNAL_ERROR)),
// Note: CopyFail in duplex copy is somewhat unexpected (at least to
// PG walsender; evidently and per my docs reading client should
// finish it with CopyDone). It is not a problem to recover from it
@@ -776,9 +797,9 @@ impl PostgresBackend {
}
}
pub struct PostgresBackendReader(FramedReader<MaybeTlsStream>);
pub struct PostgresBackendReader<IO>(FramedReader<MaybeTlsStream<IO>>);
impl PostgresBackendReader {
impl<IO: AsyncRead + AsyncWrite + Unpin> PostgresBackendReader<IO> {
/// Read full message or return None if connection is cleanly closed with no
/// unprocessed data.
pub async fn read_message(&mut self) -> Result<Option<FeMessage>, ConnectionError> {
@@ -812,11 +833,11 @@ impl PostgresBackendReader {
/// messages.
///
pub struct CopyDataWriter<'a> {
pgb: &'a mut PostgresBackend,
pub struct CopyDataWriter<'a, IO> {
pgb: &'a mut PostgresBackend<IO>,
}
impl<'a> AsyncWrite for CopyDataWriter<'a> {
impl<'a, IO: AsyncRead + AsyncWrite + Unpin> AsyncWrite for CopyDataWriter<'a, IO> {
fn poll_write(
self: Pin<&mut Self>,
cx: &mut std::task::Context<'_>,

View File

@@ -4,6 +4,7 @@ use postgres_backend::{AuthType, Handler, PostgresBackend, QueryError};
use pq_proto::{BeMessage, RowDescriptor};
use std::io::Cursor;
use std::{future, sync::Arc};
use tokio::io::{AsyncRead, AsyncWrite};
use tokio::net::{TcpListener, TcpStream};
use tokio_postgres::config::SslMode;
use tokio_postgres::tls::MakeTlsConnect;
@@ -22,11 +23,11 @@ async fn make_tcp_pair() -> (TcpStream, TcpStream) {
struct TestHandler {}
#[async_trait::async_trait]
impl Handler for TestHandler {
impl<IO: AsyncRead + AsyncWrite + Unpin + Send> Handler<IO> for TestHandler {
// return single col 'hey' for any query
async fn process_query(
&mut self,
pgb: &mut PostgresBackend,
pgb: &mut PostgresBackend<IO>,
_query_string: &str,
) -> Result<(), QueryError> {
pgb.write_message_noflush(&BeMessage::RowDescription(&[RowDescriptor::text_col(

View File

@@ -63,10 +63,7 @@ fn main() -> anyhow::Result<()> {
pg_install_dir_versioned = cwd.join("..").join("..").join(pg_install_dir_versioned);
}
let pg_config_bin = pg_install_dir_versioned
.join(pg_version)
.join("bin")
.join("pg_config");
let pg_config_bin = pg_install_dir_versioned.join("bin").join("pg_config");
let inc_server_path: String = if pg_config_bin.exists() {
let output = Command::new(pg_config_bin)
.arg("--includedir-server")

View File

@@ -63,9 +63,9 @@ impl<S> Framed<S> {
&self.stream
}
/// Extract the underlying stream.
pub fn into_inner(self) -> S {
self.stream
/// Deconstruct into the underlying stream and read buffer.
pub fn into_inner(self) -> (S, BytesMut) {
(self.stream, self.read_buf)
}
/// Return new Framed with stream type transformed by async f, for TLS

View File

@@ -265,11 +265,9 @@ impl FeMessage {
b'c' => Ok(Some(FeMessage::CopyDone)),
b'f' => Ok(Some(FeMessage::CopyFail)),
b'p' => Ok(Some(FeMessage::PasswordMessage(msg))),
tag => {
return Err(ProtocolError::Protocol(format!(
"unknown message tag: {tag},'{msg:?}'"
)))
}
tag => Err(ProtocolError::Protocol(format!(
"unknown message tag: {tag},'{msg:?}'"
))),
}
}
}

View File

@@ -18,6 +18,7 @@ futures = { workspace = true}
jsonwebtoken.workspace = true
nix.workspace = true
once_cell.workspace = true
pin-project-lite.workspace = true
routerify.workspace = true
serde.workspace = true
serde_json.workspace = true

View File

@@ -1,7 +1,4 @@
// For details about authentication see docs/authentication.md
//
// TODO: use ed25519 keys
// Relevant issue: https://github.com/Keats/jsonwebtoken/issues/162
use serde;
use std::fs;
@@ -16,9 +13,10 @@ use serde_with::{serde_as, DisplayFromStr};
use crate::id::TenantId;
const JWT_ALGORITHM: Algorithm = Algorithm::RS256;
/// Algorithm to use. We require EdDSA.
const STORAGE_TOKEN_ALGORITHM: Algorithm = Algorithm::EdDSA;
#[derive(Debug, Serialize, Deserialize, Clone)]
#[derive(Debug, Serialize, Deserialize, Clone, PartialEq)]
#[serde(rename_all = "lowercase")]
pub enum Scope {
// Provides access to all data for a specific tenant (specified in `struct Claims` below)
@@ -33,8 +31,9 @@ pub enum Scope {
SafekeeperData,
}
/// JWT payload. See docs/authentication.md for the format
#[serde_as]
#[derive(Debug, Serialize, Deserialize, Clone)]
#[derive(Debug, Serialize, Deserialize, Clone, PartialEq)]
pub struct Claims {
#[serde(default)]
#[serde_as(as = "Option<DisplayFromStr>")]
@@ -55,7 +54,8 @@ pub struct JwtAuth {
impl JwtAuth {
pub fn new(decoding_key: DecodingKey) -> Self {
let mut validation = Validation::new(JWT_ALGORITHM);
let mut validation = Validation::default();
validation.algorithms = vec![STORAGE_TOKEN_ALGORITHM];
// The default 'required_spec_claims' is 'exp'. But we don't want to require
// expiration.
validation.required_spec_claims = [].into();
@@ -67,7 +67,7 @@ impl JwtAuth {
pub fn from_key_path(key_path: &Path) -> Result<Self> {
let public_key = fs::read(key_path)?;
Ok(Self::new(DecodingKey::from_rsa_pem(&public_key)?))
Ok(Self::new(DecodingKey::from_ed_pem(&public_key)?))
}
pub fn decode(&self, token: &str) -> Result<TokenData<Claims>> {
@@ -85,6 +85,75 @@ impl std::fmt::Debug for JwtAuth {
// this function is used only for testing purposes in CLI e g generate tokens during init
pub fn encode_from_key_file(claims: &Claims, key_data: &[u8]) -> Result<String> {
let key = EncodingKey::from_rsa_pem(key_data)?;
Ok(encode(&Header::new(JWT_ALGORITHM), claims, &key)?)
let key = EncodingKey::from_ed_pem(key_data)?;
Ok(encode(&Header::new(STORAGE_TOKEN_ALGORITHM), claims, &key)?)
}
#[cfg(test)]
mod tests {
use super::*;
use std::str::FromStr;
// Generated with:
//
// openssl genpkey -algorithm ed25519 -out ed25519-priv.pem
// openssl pkey -in ed25519-priv.pem -pubout -out ed25519-pub.pem
const TEST_PUB_KEY_ED25519: &[u8] = br#"
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEARYwaNBayR+eGI0iXB4s3QxE3Nl2g1iWbr6KtLWeVD/w=
-----END PUBLIC KEY-----
"#;
const TEST_PRIV_KEY_ED25519: &[u8] = br#"
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEID/Drmc1AA6U/znNRWpF3zEGegOATQxfkdWxitcOMsIH
-----END PRIVATE KEY-----
"#;
#[test]
fn test_decode() -> Result<(), anyhow::Error> {
let expected_claims = Claims {
tenant_id: Some(TenantId::from_str("3d1f7595b468230304e0b73cecbcb081")?),
scope: Scope::Tenant,
};
// A test token containing the following payload, signed using TEST_PRIV_KEY_ED25519:
//
// ```
// {
// "scope": "tenant",
// "tenant_id": "3d1f7595b468230304e0b73cecbcb081",
// "iss": "neon.controlplane",
// "exp": 1709200879,
// "iat": 1678442479
// }
// ```
//
let encoded_eddsa = "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJzY29wZSI6InRlbmFudCIsInRlbmFudF9pZCI6IjNkMWY3NTk1YjQ2ODIzMDMwNGUwYjczY2VjYmNiMDgxIiwiaXNzIjoibmVvbi5jb250cm9scGxhbmUiLCJleHAiOjE3MDkyMDA4NzksImlhdCI6MTY3ODQ0MjQ3OX0.U3eA8j-uU-JnhzeO3EDHRuXLwkAUFCPxtGHEgw6p7Ccc3YRbFs2tmCdbD9PZEXP-XsxSeBQi1FY0YPcT3NXADw";
// Check it can be validated with the public key
let auth = JwtAuth::new(DecodingKey::from_ed_pem(TEST_PUB_KEY_ED25519)?);
let claims_from_token = auth.decode(encoded_eddsa)?.claims;
assert_eq!(claims_from_token, expected_claims);
Ok(())
}
#[test]
fn test_encode() -> Result<(), anyhow::Error> {
let claims = Claims {
tenant_id: Some(TenantId::from_str("3d1f7595b468230304e0b73cecbcb081")?),
scope: Scope::Tenant,
};
let encoded = encode_from_key_file(&claims, TEST_PRIV_KEY_ED25519)?;
// decode it back
let auth = JwtAuth::new(DecodingKey::from_ed_pem(TEST_PUB_KEY_ED25519)?);
let decoded = auth.decode(&encoded)?;
assert_eq!(decoded.claims, claims);
Ok(())
}
}

View File

@@ -11,7 +11,7 @@ where
P: AsRef<Path>,
{
fn is_empty_dir(&self) -> io::Result<bool> {
Ok(fs::read_dir(self)?.into_iter().next().is_none())
Ok(fs::read_dir(self)?.next().is_none())
}
}

View File

@@ -49,6 +49,8 @@ pub mod fs_ext;
pub mod history_buffer;
pub mod measured_stream;
/// use with fail::cfg("$name", "return(2000)")
#[macro_export]
macro_rules! failpoint_sleep_millis_async {

View File

@@ -0,0 +1,77 @@
use pin_project_lite::pin_project;
use std::pin::Pin;
use std::{io, task};
use tokio::io::{AsyncRead, AsyncWrite, ReadBuf};
pin_project! {
/// This stream tracks all writes and calls user provided
/// callback when the underlying stream is flushed.
pub struct MeasuredStream<S, R, W> {
#[pin]
stream: S,
write_count: usize,
inc_read_count: R,
inc_write_count: W,
}
}
impl<S, R, W> MeasuredStream<S, R, W> {
pub fn new(stream: S, inc_read_count: R, inc_write_count: W) -> Self {
Self {
stream,
write_count: 0,
inc_read_count,
inc_write_count,
}
}
}
impl<S: AsyncRead + Unpin, R: FnMut(usize), W> AsyncRead for MeasuredStream<S, R, W> {
fn poll_read(
self: Pin<&mut Self>,
context: &mut task::Context<'_>,
buf: &mut ReadBuf<'_>,
) -> task::Poll<io::Result<()>> {
let this = self.project();
let filled = buf.filled().len();
this.stream.poll_read(context, buf).map_ok(|()| {
let cnt = buf.filled().len() - filled;
// Increment the read count.
(this.inc_read_count)(cnt);
})
}
}
impl<S: AsyncWrite + Unpin, R, W: FnMut(usize)> AsyncWrite for MeasuredStream<S, R, W> {
fn poll_write(
self: Pin<&mut Self>,
context: &mut task::Context<'_>,
buf: &[u8],
) -> task::Poll<io::Result<usize>> {
let this = self.project();
this.stream.poll_write(context, buf).map_ok(|cnt| {
// Increment the write count.
*this.write_count += cnt;
cnt
})
}
fn poll_flush(
self: Pin<&mut Self>,
context: &mut task::Context<'_>,
) -> task::Poll<io::Result<()>> {
let this = self.project();
this.stream.poll_flush(context).map_ok(|()| {
// Call the user provided callback and reset the write count.
(this.inc_write_count)(*this.write_count);
*this.write_count = 0;
})
}
fn poll_shutdown(
self: Pin<&mut Self>,
context: &mut task::Context<'_>,
) -> task::Poll<io::Result<()>> {
self.project().stream.poll_shutdown(context)
}
}

View File

@@ -0,0 +1,541 @@
use anyhow::Result;
use pageserver::repository::Key;
use serde::{Deserialize, Serialize};
use std::cmp::Ordering;
use std::io::{self, BufRead};
use std::{
collections::{BTreeMap, BTreeSet, HashMap},
fmt::Write,
ops::Range,
};
use svg_fmt::{rgb, BeginSvg, EndSvg, Fill, Stroke, Style};
use utils::{lsn::Lsn, project_git_version};
project_git_version!(GIT_VERSION);
// Map values to their compressed coordinate - the index the value
// would have in a sorted and deduplicated list of all values.
struct CoordinateMap<T: Ord + Copy> {
map: BTreeMap<T, usize>,
stretch: f32
}
impl<T: Ord + Copy> CoordinateMap<T> {
fn new(coords: Vec<T>, stretch: f32) -> Self {
let set: BTreeSet<T> = coords.into_iter().collect();
let mut map: BTreeMap<T, usize> = BTreeMap::new();
for (i, e) in set.iter().enumerate() {
map.insert(*e, i);
}
Self { map, stretch }
}
fn map(&self, val: T) -> f32 {
*self.map.get(&val).unwrap() as f32 * self.stretch
}
fn max(&self) -> f32 {
self.map.len() as f32 * self.stretch
}
}
fn parse_filename(name: &str) -> (Range<Key>, Range<Lsn>) {
let split: Vec<&str> = name.split("__").collect();
let keys: Vec<&str> = split[0].split('-').collect();
let mut lsns: Vec<&str> = split[1].split('-').collect();
if lsns.len() == 1 {
lsns.push(lsns[0]);
}
let keys = Key::from_hex(keys[0]).unwrap()..Key::from_hex(keys[1]).unwrap();
let lsns = Lsn::from_hex(lsns[0]).unwrap()..Lsn::from_hex(lsns[1]).unwrap();
(keys, lsns)
}
#[derive(Serialize, Deserialize, PartialEq)]
enum LayerTraceOp {
#[serde(rename = "evict")]
Evict,
#[serde(rename = "flush")]
Flush,
#[serde(rename = "compact_create")]
CompactCreate,
#[serde(rename = "compact_delete")]
CompactDelete,
#[serde(rename = "image_create")]
ImageCreate,
#[serde(rename = "gc_delete")]
GcDelete,
#[serde(rename = "gc_start")]
GcStart,
}
impl std::fmt::Display for LayerTraceOp {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> Result<(), std::fmt::Error> {
let op_str = match self {
LayerTraceOp::Evict => "evict",
LayerTraceOp::Flush => "flush",
LayerTraceOp::CompactCreate => "compact_create",
LayerTraceOp::CompactDelete => "compact_delete",
LayerTraceOp::ImageCreate => "image_create",
LayerTraceOp::GcDelete => "gc_delete",
LayerTraceOp::GcStart => "gc_start",
};
f.write_str(op_str)
}
}
#[serde_with::serde_as]
#[derive(Serialize, Deserialize)]
struct LayerTraceLine {
time: u64,
op: LayerTraceOp,
#[serde(default)]
filename: String,
#[serde_as(as = "Option<serde_with::DisplayFromStr>")]
cutoff: Option<Lsn>,
}
struct LayerTraceFile {
filename: String,
key_range: Range<Key>,
lsn_range: Range<Lsn>,
}
impl LayerTraceFile {
fn is_image(&self) -> bool {
self.lsn_range.start == self.lsn_range.end
}
}
struct LayerTraceEvent {
time_rel: u64,
op: LayerTraceOp,
filename: String,
}
struct GcEvent {
time_rel: u64,
cutoff: Lsn,
}
fn main() -> Result<()> {
// Parse trace lines from stdin
let stdin = io::stdin();
let mut files: HashMap<String, LayerTraceFile> = HashMap::new();
let mut layer_events: Vec<LayerTraceEvent> = Vec::new();
let mut gc_events: Vec<GcEvent> = Vec::new();
let mut first_time: Option<u64> = None;
for line in stdin.lock().lines() {
let line = line.unwrap();
let parsed_line: LayerTraceLine = serde_json::from_str(&line)?;
let time_rel = if let Some(first_time) = first_time {
parsed_line.time - first_time
} else {
first_time = Some(parsed_line.time);
0
};
if parsed_line.op == LayerTraceOp::GcStart {
gc_events.push(GcEvent {
time_rel,
cutoff: parsed_line.cutoff.unwrap(),
});
} else {
layer_events.push(LayerTraceEvent {
time_rel,
filename: parsed_line.filename.clone(),
op: parsed_line.op,
});
if !files.contains_key(&parsed_line.filename) {
let (key_range, lsn_range) = parse_filename(&parsed_line.filename);
files.insert(parsed_line.filename.clone(), LayerTraceFile {
filename: parsed_line.filename.clone(),
key_range,
lsn_range,
});
};
}
}
let mut last_time_rel = layer_events.last().unwrap().time_rel;
if let Some(last_gc) = gc_events.last() {
last_time_rel = std::cmp::min(last_gc.time_rel, last_time_rel);
}
// Collect all coordinates
let mut keys: Vec<Key> = vec![];
let mut lsns: Vec<Lsn> = vec![];
for f in files.values() {
keys.push(f.key_range.start);
keys.push(f.key_range.end);
lsns.push(f.lsn_range.start);
lsns.push(f.lsn_range.end);
}
for gc_event in &gc_events {
lsns.push(gc_event.cutoff);
}
// Analyze
let key_map = CoordinateMap::new(keys, 2.0);
// Stretch out vertically for better visibility
let lsn_map = CoordinateMap::new(lsns, 3.0);
// Initialize stats
let mut num_deltas = 0;
let mut num_images = 0;
let mut svg = String::new();
// Draw
writeln!(svg,
"{}",
BeginSvg {
w: key_map.max(),
h: lsn_map.max(),
}
)?;
let lsn_max = lsn_map.max();
// Sort the files by LSN, but so that image layers go after all delta layers
// The SVG is painted in the order the elements appear, and we want to draw
// image layers on top of the delta layers if they overlap
let mut files_sorted: Vec<LayerTraceFile> = files.into_values().collect();
files_sorted.sort_by(|a, b| {
if a.is_image() && !b.is_image() {
Ordering::Greater
} else if !a.is_image() && b.is_image() {
Ordering::Less
} else {
a.lsn_range.end.cmp(&b.lsn_range.end)
}
});
for f in files_sorted {
let key_start = key_map.map(f.key_range.start);
let key_end = key_map.map(f.key_range.end);
let key_diff = key_end - key_start;
if key_start >= key_end {
panic!("Invalid key range {}-{}", key_start, key_end);
}
let lsn_start = lsn_map.map(f.lsn_range.start);
let lsn_end = lsn_map.map(f.lsn_range.end);
// Fill in and thicken rectangle if it's an
// image layer so that we can see it.
let mut style = Style::default();
style.fill = Fill::Color(rgb(0x80, 0x80, 0x80));
style.stroke = Stroke::Color(rgb(0, 0, 0), 0.5);
let y_start = (lsn_max - lsn_start) as f32;
let y_end = (lsn_max - lsn_end) as f32;
let x_margin = 0.25;
let y_margin = 0.5;
match f.lsn_range.start.cmp(&f.lsn_range.end) {
Ordering::Less => {
num_deltas += 1;
write!(svg,
r#" <rect id="layer_{}" x="{}" y="{}" width="{}" height="{}" ry="{}" style="{}">"#,
f.filename,
key_start as f32 + x_margin,
y_end + y_margin,
key_diff as f32 - x_margin * 2.0,
y_start - y_end - y_margin * 2.0,
1.0, // border_radius,
style.to_string(),
)?;
write!(svg, "<title>{}<br>{} - {}</title>", f.filename, lsn_end, y_end)?;
writeln!(svg, "</rect>")?;
}
Ordering::Equal => {
num_images += 1;
//lsn_diff = 0.3;
//lsn_offset = -lsn_diff / 2.0;
//margin = 0.05;
style.fill = Fill::Color(rgb(0x80, 0, 0x80));
style.stroke = Stroke::Color(rgb(0x80, 0, 0x80), 3.0);
write!(svg,
r#" <line id="layer_{}" x1="{}" y1="{}" x2="{}" y2="{}" style="{}">"#,
f.filename,
key_start as f32 + x_margin,
y_end,
key_end as f32 - x_margin,
y_end,
style.to_string(),
)?;
write!(svg, "<title>{}<br>{} - {}</title>", f.filename, lsn_end, y_end)?;
writeln!(svg, "</line>")?;
}
Ordering::Greater => panic!("Invalid lsn range {}-{}", lsn_start, lsn_end),
}
}
for (idx, gc) in gc_events.iter().enumerate() {
let cutoff_lsn = lsn_map.map(gc.cutoff);
let mut style = Style::default();
style.fill = Fill::None;
style.stroke = Stroke::Color(rgb(0xff, 0, 0), 0.5);
let y = lsn_max - cutoff_lsn;
writeln!(svg,
r#" <line id="gc_{}" x1="{}" y1="{}" x2="{}" y2="{}" style="{}" />"#,
idx,
0,
y,
key_map.max(),
y,
style.to_string(),
)?;
}
writeln!(svg, "{}", EndSvg)?;
let mut layer_events_str = String::new();
let mut first = true;
for e in layer_events {
if !first {
writeln!(layer_events_str, ",")?;
}
write!(layer_events_str,
r#" {{"time_rel": {}, "filename": "{}", "op": "{}"}}"#,
e.time_rel, e.filename, e.op)?;
first = false;
}
writeln!(layer_events_str)?;
let mut gc_events_str = String::new();
let mut first = true;
for e in gc_events {
if !first {
writeln!(gc_events_str, ",")?;
}
write!(gc_events_str,
r#" {{"time_rel": {}, "cutoff_lsn": "{}"}}"#,
e.time_rel, e.cutoff)?;
first = false;
}
writeln!(gc_events_str)?;
println!(r#"<!DOCTYPE html>
<html>
<head>
<style>
/* Keep the slider pinned at top */
.topbar {{
display: block;
overflow: hidden;
background-color: lightgrey;
position: fixed;
top: 0;
width: 100%;
/* width: 500px; */
}}
.slidercontainer {{
float: left;
width: 50%;
margin-right: 200px;
}}
.slider {{
float: left;
width: 100%;
}}
.legend {{
width: 200px;
float: right;
}}
/* Main content */
.main {{
margin-top: 50px; /* Add a top margin to avoid content overlay */
}}
</style>
</head>
<body onload="init()">
<script type="text/javascript">
var layer_events = [{layer_events_str}]
var gc_events = [{gc_events_str}]
let ticker;
function init() {{
moveSlider({last_time_rel})
moveSlider(0)
moveSlider(last_slider_pos)
}}
function startAnimation() {{
ticker = setInterval(animateStep, 100);
}}
function stopAnimation() {{
clearInterval(ticker);
}}
function animateStep() {{
if (last_layer_event < layer_events.length - 1) {{
var slider = document.getElementById("time-slider");
let prevPos = slider.value
let nextEvent = last_layer_event
while (nextEvent < layer_events.length - 1) {{
if (layer_events[nextEvent].time_rel > prevPos) {{
break;
}}
nextEvent += 1;
}}
let nextPos = layer_events[nextEvent].time_rel
slider.value = nextPos
moveSlider(nextPos)
}}
}}
function redoLayerEvent(n, dir) {{
var layer = document.getElementById("layer_" + layer_events[n].filename);
switch (layer_events[n].op) {{
case "evict":
break;
case "flush":
layer.style.visibility = "visible";
break;
case "compact_create":
layer.style.visibility = "visible";
break;
case "image_create":
layer.style.visibility = "visible";
break;
case "compact_delete":
layer.style.visibility = "hidden";
break;
case "gc_delete":
layer.style.visibility = "hidden";
break;
case "gc_start":
layer.style.visibility = "hidden";
break;
}}
}}
function undoLayerEvent(n) {{
var layer = document.getElementById("layer_" + layer_events[n].filename);
switch (layer_events[n].op) {{
case "evict":
break;
case "flush":
layer.style.visibility = "hidden";
break;
case "compact_create":
layer.style.visibility = "hidden";
break;
case "image_create":
layer.style.visibility = "hidden";
break;
case "compact_delete":
layer.style.visibility = "visible";
break;
case "gc_delete":
layer.style.visibility = "visible";
break;
}}
}}
function redoGcEvent(n) {{
var prev_gc_bar = document.getElementById("gc_" + (n - 1));
var new_gc_bar = document.getElementById("gc_" + n);
prev_gc_bar.style.visibility = "hidden"
new_gc_bar.style.visibility = "visible"
}}
function undoGcEvent(n) {{
var prev_gc_bar = document.getElementById("gc_" + n);
var new_gc_bar = document.getElementById("gc_" + (n - 1));
prev_gc_bar.style.visibility = "hidden"
new_gc_bar.style.visibility = "visible"
}}
var last_slider_pos = 0
var last_layer_event = 0
var last_gc_event = 0
var moveSlider = function(new_pos) {{
if (new_pos > last_slider_pos) {{
while (last_layer_event < layer_events.length - 1) {{
if (layer_events[last_layer_event + 1].time_rel > new_pos) {{
break;
}}
last_layer_event += 1;
redoLayerEvent(last_layer_event)
}}
while (last_gc_event < gc_events.length - 1) {{
if (gc_events[last_gc_event + 1].time_rel > new_pos) {{
break;
}}
last_gc_event += 1;
redoGcEvent(last_gc_event)
}}
}}
if (new_pos < last_slider_pos) {{
while (last_layer_event > 0) {{
if (layer_events[last_layer_event - 1].time_rel < new_pos) {{
break;
}}
undoLayerEvent(last_layer_event)
last_layer_event -= 1;
}}
while (last_gc_event > 0) {{
if (gc_events[last_gc_event - 1].time_rel < new_pos) {{
break;
}}
undoGcEvent(last_gc_event)
last_gc_event -= 1;
}}
}}
last_slider_pos = new_pos;
document.getElementById("debug_pos").textContent=new_pos;
document.getElementById("debug_layer_event").textContent=last_layer_event + " " + layer_events[last_layer_event].time_rel + " " + layer_events[last_layer_event].op;
document.getElementById("debug_gc_event").textContent=last_gc_event + " " + gc_events[last_gc_event].time_rel;
}}
</script>
<div class="topbar">
<div class="slidercontainer">
<label for="time-slider">TIME</label>:
<input id="time-slider" class="slider" type="range" min="0" max="{last_time_rel}" value="0" oninput="moveSlider(this.value)"><br>
pos: <span id="debug_pos"></span><br>
event: <span id="debug_layer_event"></span><br>
gc: <span id="debug_gc_event"></span><br>
</div>
<button onclick="startAnimation()">Play</button>
<button onclick="stopAnimation()">Stop</button>
<svg class="legend">
<rect x=5 y=0 width=20 height=20 style="fill:rgb(128,128,128);stroke:rgb(0,0,0);stroke-width:0.5;fill-opacity:1;stroke-opacity:1;"/>
<line x1=5 y1=30 x2=25 y2=30 style="fill:rgb(128,0,128);stroke:rgb(128,0,128);stroke-width:3;fill-opacity:1;stroke-opacity:1;"/>
<line x1=0 y1=40 x2=30 y2=40 style="fill:none;stroke:rgb(255,0,0);stroke-width:0.5;fill-opacity:1;stroke-opacity:1;"/>
</svg>
</div>
<div class="main">
{svg}
</div>
</body>
</html>
"#);
eprintln!("num_images: {}", num_images);
eprintln!("num_deltas: {}", num_deltas);
Ok(())
}

View File

@@ -270,15 +270,31 @@ fn start_pageserver(
WALRECEIVER_RUNTIME.block_on(pageserver::broker_client::init_broker_client(conf))?;
// Initialize authentication for incoming connections
let auth = match &conf.auth_type {
AuthType::Trust => None,
AuthType::NeonJWT => {
// unwrap is ok because check is performed when creating config, so path is set and file exists
let key_path = conf.auth_validation_public_key_path.as_ref().unwrap();
Some(JwtAuth::from_key_path(key_path)?.into())
}
};
info!("Using auth: {:#?}", conf.auth_type);
let http_auth;
let pg_auth;
if conf.http_auth_type == AuthType::NeonJWT || conf.pg_auth_type == AuthType::NeonJWT {
// unwrap is ok because check is performed when creating config, so path is set and file exists
let key_path = conf.auth_validation_public_key_path.as_ref().unwrap();
info!(
"Loading public key for verifying JWT tokens from {:#?}",
key_path
);
let auth: Arc<JwtAuth> = Arc::new(JwtAuth::from_key_path(key_path)?);
http_auth = match &conf.http_auth_type {
AuthType::Trust => None,
AuthType::NeonJWT => Some(auth.clone()),
};
pg_auth = match &conf.pg_auth_type {
AuthType::Trust => None,
AuthType::NeonJWT => Some(auth),
};
} else {
http_auth = None;
pg_auth = None;
}
info!("Using auth for http API: {:#?}", conf.http_auth_type);
info!("Using auth for pg connections: {:#?}", conf.pg_auth_type);
match var("NEON_AUTH_TOKEN") {
Ok(v) => {
@@ -308,7 +324,7 @@ fn start_pageserver(
{
let _rt_guard = MGMT_REQUEST_RUNTIME.enter();
let router = http::make_router(conf, launch_ts, auth.clone(), remote_storage)?
let router = http::make_router(conf, launch_ts, http_auth, remote_storage)?
.build()
.map_err(|err| anyhow!(err))?;
let service = utils::http::RouterService::new(router).unwrap();
@@ -382,9 +398,9 @@ fn start_pageserver(
async move {
page_service::libpq_listener_main(
conf,
auth,
pg_auth,
pageserver_listener,
conf.auth_type,
conf.pg_auth_type,
libpq_ctx,
)
.await

View File

@@ -61,6 +61,7 @@ pub mod defaults {
pub const DEFAULT_CACHED_METRIC_COLLECTION_INTERVAL: &str = "1 hour";
pub const DEFAULT_METRIC_COLLECTION_ENDPOINT: Option<reqwest::Url> = None;
pub const DEFAULT_SYNTHETIC_SIZE_CALCULATION_INTERVAL: &str = "10 min";
pub const DEFAULT_EVICTIONS_LOW_RESIDENCE_DURATION_METRIC_THRESHOLD: &str = "24 hour";
///
/// Default built-in configuration file.
@@ -89,6 +90,8 @@ pub mod defaults {
#cached_metric_collection_interval = '{DEFAULT_CACHED_METRIC_COLLECTION_INTERVAL}'
#synthetic_size_calculation_interval = '{DEFAULT_SYNTHETIC_SIZE_CALCULATION_INTERVAL}'
#evictions_low_residence_duration_metric_threshold = '{DEFAULT_EVICTIONS_LOW_RESIDENCE_DURATION_METRIC_THRESHOLD}'
# [tenant_config]
#checkpoint_distance = {DEFAULT_CHECKPOINT_DISTANCE} # in bytes
#checkpoint_timeout = {DEFAULT_CHECKPOINT_TIMEOUT}
@@ -118,6 +121,9 @@ pub struct PageServerConf {
/// Example (default): 127.0.0.1:9898
pub listen_http_addr: String,
/// Current availability zone. Used for traffic metrics.
pub availability_zone: Option<String>,
// Timeout when waiting for WAL receiver to catch up to an LSN given in a GetPage@LSN call.
pub wait_lsn_timeout: Duration,
// How long to wait for WAL redo to complete.
@@ -138,9 +144,15 @@ pub struct PageServerConf {
pub pg_distrib_dir: PathBuf,
pub auth_type: AuthType,
// Authentication
/// authentication method for the HTTP mgmt API
pub http_auth_type: AuthType,
/// authentication method for libpq connections from compute
pub pg_auth_type: AuthType,
/// Path to a file containing public key for verifying JWT tokens.
/// Used for both mgmt and compute auth, if enabled.
pub auth_validation_public_key_path: Option<PathBuf>,
pub remote_storage_config: Option<RemoteStorageConfig>,
pub default_tenant_conf: TenantConf,
@@ -161,6 +173,9 @@ pub struct PageServerConf {
pub metric_collection_endpoint: Option<Url>,
pub synthetic_size_calculation_interval: Duration,
// See the corresponding metric's help string.
pub evictions_low_residence_duration_metric_threshold: Duration,
pub test_remote_failures: u64,
pub ondemand_download_behavior_treat_error_as_warn: bool,
@@ -196,6 +211,8 @@ struct PageServerConfigBuilder {
listen_http_addr: BuilderValue<String>,
availability_zone: BuilderValue<Option<String>>,
wait_lsn_timeout: BuilderValue<Duration>,
wal_redo_timeout: BuilderValue<Duration>,
@@ -208,7 +225,8 @@ struct PageServerConfigBuilder {
pg_distrib_dir: BuilderValue<PathBuf>,
auth_type: BuilderValue<AuthType>,
http_auth_type: BuilderValue<AuthType>,
pg_auth_type: BuilderValue<AuthType>,
//
auth_validation_public_key_path: BuilderValue<Option<PathBuf>>,
@@ -228,6 +246,8 @@ struct PageServerConfigBuilder {
metric_collection_endpoint: BuilderValue<Option<Url>>,
synthetic_size_calculation_interval: BuilderValue<Duration>,
evictions_low_residence_duration_metric_threshold: BuilderValue<Duration>,
test_remote_failures: BuilderValue<u64>,
ondemand_download_behavior_treat_error_as_warn: BuilderValue<bool>,
@@ -240,6 +260,7 @@ impl Default for PageServerConfigBuilder {
Self {
listen_pg_addr: Set(DEFAULT_PG_LISTEN_ADDR.to_string()),
listen_http_addr: Set(DEFAULT_HTTP_LISTEN_ADDR.to_string()),
availability_zone: Set(None),
wait_lsn_timeout: Set(humantime::parse_duration(DEFAULT_WAIT_LSN_TIMEOUT)
.expect("cannot parse default wait lsn timeout")),
wal_redo_timeout: Set(humantime::parse_duration(DEFAULT_WAL_REDO_TIMEOUT)
@@ -251,7 +272,8 @@ impl Default for PageServerConfigBuilder {
pg_distrib_dir: Set(env::current_dir()
.expect("cannot access current directory")
.join("pg_install")),
auth_type: Set(AuthType::Trust),
http_auth_type: Set(AuthType::Trust),
pg_auth_type: Set(AuthType::Trust),
auth_validation_public_key_path: Set(None),
remote_storage_config: Set(None),
id: NotSet,
@@ -279,6 +301,11 @@ impl Default for PageServerConfigBuilder {
.expect("cannot parse default synthetic size calculation interval")),
metric_collection_endpoint: Set(DEFAULT_METRIC_COLLECTION_ENDPOINT),
evictions_low_residence_duration_metric_threshold: Set(humantime::parse_duration(
DEFAULT_EVICTIONS_LOW_RESIDENCE_DURATION_METRIC_THRESHOLD,
)
.expect("cannot parse DEFAULT_EVICTIONS_LOW_RESIDENCE_DURATION_METRIC_THRESHOLD")),
test_remote_failures: Set(0),
ondemand_download_behavior_treat_error_as_warn: Set(false),
@@ -295,6 +322,10 @@ impl PageServerConfigBuilder {
self.listen_http_addr = BuilderValue::Set(listen_http_addr)
}
pub fn availability_zone(&mut self, availability_zone: Option<String>) {
self.availability_zone = BuilderValue::Set(availability_zone)
}
pub fn wait_lsn_timeout(&mut self, wait_lsn_timeout: Duration) {
self.wait_lsn_timeout = BuilderValue::Set(wait_lsn_timeout)
}
@@ -323,8 +354,12 @@ impl PageServerConfigBuilder {
self.pg_distrib_dir = BuilderValue::Set(pg_distrib_dir)
}
pub fn auth_type(&mut self, auth_type: AuthType) {
self.auth_type = BuilderValue::Set(auth_type)
pub fn http_auth_type(&mut self, auth_type: AuthType) {
self.http_auth_type = BuilderValue::Set(auth_type)
}
pub fn pg_auth_type(&mut self, auth_type: AuthType) {
self.pg_auth_type = BuilderValue::Set(auth_type)
}
pub fn auth_validation_public_key_path(
@@ -386,6 +421,10 @@ impl PageServerConfigBuilder {
self.test_remote_failures = BuilderValue::Set(fail_first);
}
pub fn evictions_low_residence_duration_metric_threshold(&mut self, value: Duration) {
self.evictions_low_residence_duration_metric_threshold = BuilderValue::Set(value);
}
pub fn ondemand_download_behavior_treat_error_as_warn(
&mut self,
ondemand_download_behavior_treat_error_as_warn: bool,
@@ -402,6 +441,9 @@ impl PageServerConfigBuilder {
listen_http_addr: self
.listen_http_addr
.ok_or(anyhow!("missing listen_http_addr"))?,
availability_zone: self
.availability_zone
.ok_or(anyhow!("missing availability_zone"))?,
wait_lsn_timeout: self
.wait_lsn_timeout
.ok_or(anyhow!("missing wait_lsn_timeout"))?,
@@ -419,7 +461,10 @@ impl PageServerConfigBuilder {
pg_distrib_dir: self
.pg_distrib_dir
.ok_or(anyhow!("missing pg_distrib_dir"))?,
auth_type: self.auth_type.ok_or(anyhow!("missing auth_type"))?,
http_auth_type: self
.http_auth_type
.ok_or(anyhow!("missing http_auth_type"))?,
pg_auth_type: self.pg_auth_type.ok_or(anyhow!("missing pg_auth_type"))?,
auth_validation_public_key_path: self
.auth_validation_public_key_path
.ok_or(anyhow!("missing auth_validation_public_key_path"))?,
@@ -453,6 +498,11 @@ impl PageServerConfigBuilder {
synthetic_size_calculation_interval: self
.synthetic_size_calculation_interval
.ok_or(anyhow!("missing synthetic_size_calculation_interval"))?,
evictions_low_residence_duration_metric_threshold: self
.evictions_low_residence_duration_metric_threshold
.ok_or(anyhow!(
"missing evictions_low_residence_duration_metric_threshold"
))?,
test_remote_failures: self
.test_remote_failures
.ok_or(anyhow!("missing test_remote_failuers"))?,
@@ -599,6 +649,7 @@ impl PageServerConf {
match key {
"listen_pg_addr" => builder.listen_pg_addr(parse_toml_string(key, item)?),
"listen_http_addr" => builder.listen_http_addr(parse_toml_string(key, item)?),
"availability_zone" => builder.availability_zone(Some(parse_toml_string(key, item)?)),
"wait_lsn_timeout" => builder.wait_lsn_timeout(parse_toml_duration(key, item)?),
"wal_redo_timeout" => builder.wal_redo_timeout(parse_toml_duration(key, item)?),
"initial_superuser_name" => builder.superuser(parse_toml_string(key, item)?),
@@ -612,7 +663,8 @@ impl PageServerConf {
"auth_validation_public_key_path" => builder.auth_validation_public_key_path(Some(
PathBuf::from(parse_toml_string(key, item)?),
)),
"auth_type" => builder.auth_type(parse_toml_from_str(key, item)?),
"http_auth_type" => builder.http_auth_type(parse_toml_from_str(key, item)?),
"pg_auth_type" => builder.pg_auth_type(parse_toml_from_str(key, item)?),
"remote_storage" => {
builder.remote_storage_config(RemoteStorageConfig::from_toml(item)?)
}
@@ -640,6 +692,7 @@ impl PageServerConf {
"synthetic_size_calculation_interval" =>
builder.synthetic_size_calculation_interval(parse_toml_duration(key, item)?),
"test_remote_failures" => builder.test_remote_failures(parse_toml_u64(key, item)?),
"evictions_low_residence_duration_metric_threshold" => builder.evictions_low_residence_duration_metric_threshold(parse_toml_duration(key, item)?),
"ondemand_download_behavior_treat_error_as_warn" => builder.ondemand_download_behavior_treat_error_as_warn(parse_toml_bool(key, item)?),
_ => bail!("unrecognized pageserver option '{key}'"),
}
@@ -647,7 +700,7 @@ impl PageServerConf {
let mut conf = builder.build().context("invalid config")?;
if conf.auth_type == AuthType::NeonJWT {
if conf.http_auth_type == AuthType::NeonJWT || conf.pg_auth_type == AuthType::NeonJWT {
let auth_validation_public_key_path = conf
.auth_validation_public_key_path
.get_or_insert_with(|| workdir.join("auth_public_key.pem"));
@@ -763,10 +816,12 @@ impl PageServerConf {
max_file_descriptors: defaults::DEFAULT_MAX_FILE_DESCRIPTORS,
listen_pg_addr: defaults::DEFAULT_PG_LISTEN_ADDR.to_string(),
listen_http_addr: defaults::DEFAULT_HTTP_LISTEN_ADDR.to_string(),
availability_zone: None,
superuser: "cloud_admin".to_string(),
workdir: repo_dir,
pg_distrib_dir,
auth_type: AuthType::Trust,
http_auth_type: AuthType::Trust,
pg_auth_type: AuthType::Trust,
auth_validation_public_key_path: None,
remote_storage_config: None,
default_tenant_conf: TenantConf::default(),
@@ -778,6 +833,10 @@ impl PageServerConf {
cached_metric_collection_interval: Duration::from_secs(60 * 60),
metric_collection_endpoint: defaults::DEFAULT_METRIC_COLLECTION_ENDPOINT,
synthetic_size_calculation_interval: Duration::from_secs(60),
evictions_low_residence_duration_metric_threshold: humantime::parse_duration(
defaults::DEFAULT_EVICTIONS_LOW_RESIDENCE_DURATION_METRIC_THRESHOLD,
)
.unwrap(),
test_remote_failures: 0,
ondemand_download_behavior_treat_error_as_warn: false,
}
@@ -919,6 +978,9 @@ metric_collection_interval = '222 s'
cached_metric_collection_interval = '22200 s'
metric_collection_endpoint = 'http://localhost:80/metrics'
synthetic_size_calculation_interval = '333 s'
evictions_low_residence_duration_metric_threshold = '444 s'
log_format = 'json'
"#;
@@ -944,6 +1006,7 @@ log_format = 'json'
id: NodeId(10),
listen_pg_addr: defaults::DEFAULT_PG_LISTEN_ADDR.to_string(),
listen_http_addr: defaults::DEFAULT_HTTP_LISTEN_ADDR.to_string(),
availability_zone: None,
wait_lsn_timeout: humantime::parse_duration(defaults::DEFAULT_WAIT_LSN_TIMEOUT)?,
wal_redo_timeout: humantime::parse_duration(defaults::DEFAULT_WAL_REDO_TIMEOUT)?,
superuser: defaults::DEFAULT_SUPERUSER.to_string(),
@@ -951,7 +1014,8 @@ log_format = 'json'
max_file_descriptors: defaults::DEFAULT_MAX_FILE_DESCRIPTORS,
workdir,
pg_distrib_dir,
auth_type: AuthType::Trust,
http_auth_type: AuthType::Trust,
pg_auth_type: AuthType::Trust,
auth_validation_public_key_path: None,
remote_storage_config: None,
default_tenant_conf: TenantConf::default(),
@@ -971,6 +1035,9 @@ log_format = 'json'
synthetic_size_calculation_interval: humantime::parse_duration(
defaults::DEFAULT_SYNTHETIC_SIZE_CALCULATION_INTERVAL
)?,
evictions_low_residence_duration_metric_threshold: humantime::parse_duration(
defaults::DEFAULT_EVICTIONS_LOW_RESIDENCE_DURATION_METRIC_THRESHOLD
)?,
test_remote_failures: 0,
ondemand_download_behavior_treat_error_as_warn: false,
},
@@ -1001,6 +1068,7 @@ log_format = 'json'
id: NodeId(10),
listen_pg_addr: "127.0.0.1:64000".to_string(),
listen_http_addr: "127.0.0.1:9898".to_string(),
availability_zone: None,
wait_lsn_timeout: Duration::from_secs(111),
wal_redo_timeout: Duration::from_secs(111),
superuser: "zzzz".to_string(),
@@ -1008,7 +1076,8 @@ log_format = 'json'
max_file_descriptors: 333,
workdir,
pg_distrib_dir,
auth_type: AuthType::Trust,
http_auth_type: AuthType::Trust,
pg_auth_type: AuthType::Trust,
auth_validation_public_key_path: None,
remote_storage_config: None,
default_tenant_conf: TenantConf::default(),
@@ -1020,6 +1089,7 @@ log_format = 'json'
cached_metric_collection_interval: Duration::from_secs(22200),
metric_collection_endpoint: Some(Url::parse("http://localhost:80/metrics")?),
synthetic_size_calculation_interval: Duration::from_secs(333),
evictions_low_residence_duration_metric_threshold: Duration::from_secs(444),
test_remote_failures: 0,
ondemand_download_behavior_treat_error_as_warn: false,
},

View File

@@ -351,6 +351,13 @@ paths:
schema:
type: string
format: hex
- name: detach_ignored
in: query
required: false
schema:
type: boolean
description: |
When true, allow to detach a tenant which state is ignored.
post:
description: |
Remove tenant data (including all corresponding timelines) from pageserver's memory and file system.

View File

@@ -21,7 +21,7 @@ use crate::context::{DownloadBehavior, RequestContext};
use crate::pgdatadir_mapping::LsnForTimestamp;
use crate::task_mgr::TaskKind;
use crate::tenant::config::TenantConfOpt;
use crate::tenant::mgr::TenantMapInsertError;
use crate::tenant::mgr::{TenantMapInsertError, TenantStateError};
use crate::tenant::size::ModelInputs;
use crate::tenant::storage_layer::LayerAccessStatsReset;
use crate::tenant::{PageReconstructError, Timeline};
@@ -82,38 +82,52 @@ fn get_config(request: &Request<Body>) -> &'static PageServerConf {
get_state(request).conf
}
/// Check that the requester is authorized to operate on given tenant
fn check_permission(request: &Request<Body>, tenant_id: Option<TenantId>) -> Result<(), ApiError> {
check_permission_with(request, |claims| {
crate::auth::check_permission(claims, tenant_id)
})
}
fn apierror_from_prerror(err: PageReconstructError) -> ApiError {
match err {
PageReconstructError::Other(err) => ApiError::InternalServerError(err),
PageReconstructError::NeedsDownload(_, _) => {
// This shouldn't happen, because we use a RequestContext that requests to
// download any missing layer files on-demand.
ApiError::InternalServerError(anyhow::anyhow!("need to download remote layer file"))
}
PageReconstructError::Cancelled => {
ApiError::InternalServerError(anyhow::anyhow!("request was cancelled"))
}
PageReconstructError::WalRedo(err) => {
ApiError::InternalServerError(anyhow::Error::new(err))
impl From<PageReconstructError> for ApiError {
fn from(pre: PageReconstructError) -> ApiError {
match pre {
PageReconstructError::Other(pre) => ApiError::InternalServerError(pre),
PageReconstructError::NeedsDownload(_, _) => {
// This shouldn't happen, because we use a RequestContext that requests to
// download any missing layer files on-demand.
ApiError::InternalServerError(anyhow::anyhow!("need to download remote layer file"))
}
PageReconstructError::Cancelled => {
ApiError::InternalServerError(anyhow::anyhow!("request was cancelled"))
}
PageReconstructError::WalRedo(pre) => {
ApiError::InternalServerError(anyhow::Error::new(pre))
}
}
}
}
fn apierror_from_tenant_map_insert_error(e: TenantMapInsertError) -> ApiError {
match e {
TenantMapInsertError::StillInitializing | TenantMapInsertError::ShuttingDown => {
ApiError::InternalServerError(anyhow::Error::new(e))
impl From<TenantMapInsertError> for ApiError {
fn from(tmie: TenantMapInsertError) -> ApiError {
match tmie {
TenantMapInsertError::StillInitializing | TenantMapInsertError::ShuttingDown => {
ApiError::InternalServerError(anyhow::Error::new(tmie))
}
TenantMapInsertError::TenantAlreadyExists(id, state) => {
ApiError::Conflict(format!("tenant {id} already exists, state: {state:?}"))
}
TenantMapInsertError::Closure(e) => ApiError::InternalServerError(e),
}
TenantMapInsertError::TenantAlreadyExists(id, state) => {
ApiError::Conflict(format!("tenant {id} already exists, state: {state:?}"))
}
}
impl From<TenantStateError> for ApiError {
fn from(tse: TenantStateError) -> ApiError {
match tse {
TenantStateError::NotFound(tid) => ApiError::NotFound(anyhow!("tenant {}", tid)),
_ => ApiError::InternalServerError(anyhow::Error::new(tse)),
}
TenantMapInsertError::Closure(e) => ApiError::InternalServerError(e),
}
}
@@ -171,7 +185,7 @@ fn build_timeline_info_common(
None
}
};
let current_physical_size = Some(timeline.layer_size_sum().approximate_is_ok());
let current_physical_size = Some(timeline.layer_size_sum());
let state = timeline.current_state();
let remote_consistent_lsn = timeline.get_remote_consistent_lsn().unwrap_or(Lsn(0));
@@ -217,9 +231,7 @@ async fn timeline_create_handler(mut request: Request<Body>) -> Result<Response<
let ctx = RequestContext::new(TaskKind::MgmtRequest, DownloadBehavior::Error);
let tenant = mgr::get_tenant(tenant_id, true)
.await
.map_err(ApiError::NotFound)?;
let tenant = mgr::get_tenant(tenant_id, true).await?;
match tenant.create_timeline(
new_timeline_id,
request_data.ancestor_timeline_id.map(TimelineId::from),
@@ -249,9 +261,7 @@ async fn timeline_list_handler(request: Request<Body>) -> Result<Response<Body>,
let ctx = RequestContext::new(TaskKind::MgmtRequest, DownloadBehavior::Download);
let response_data = async {
let tenant = mgr::get_tenant(tenant_id, true)
.await
.map_err(ApiError::NotFound)?;
let tenant = mgr::get_tenant(tenant_id, true).await?;
let timelines = tenant.list_timelines();
let mut response_data = Vec::with_capacity(timelines.len());
@@ -267,7 +277,7 @@ async fn timeline_list_handler(request: Request<Body>) -> Result<Response<Body>,
response_data.push(timeline_info);
}
Ok(response_data)
Ok::<Vec<TimelineInfo>, ApiError>(response_data)
}
.instrument(info_span!("timeline_list", tenant = %tenant_id))
.await?;
@@ -286,9 +296,7 @@ async fn timeline_detail_handler(request: Request<Body>) -> Result<Response<Body
let ctx = RequestContext::new(TaskKind::MgmtRequest, DownloadBehavior::Download);
let timeline_info = async {
let tenant = mgr::get_tenant(tenant_id, true)
.await
.map_err(ApiError::NotFound)?;
let tenant = mgr::get_tenant(tenant_id, true).await?;
let timeline = tenant
.get_timeline(timeline_id, false)
@@ -324,10 +332,7 @@ async fn get_lsn_by_timestamp_handler(request: Request<Body>) -> Result<Response
let ctx = RequestContext::new(TaskKind::MgmtRequest, DownloadBehavior::Download);
let timeline = active_timeline_of_active_tenant(tenant_id, timeline_id).await?;
let result = timeline
.find_lsn_for_timestamp(timestamp_pg, &ctx)
.await
.map_err(apierror_from_prerror)?;
let result = timeline.find_lsn_for_timestamp(timestamp_pg, &ctx).await?;
let result = match result {
LsnForTimestamp::Present(lsn) => format!("{lsn}"),
@@ -352,8 +357,7 @@ async fn tenant_attach_handler(request: Request<Body>) -> Result<Response<Body>,
if let Some(remote_storage) = &state.remote_storage {
mgr::attach_tenant(state.conf, tenant_id, remote_storage.clone(), &ctx)
.instrument(info_span!("tenant_attach", tenant = %tenant_id))
.await
.map_err(apierror_from_tenant_map_insert_error)?;
.await?;
} else {
return Err(ApiError::BadRequest(anyhow!(
"attach_tenant is not possible because pageserver was configured without remote storage"
@@ -372,11 +376,7 @@ async fn timeline_delete_handler(request: Request<Body>) -> Result<Response<Body
mgr::delete_timeline(tenant_id, timeline_id, &ctx)
.instrument(info_span!("timeline_delete", tenant = %tenant_id, timeline = %timeline_id))
.await
// FIXME: Errors from `delete_timeline` can occur for a number of reasons, incuding both
// user and internal errors. Replace this with better handling once the error type permits
// it.
.map_err(ApiError::InternalServerError)?;
.await?;
json_response(StatusCode::OK, ())
}
@@ -384,15 +384,13 @@ async fn timeline_delete_handler(request: Request<Body>) -> Result<Response<Body
async fn tenant_detach_handler(request: Request<Body>) -> Result<Response<Body>, ApiError> {
let tenant_id: TenantId = parse_request_param(&request, "tenant_id")?;
check_permission(&request, Some(tenant_id))?;
let detach_ignored: Option<bool> = parse_query_param(&request, "detach_ignored")?;
let state = get_state(&request);
let conf = state.conf;
mgr::detach_tenant(conf, tenant_id)
mgr::detach_tenant(conf, tenant_id, detach_ignored.unwrap_or(false))
.instrument(info_span!("tenant_detach", tenant = %tenant_id))
.await
// FIXME: Errors from `detach_tenant` can be caused by both both user and internal errors.
// Replace this with better handling once the error type permits it.
.map_err(ApiError::InternalServerError)?;
.await?;
json_response(StatusCode::OK, ())
}
@@ -406,8 +404,7 @@ async fn tenant_load_handler(request: Request<Body>) -> Result<Response<Body>, A
let state = get_state(&request);
mgr::load_tenant(state.conf, tenant_id, state.remote_storage.clone(), &ctx)
.instrument(info_span!("load", tenant = %tenant_id))
.await
.map_err(apierror_from_tenant_map_insert_error)?;
.await?;
json_response(StatusCode::ACCEPTED, ())
}
@@ -420,10 +417,7 @@ async fn tenant_ignore_handler(request: Request<Body>) -> Result<Response<Body>,
let conf = state.conf;
mgr::ignore_tenant(conf, tenant_id)
.instrument(info_span!("ignore_tenant", tenant = %tenant_id))
.await
// FIXME: Errors from `ignore_tenant` can be caused by both both user and internal errors.
// Replace this with better handling once the error type permits it.
.map_err(ApiError::InternalServerError)?;
.await?;
json_response(StatusCode::OK, ())
}
@@ -458,7 +452,7 @@ async fn tenant_status(request: Request<Body>) -> Result<Response<Body>, ApiErro
// Calculate total physical size of all timelines
let mut current_physical_size = 0;
for timeline in tenant.list_timelines().iter() {
current_physical_size += timeline.layer_size_sum().approximate_is_ok();
current_physical_size += timeline.layer_size_sum();
}
let state = tenant.current_state();
@@ -497,9 +491,7 @@ async fn tenant_size_handler(request: Request<Body>) -> Result<Response<Body>, A
let headers = request.headers();
let ctx = RequestContext::new(TaskKind::MgmtRequest, DownloadBehavior::Download);
let tenant = mgr::get_tenant(tenant_id, true)
.await
.map_err(ApiError::InternalServerError)?;
let tenant = mgr::get_tenant(tenant_id, true).await?;
// this can be long operation
let inputs = tenant
@@ -747,6 +739,14 @@ async fn tenant_create_handler(mut request: Request<Body>) -> Result<Response<Bo
);
}
if let Some(eviction_policy) = request_data.eviction_policy {
tenant_conf.eviction_policy = Some(
serde_json::from_value(eviction_policy)
.context("parse field `eviction_policy`")
.map_err(ApiError::BadRequest)?,
);
}
let target_tenant_id = request_data
.new_tenant_id
.map(TenantId::from)
@@ -762,8 +762,7 @@ async fn tenant_create_handler(mut request: Request<Body>) -> Result<Response<Bo
&ctx,
)
.instrument(info_span!("tenant_create", tenant = ?target_tenant_id))
.await
.map_err(apierror_from_tenant_map_insert_error)?;
.await?;
// We created the tenant. Existing API semantics are that the tenant
// is Active when this function returns.
@@ -787,9 +786,7 @@ async fn get_tenant_config_handler(request: Request<Body>) -> Result<Response<Bo
let tenant_id: TenantId = parse_request_param(&request, "tenant_id")?;
check_permission(&request, Some(tenant_id))?;
let tenant = mgr::get_tenant(tenant_id, false)
.await
.map_err(ApiError::NotFound)?;
let tenant = mgr::get_tenant(tenant_id, false).await?;
let response = HashMap::from([
(
@@ -884,10 +881,7 @@ async fn update_tenant_config_handler(
let state = get_state(&request);
mgr::set_new_tenant_config(state.conf, tenant_conf, tenant_id)
.instrument(info_span!("tenant_config", tenant = ?tenant_id))
.await
// FIXME: `update_tenant_config` can fail because of both user and internal errors.
// Replace this `map_err` with better error handling once the type permits it
.map_err(ApiError::InternalServerError)?;
.await?;
json_response(StatusCode::OK, ())
}
@@ -1024,9 +1018,7 @@ async fn active_timeline_of_active_tenant(
tenant_id: TenantId,
timeline_id: TimelineId,
) -> Result<Arc<Timeline>, ApiError> {
let tenant = mgr::get_tenant(tenant_id, true)
.await
.map_err(ApiError::NotFound)?;
let tenant = mgr::get_tenant(tenant_id, true).await?;
tenant
.get_timeline(timeline_id, true)
.map_err(ApiError::NotFound)

View File

@@ -1,11 +1,12 @@
use crate::repository::{key_range_size, singleton_range, Key};
use postgres_ffi::BLCKSZ;
use std::ops::Range;
use tracing::debug;
///
/// Represents a set of Keys, in a compact form.
///
#[derive(Clone, Debug)]
#[derive(Clone, Debug, Default)]
pub struct KeySpace {
/// Contiguous ranges of keys that belong to the key space. In key order,
/// and with no overlap.
@@ -61,6 +62,60 @@ impl KeySpace {
KeyPartitioning { parts }
}
/// Add range to keyspace.
///
/// Unlike KeySpaceAccum, it accepts key ranges in any order and overlapping ranges.
pub fn add_range(&mut self, range: Range<Key>) {
let start = range.start;
let mut end = range.end;
let mut prev_index = match self.ranges.binary_search_by_key(&end, |r| r.start) {
Ok(index) => index,
Err(0) => {
self.ranges.insert(0, range);
return;
}
Err(index) => index - 1,
};
loop {
let mut prev = &mut self.ranges[prev_index];
if prev.end >= start {
// two ranges overlap
if prev.start <= start {
// combine with prev range
if prev.end < end {
prev.end = end;
debug!("Extend wanted image {}..{}", prev.start, end);
}
return;
} else {
if prev.end > end {
end = prev.end;
}
self.ranges.remove(prev_index);
}
} else {
break;
}
if prev_index == 0 {
break;
}
prev_index -= 1;
}
debug!("Wanted image {}..{}", start, end);
self.ranges.insert(prev_index, start..end);
}
///
/// Check if key space contains overlapping range
///
pub fn overlaps(&self, range: &Range<Key>) -> bool {
match self.ranges.binary_search_by_key(&range.end, |r| r.start) {
Ok(_) => false,
Err(0) => false,
Err(index) => self.ranges[index - 1].end > range.start,
}
}
}
///

View File

@@ -9,22 +9,18 @@ use once_cell::sync::Lazy;
use pageserver_api::models::state;
use utils::id::{TenantId, TimelineId};
/// Prometheus histogram buckets (in seconds) that capture the majority of
/// latencies in the microsecond range but also extend far enough up to distinguish
/// "bad" from "really bad".
fn get_buckets_for_critical_operations() -> Vec<f64> {
let buckets_per_digit = 5;
let min_exponent = -6;
let max_exponent = 2;
let mut buckets = vec![];
// Compute 10^(exp / buckets_per_digit) instead of 10^(1/buckets_per_digit)^exp
// because it's more numerically stable and doesn't result in numbers like 9.999999
for exp in (min_exponent * buckets_per_digit)..=(max_exponent * buckets_per_digit) {
buckets.push(10_f64.powf(exp as f64 / buckets_per_digit as f64))
}
buckets
}
/// Prometheus histogram buckets (in seconds) for operations in the critical
/// path. In other words, operations that directly affect that latency of user
/// queries.
///
/// The buckets capture the majority of latencies in the microsecond and
/// millisecond range but also extend far enough up to distinguish "bad" from
/// "really bad".
const CRITICAL_OP_BUCKETS: &[f64] = &[
0.000_001, 0.000_010, 0.000_100, // 1 us, 10 us, 100 us
0.001_000, 0.010_000, 0.100_000, // 1 ms, 10 ms, 100 ms
1.0, 10.0, 100.0, // 1 s, 10 s, 100 s
];
// Metrics collected on operations on the storage repository.
const STORAGE_TIME_OPERATIONS: &[&str] = &[
@@ -55,12 +51,15 @@ pub static STORAGE_TIME_COUNT_PER_TIMELINE: Lazy<IntCounterVec> = Lazy::new(|| {
.expect("failed to define a metric")
});
// Buckets for background operations like compaction, GC, size calculation
const STORAGE_OP_BUCKETS: &[f64] = &[0.010, 0.100, 1.0, 10.0, 100.0, 1000.0];
pub static STORAGE_TIME_GLOBAL: Lazy<HistogramVec> = Lazy::new(|| {
register_histogram_vec!(
"pageserver_storage_operations_seconds_global",
"Time spent on storage operations",
&["operation"],
get_buckets_for_critical_operations(),
STORAGE_OP_BUCKETS.into(),
)
.expect("failed to define a metric")
});
@@ -71,7 +70,7 @@ static RECONSTRUCT_TIME: Lazy<HistogramVec> = Lazy::new(|| {
"pageserver_getpage_reconstruct_seconds",
"Time spent in reconstruct_value",
&["tenant_id", "timeline_id"],
get_buckets_for_critical_operations(),
CRITICAL_OP_BUCKETS.into(),
)
.expect("failed to define a metric")
});
@@ -90,7 +89,7 @@ static WAIT_LSN_TIME: Lazy<HistogramVec> = Lazy::new(|| {
"pageserver_wait_lsn_seconds",
"Time spent waiting for WAL to arrive",
&["tenant_id", "timeline_id"],
get_buckets_for_critical_operations(),
CRITICAL_OP_BUCKETS.into(),
)
.expect("failed to define a metric")
});
@@ -123,6 +122,22 @@ static REMOTE_PHYSICAL_SIZE: Lazy<UIntGaugeVec> = Lazy::new(|| {
.expect("failed to define a metric")
});
pub static REMOTE_ONDEMAND_DOWNLOADED_LAYERS: Lazy<IntCounter> = Lazy::new(|| {
register_int_counter!(
"pageserver_remote_ondemand_downloaded_layers_total",
"Total on-demand downloaded layers"
)
.unwrap()
});
pub static REMOTE_ONDEMAND_DOWNLOADED_BYTES: Lazy<IntCounter> = Lazy::new(|| {
register_int_counter!(
"pageserver_remote_ondemand_downloaded_bytes_total",
"Total bytes of layers on-demand downloaded",
)
.unwrap()
});
static CURRENT_LOGICAL_SIZE: Lazy<UIntGaugeVec> = Lazy::new(|| {
register_uint_gauge_vec!(
"pageserver_current_logical_size",
@@ -179,15 +194,101 @@ static PERSISTENT_BYTES_WRITTEN: Lazy<IntCounterVec> = Lazy::new(|| {
.expect("failed to define a metric")
});
static EVICTIONS: Lazy<IntCounterVec> = Lazy::new(|| {
register_int_counter_vec!(
"pageserver_evictions",
"Number of layers evicted from the pageserver",
&["tenant_id", "timeline_id"]
)
.expect("failed to define a metric")
});
static EVICTIONS_WITH_LOW_RESIDENCE_DURATION: Lazy<IntCounterVec> = Lazy::new(|| {
register_int_counter_vec!(
"pageserver_evictions_with_low_residence_duration",
"If a layer is evicted that was resident for less than `low_threshold`, it is counted to this counter. \
Residence duration is determined using the `residence_duration_data_source`.",
&["tenant_id", "timeline_id", "residence_duration_data_source", "low_threshold_secs"]
)
.expect("failed to define a metric")
});
/// Each [`Timeline`]'s [`EVICTIONS_WITH_LOW_RESIDENCE_DURATION`] metric.
#[derive(Debug)]
pub struct EvictionsWithLowResidenceDuration {
data_source: &'static str,
threshold: Duration,
counter: Option<IntCounter>,
}
pub struct EvictionsWithLowResidenceDurationBuilder {
data_source: &'static str,
threshold: Duration,
}
impl EvictionsWithLowResidenceDurationBuilder {
pub fn new(data_source: &'static str, threshold: Duration) -> Self {
Self {
data_source,
threshold,
}
}
fn build(&self, tenant_id: &str, timeline_id: &str) -> EvictionsWithLowResidenceDuration {
let counter = EVICTIONS_WITH_LOW_RESIDENCE_DURATION
.get_metric_with_label_values(&[
tenant_id,
timeline_id,
self.data_source,
&EvictionsWithLowResidenceDuration::threshold_label_value(self.threshold),
])
.unwrap();
EvictionsWithLowResidenceDuration {
data_source: self.data_source,
threshold: self.threshold,
counter: Some(counter),
}
}
}
impl EvictionsWithLowResidenceDuration {
fn threshold_label_value(threshold: Duration) -> String {
format!("{}", threshold.as_secs())
}
pub fn observe(&self, observed_value: Duration) {
if self.threshold < observed_value {
self.counter
.as_ref()
.expect("nobody calls this function after `remove_from_vec`")
.inc();
}
}
// This could be a `Drop` impl, but, we need the `tenant_id` and `timeline_id`.
fn remove(&mut self, tenant_id: &str, timeline_id: &str) {
let Some(_counter) = self.counter.take() else {
return;
};
EVICTIONS_WITH_LOW_RESIDENCE_DURATION
.remove_label_values(&[
tenant_id,
timeline_id,
self.data_source,
&Self::threshold_label_value(self.threshold),
])
.expect("we own the metric, no-one else should remove it");
}
}
// Metrics collected on disk IO operations
//
// Roughly logarithmic scale.
const STORAGE_IO_TIME_BUCKETS: &[f64] = &[
0.000001, // 1 usec
0.00001, // 10 usec
0.0001, // 100 usec
0.001, // 1 msec
0.01, // 10 msec
0.1, // 100 msec
1.0, // 1 sec
0.000030, // 30 usec
0.001000, // 1000 usec
0.030, // 30 ms
1.000, // 1000 ms
];
const STORAGE_IO_TIME_OPERATIONS: &[&str] = &[
@@ -222,20 +323,12 @@ const SMGR_QUERY_TIME_OPERATIONS: &[&str] = &[
"get_db_size",
];
const SMGR_QUERY_TIME_BUCKETS: &[f64] = &[
0.00001, // 1/100000 s
0.0001, 0.00015, 0.0002, 0.00025, 0.0003, 0.00035, 0.0005, 0.00075, // 1/10000 s
0.001, 0.0025, 0.005, 0.0075, // 1/1000 s
0.01, 0.0125, 0.015, 0.025, 0.05, // 1/100 s
0.1, // 1/10 s
];
pub static SMGR_QUERY_TIME: Lazy<HistogramVec> = Lazy::new(|| {
register_histogram_vec!(
"pageserver_smgr_query_seconds",
"Time spent on smgr query handling",
&["smgr_query_type", "tenant_id", "timeline_id"],
SMGR_QUERY_TIME_BUCKETS.into()
CRITICAL_OP_BUCKETS.into(),
)
.expect("failed to define a metric")
});
@@ -504,10 +597,16 @@ pub struct TimelineMetrics {
pub current_logical_size_gauge: UIntGauge,
pub num_persistent_files_created: IntCounter,
pub persistent_bytes_written: IntCounter,
pub evictions: IntCounter,
pub evictions_with_low_residence_duration: EvictionsWithLowResidenceDuration,
}
impl TimelineMetrics {
pub fn new(tenant_id: &TenantId, timeline_id: &TimelineId) -> Self {
pub fn new(
tenant_id: &TenantId,
timeline_id: &TimelineId,
evictions_with_low_residence_duration_builder: EvictionsWithLowResidenceDurationBuilder,
) -> Self {
let tenant_id = tenant_id.to_string();
let timeline_id = timeline_id.to_string();
let reconstruct_time_histo = RECONSTRUCT_TIME
@@ -544,6 +643,11 @@ impl TimelineMetrics {
let persistent_bytes_written = PERSISTENT_BYTES_WRITTEN
.get_metric_with_label_values(&[&tenant_id, &timeline_id])
.unwrap();
let evictions = EVICTIONS
.get_metric_with_label_values(&[&tenant_id, &timeline_id])
.unwrap();
let evictions_with_low_residence_duration =
evictions_with_low_residence_duration_builder.build(&tenant_id, &timeline_id);
TimelineMetrics {
tenant_id,
@@ -563,6 +667,8 @@ impl TimelineMetrics {
current_logical_size_gauge,
num_persistent_files_created,
persistent_bytes_written,
evictions,
evictions_with_low_residence_duration,
}
}
}
@@ -579,7 +685,9 @@ impl Drop for TimelineMetrics {
let _ = CURRENT_LOGICAL_SIZE.remove_label_values(&[tenant_id, timeline_id]);
let _ = NUM_PERSISTENT_FILES_CREATED.remove_label_values(&[tenant_id, timeline_id]);
let _ = PERSISTENT_BYTES_WRITTEN.remove_label_values(&[tenant_id, timeline_id]);
let _ = EVICTIONS.remove_label_values(&[tenant_id, timeline_id]);
self.evictions_with_low_residence_duration
.remove(tenant_id, timeline_id);
for op in STORAGE_TIME_OPERATIONS {
let _ =
STORAGE_TIME_SUM_PER_TIMELINE.remove_label_values(&[op, tenant_id, timeline_id]);
@@ -614,7 +722,7 @@ use std::collections::HashMap;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll};
use std::time::Instant;
use std::time::{Duration, Instant};
pub struct RemoteTimelineClientMetrics {
tenant_id: String,

View File

@@ -12,7 +12,7 @@
use anyhow::Context;
use bytes::Buf;
use bytes::Bytes;
use futures::{Stream, StreamExt};
use futures::Stream;
use pageserver_api::models::TenantState;
use pageserver_api::models::{
PagestreamBeMessage, PagestreamDbSizeRequest, PagestreamDbSizeResponse,
@@ -20,6 +20,7 @@ use pageserver_api::models::{
PagestreamFeMessage, PagestreamGetPageRequest, PagestreamGetPageResponse,
PagestreamNblocksRequest, PagestreamNblocksResponse,
};
use postgres_backend::PostgresBackendTCP;
use postgres_backend::{self, is_expected_io_error, AuthType, PostgresBackend, QueryError};
use pq_proto::framed::ConnectionError;
use pq_proto::FeStartupPacket;
@@ -30,6 +31,7 @@ use std::str;
use std::str::FromStr;
use std::sync::Arc;
use std::time::Duration;
use tokio_util::io::StreamReader;
use tracing::*;
use utils::id::ConnectionId;
use utils::{
@@ -54,7 +56,7 @@ use crate::trace::Tracer;
use postgres_ffi::pg_constants::DEFAULTTABLESPACE_OID;
use postgres_ffi::BLCKSZ;
fn copyin_stream(pgb: &mut PostgresBackend) -> impl Stream<Item = io::Result<Bytes>> + '_ {
fn copyin_stream(pgb: &mut PostgresBackendTCP) -> impl Stream<Item = io::Result<Bytes>> + '_ {
async_stream::try_stream! {
loop {
let msg = tokio::select! {
@@ -114,6 +116,49 @@ fn copyin_stream(pgb: &mut PostgresBackend) -> impl Stream<Item = io::Result<Byt
}
}
/// Read the end of a tar archive.
///
/// A tar archive normally ends with two consecutive blocks of zeros, 512 bytes each.
/// `tokio_tar` already read the first such block. Read the second all-zeros block,
/// and check that there is no more data after the EOF marker.
///
/// XXX: Currently, any trailing data after the EOF marker prints a warning.
/// Perhaps it should be a hard error?
async fn read_tar_eof(mut reader: (impl tokio::io::AsyncRead + Unpin)) -> anyhow::Result<()> {
use tokio::io::AsyncReadExt;
let mut buf = [0u8; 512];
// Read the all-zeros block, and verify it
let mut total_bytes = 0;
while total_bytes < 512 {
let nbytes = reader.read(&mut buf[total_bytes..]).await?;
total_bytes += nbytes;
if nbytes == 0 {
break;
}
}
if total_bytes < 512 {
anyhow::bail!("incomplete or invalid tar EOF marker");
}
if !buf.iter().all(|&x| x == 0) {
anyhow::bail!("invalid tar EOF marker");
}
// Drain any data after the EOF marker
let mut trailing_bytes = 0;
loop {
let nbytes = reader.read(&mut buf).await?;
trailing_bytes += nbytes;
if nbytes == 0 {
break;
}
}
if trailing_bytes > 0 {
warn!("ignored {trailing_bytes} unexpected bytes after the tar archive");
}
Ok(())
}
///////////////////////////////////////////////////////////////////////////////
///
@@ -288,7 +333,7 @@ impl PageServerHandler {
#[instrument(skip(self, pgb, ctx))]
async fn handle_pagerequests(
&self,
pgb: &mut PostgresBackend,
pgb: &mut PostgresBackendTCP,
tenant_id: TenantId,
timeline_id: TimelineId,
ctx: RequestContext,
@@ -392,7 +437,7 @@ impl PageServerHandler {
#[instrument(skip(self, pgb, ctx))]
async fn handle_import_basebackup(
&self,
pgb: &mut PostgresBackend,
pgb: &mut PostgresBackendTCP,
tenant_id: TenantId,
timeline_id: TimelineId,
base_lsn: Lsn,
@@ -421,19 +466,14 @@ impl PageServerHandler {
pgb.write_message_noflush(&BeMessage::CopyInResponse)?;
pgb.flush().await?;
let mut copyin_stream = Box::pin(copyin_stream(pgb));
let copyin_reader = StreamReader::new(copyin_stream(pgb));
tokio::pin!(copyin_reader);
timeline
.import_basebackup_from_tar(&mut copyin_stream, base_lsn, &ctx)
.import_basebackup_from_tar(&mut copyin_reader, base_lsn, &ctx)
.await?;
// Drain the rest of the Copy data
let mut bytes_after_tar = 0;
while let Some(bytes) = copyin_stream.next().await {
bytes_after_tar += bytes?.len();
}
if bytes_after_tar > 0 {
warn!("ignored {bytes_after_tar} unexpected bytes after the tar archive");
}
// Read the end of the tar archive.
read_tar_eof(copyin_reader).await?;
// TODO check checksum
// Meanwhile you can verify client-side by taking fullbackup
@@ -448,7 +488,7 @@ impl PageServerHandler {
#[instrument(skip(self, pgb, ctx))]
async fn handle_import_wal(
&self,
pgb: &mut PostgresBackend,
pgb: &mut PostgresBackendTCP,
tenant_id: TenantId,
timeline_id: TimelineId,
start_lsn: Lsn,
@@ -472,19 +512,13 @@ impl PageServerHandler {
info!("importing wal");
pgb.write_message_noflush(&BeMessage::CopyInResponse)?;
pgb.flush().await?;
let mut copyin_stream = Box::pin(copyin_stream(pgb));
let mut reader = tokio_util::io::StreamReader::new(&mut copyin_stream);
import_wal_from_tar(&timeline, &mut reader, start_lsn, end_lsn, &ctx).await?;
let copyin_reader = StreamReader::new(copyin_stream(pgb));
tokio::pin!(copyin_reader);
import_wal_from_tar(&timeline, &mut copyin_reader, start_lsn, end_lsn, &ctx).await?;
info!("wal import complete");
// Drain the rest of the Copy data
let mut bytes_after_tar = 0;
while let Some(bytes) = copyin_stream.next().await {
bytes_after_tar += bytes?.len();
}
if bytes_after_tar > 0 {
warn!("ignored {bytes_after_tar} unexpected bytes after the tar archive");
}
// Read the end of the tar archive.
read_tar_eof(copyin_reader).await?;
// TODO Does it make sense to overshoot?
if timeline.get_last_record_lsn() < end_lsn {
@@ -659,7 +693,7 @@ impl PageServerHandler {
#[instrument(skip(self, pgb, ctx))]
async fn handle_basebackup_request(
&mut self,
pgb: &mut PostgresBackend,
pgb: &mut PostgresBackendTCP,
tenant_id: TenantId,
timeline_id: TimelineId,
lsn: Option<Lsn>,
@@ -723,10 +757,10 @@ impl PageServerHandler {
}
#[async_trait::async_trait]
impl postgres_backend::Handler for PageServerHandler {
impl postgres_backend::Handler<tokio::net::TcpStream> for PageServerHandler {
fn check_auth_jwt(
&mut self,
_pgb: &mut PostgresBackend,
_pgb: &mut PostgresBackendTCP,
jwt_response: &[u8],
) -> Result<(), QueryError> {
// this unwrap is never triggered, because check_auth_jwt only called when auth_type is NeonJWT
@@ -754,7 +788,7 @@ impl postgres_backend::Handler for PageServerHandler {
fn startup(
&mut self,
_pgb: &mut PostgresBackend,
_pgb: &mut PostgresBackendTCP,
_sm: &FeStartupPacket,
) -> Result<(), QueryError> {
Ok(())
@@ -762,7 +796,7 @@ impl postgres_backend::Handler for PageServerHandler {
async fn process_query(
&mut self,
pgb: &mut PostgresBackend,
pgb: &mut PostgresBackendTCP,
query_string: &str,
) -> Result<(), QueryError> {
let ctx = self.connection_ctx.attached_child();
@@ -1073,7 +1107,10 @@ async fn get_active_tenant_with_timeout(
tenant_id: TenantId,
_ctx: &RequestContext, /* require get a context to support cancellation in the future */
) -> Result<Arc<Tenant>, GetActiveTenantError> {
let tenant = mgr::get_tenant(tenant_id, false).await?;
let tenant = match mgr::get_tenant(tenant_id, false).await {
Ok(tenant) => tenant,
Err(e) => return Err(GetActiveTenantError::Other(e.into())),
};
let wait_time = Duration::from_secs(30);
match tokio::time::timeout(wait_time, tenant.wait_to_become_active()).await {
Ok(Ok(())) => Ok(tenant),

View File

@@ -12,9 +12,7 @@
//!
use anyhow::{bail, Context};
use bytes::Bytes;
use futures::FutureExt;
use futures::Stream;
use pageserver_api::models::TimelineState;
use remote_storage::DownloadError;
use remote_storage::GenericRemoteStorage;
@@ -239,14 +237,13 @@ impl UninitializedTimeline<'_> {
/// Prepares timeline data by loading it from the basebackup archive.
pub async fn import_basebackup_from_tar(
self,
copyin_stream: &mut (impl Stream<Item = io::Result<Bytes>> + Sync + Send + Unpin),
copyin_read: &mut (impl tokio::io::AsyncRead + Send + Sync + Unpin),
base_lsn: Lsn,
ctx: &RequestContext,
) -> anyhow::Result<Arc<Timeline>> {
let raw_timeline = self.raw_timeline()?;
let mut reader = tokio_util::io::StreamReader::new(copyin_stream);
import_datadir::import_basebackup_from_tar(raw_timeline, &mut reader, base_lsn, ctx)
import_datadir::import_basebackup_from_tar(raw_timeline, copyin_read, base_lsn, ctx)
.await
.context("Failed to import basebackup")?;
@@ -481,7 +478,7 @@ impl Tenant {
let dummy_timeline = self.create_timeline_data(
timeline_id,
up_to_date_metadata.clone(),
up_to_date_metadata,
ancestor.clone(),
remote_client,
)?;
@@ -506,7 +503,7 @@ impl Tenant {
let broken_timeline = self
.create_timeline_data(
timeline_id,
up_to_date_metadata.clone(),
up_to_date_metadata,
ancestor.clone(),
None,
)
@@ -1145,7 +1142,7 @@ impl Tenant {
);
self.prepare_timeline(
new_timeline_id,
new_metadata,
&new_metadata,
timeline_uninit_mark,
true,
None,
@@ -1243,11 +1240,8 @@ impl Tenant {
"Cannot run GC iteration on inactive tenant"
);
let gc_result = self
.gc_iteration_internal(target_timeline_id, horizon, pitr, ctx)
.await;
gc_result
self.gc_iteration_internal(target_timeline_id, horizon, pitr, ctx)
.await
}
/// Perform one compaction iteration.
@@ -1706,7 +1700,7 @@ impl Tenant {
fn create_timeline_data(
&self,
new_timeline_id: TimelineId,
new_metadata: TimelineMetadata,
new_metadata: &TimelineMetadata,
ancestor: Option<Arc<Timeline>>,
remote_client: Option<RemoteTimelineClient>,
) -> anyhow::Result<Arc<Timeline>> {
@@ -2166,13 +2160,25 @@ impl Tenant {
let new_timeline = self
.prepare_timeline(
dst_id,
metadata,
&metadata,
timeline_uninit_mark,
false,
Some(Arc::clone(src_timeline)),
)?
.initialize_with_lock(&mut timelines, true, true)?;
drop(timelines);
// Root timeline gets its layers during creation and uploads them along with the metadata.
// A branch timeline though, when created, can get no writes for some time, hence won't get any layers created.
// We still need to upload its metadata eagerly: if other nodes `attach` the tenant and miss this timeline, their GC
// could get incorrect information and remove more layers, than needed.
// See also https://github.com/neondatabase/neon/issues/3865
if let Some(remote_client) = new_timeline.remote_client.as_ref() {
remote_client
.schedule_index_upload_for_metadata_update(&metadata)
.context("branch initial metadata upload")?;
}
info!("branched timeline {dst_id} from {src_id} at {start_lsn}");
Ok(new_timeline)
@@ -2235,7 +2241,7 @@ impl Tenant {
pg_version,
);
let raw_timeline =
self.prepare_timeline(timeline_id, new_metadata, timeline_uninit_mark, true, None)?;
self.prepare_timeline(timeline_id, &new_metadata, timeline_uninit_mark, true, None)?;
let tenant_id = raw_timeline.owning_tenant.tenant_id;
let unfinished_timeline = raw_timeline.raw_timeline()?;
@@ -2289,7 +2295,7 @@ impl Tenant {
fn prepare_timeline(
&self,
new_timeline_id: TimelineId,
new_metadata: TimelineMetadata,
new_metadata: &TimelineMetadata,
uninit_mark: TimelineUninitMark,
init_layers: bool,
ancestor: Option<Arc<Timeline>>,
@@ -2303,7 +2309,7 @@ impl Tenant {
tenant_id,
new_timeline_id,
);
remote_client.init_upload_queue_for_empty_remote(&new_metadata)?;
remote_client.init_upload_queue_for_empty_remote(new_metadata)?;
Some(remote_client)
} else {
None
@@ -2342,17 +2348,12 @@ impl Tenant {
&self,
timeline_path: &Path,
new_timeline_id: TimelineId,
new_metadata: TimelineMetadata,
new_metadata: &TimelineMetadata,
ancestor: Option<Arc<Timeline>>,
remote_client: Option<RemoteTimelineClient>,
) -> anyhow::Result<Arc<Timeline>> {
let timeline_data = self
.create_timeline_data(
new_timeline_id,
new_metadata.clone(),
ancestor,
remote_client,
)
.create_timeline_data(new_timeline_id, new_metadata, ancestor, remote_client)
.context("Failed to create timeline data structure")?;
crashsafe::create_dir_all(timeline_path).context("Failed to create timeline directory")?;
@@ -2364,7 +2365,7 @@ impl Tenant {
self.conf,
new_timeline_id,
self.tenant_id,
&new_metadata,
new_metadata,
true,
)
.context("Failed to create timeline metadata")?;
@@ -3176,6 +3177,44 @@ mod tests {
}
*/
#[tokio::test]
async fn test_get_branchpoints_from_an_inactive_timeline() -> anyhow::Result<()> {
let (tenant, ctx) =
TenantHarness::create("test_get_branchpoints_from_an_inactive_timeline")?
.load()
.await;
let tline = tenant
.create_empty_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?
.initialize(&ctx)?;
make_some_layers(tline.as_ref(), Lsn(0x20)).await?;
tenant
.branch_timeline(&tline, NEW_TIMELINE_ID, Some(Lsn(0x40)), &ctx)
.await?;
let newtline = tenant
.get_timeline(NEW_TIMELINE_ID, true)
.expect("Should have a local timeline");
make_some_layers(newtline.as_ref(), Lsn(0x60)).await?;
tline.set_state(TimelineState::Broken);
tenant
.gc_iteration(Some(TIMELINE_ID), 0x10, Duration::ZERO, &ctx)
.await?;
assert_eq!(
newtline.get(*TEST_KEY, Lsn(0x50), &ctx).await?,
TEST_IMG(&format!("foo at {}", Lsn(0x40)))
);
let branchpoints = &tline.gc_info.read().unwrap().retain_lsns;
assert_eq!(branchpoints.len(), 1);
assert_eq!(branchpoints[0], Lsn(0x40));
Ok(())
}
#[tokio::test]
async fn test_retain_data_in_parent_which_is_needed_for_child() -> anyhow::Result<()> {
let (tenant, ctx) =

View File

@@ -51,9 +51,6 @@ where
///
/// A "cursor" for efficiently reading multiple pages from a BlockReader
///
/// A cursor caches the last accessed page, allowing for faster access if the
/// same block is accessed repeatedly.
///
/// You can access the last page with `*cursor`. 'read_blk' returns 'self', so
/// that in many cases you can use a BlockCursor as a drop-in replacement for
/// the underlying BlockReader. For example:
@@ -73,8 +70,6 @@ where
R: BlockReader,
{
reader: R,
/// last accessed page
cache: Option<(u32, R::BlockLease)>,
}
impl<R> BlockCursor<R>
@@ -82,40 +77,13 @@ where
R: BlockReader,
{
pub fn new(reader: R) -> Self {
BlockCursor {
reader,
cache: None,
}
BlockCursor { reader }
}
pub fn read_blk(&mut self, blknum: u32) -> Result<&Self, std::io::Error> {
// Fast return if this is the same block as before
if let Some((cached_blk, _buf)) = &self.cache {
if *cached_blk == blknum {
return Ok(self);
}
}
// Read the block from the underlying reader, and cache it
self.cache = None;
let buf = self.reader.read_blk(blknum)?;
self.cache = Some((blknum, buf));
Ok(self)
pub fn read_blk(&mut self, blknum: u32) -> Result<R::BlockLease, std::io::Error> {
self.reader.read_blk(blknum)
}
}
impl<R> Deref for BlockCursor<R>
where
R: BlockReader,
{
type Target = [u8; PAGE_SZ];
fn deref(&self) -> &<Self as Deref>::Target {
&self.cache.as_ref().unwrap().1
}
}
static NEXT_ID: AtomicU64 = AtomicU64::new(1);
/// An adapter for reading a (virtual) file using the page cache.

View File

@@ -222,48 +222,6 @@ impl TenantConfOpt {
eviction_policy: self.eviction_policy.unwrap_or(global_conf.eviction_policy),
}
}
pub fn update(&mut self, other: &TenantConfOpt) {
if let Some(checkpoint_distance) = other.checkpoint_distance {
self.checkpoint_distance = Some(checkpoint_distance);
}
if let Some(checkpoint_timeout) = other.checkpoint_timeout {
self.checkpoint_timeout = Some(checkpoint_timeout);
}
if let Some(compaction_target_size) = other.compaction_target_size {
self.compaction_target_size = Some(compaction_target_size);
}
if let Some(compaction_period) = other.compaction_period {
self.compaction_period = Some(compaction_period);
}
if let Some(compaction_threshold) = other.compaction_threshold {
self.compaction_threshold = Some(compaction_threshold);
}
if let Some(gc_horizon) = other.gc_horizon {
self.gc_horizon = Some(gc_horizon);
}
if let Some(gc_period) = other.gc_period {
self.gc_period = Some(gc_period);
}
if let Some(image_creation_threshold) = other.image_creation_threshold {
self.image_creation_threshold = Some(image_creation_threshold);
}
if let Some(pitr_interval) = other.pitr_interval {
self.pitr_interval = Some(pitr_interval);
}
if let Some(walreceiver_connect_timeout) = other.walreceiver_connect_timeout {
self.walreceiver_connect_timeout = Some(walreceiver_connect_timeout);
}
if let Some(lagging_wal_timeout) = other.lagging_wal_timeout {
self.lagging_wal_timeout = Some(lagging_wal_timeout);
}
if let Some(max_lsn_wal_lag) = other.max_lsn_wal_lag {
self.max_lsn_wal_lag = Some(max_lsn_wal_lag);
}
if let Some(trace_read_requests) = other.trace_read_requests {
self.trace_read_requests = Some(trace_read_requests);
}
}
}
impl Default for TenantConf {

View File

@@ -2,9 +2,7 @@
//! used to keep in-memory layers spilled on disk.
use crate::config::PageServerConf;
use crate::page_cache;
use crate::page_cache::PAGE_SZ;
use crate::page_cache::{ReadBufResult, WriteBufResult};
use crate::page_cache::{self, ReadBufResult, WriteBufResult, PAGE_SZ};
use crate::tenant::blob_io::BlobWriter;
use crate::tenant::block_io::BlockReader;
use crate::virtual_file::VirtualFile;
@@ -427,7 +425,6 @@ mod tests {
let actual = cursor.read_blob(pos)?;
assert_eq!(actual, expected);
}
drop(cursor);
// Test a large blob that spans multiple pages
let mut large_data = Vec::new();

View File

@@ -154,11 +154,7 @@ where
expected: &Arc<L>,
new: Arc<L>,
) -> anyhow::Result<Replacement<Arc<L>>> {
fail::fail_point!("layermap-replace-notfound", |_| Ok(
// this is not what happens if an L0 layer was not found a anyhow error but perhaps
// that should be changed. this is good enough to show a replacement failure.
Replacement::NotFound
));
fail::fail_point!("layermap-replace-notfound", |_| Ok(Replacement::NotFound));
self.layer_map.replace_historic_noflush(expected, new)
}
@@ -340,12 +336,15 @@ where
let l0_index = if expected_l0 {
// find the index in case replace worked, we need to replace that as well
Some(
self.l0_delta_layers
.iter()
.position(|slot| Self::compare_arced_layers(slot, expected))
.ok_or_else(|| anyhow::anyhow!("existing l0 delta layer was not found"))?,
)
let pos = self
.l0_delta_layers
.iter()
.position(|slot| Self::compare_arced_layers(slot, expected));
if pos.is_none() {
return Ok(Replacement::NotFound);
}
pos
} else {
None
};
@@ -804,6 +803,26 @@ mod tests {
)
}
#[test]
fn replacing_missing_l0_is_notfound() {
// original impl had an oversight, and L0 was an anyhow::Error. anyhow::Error should
// however only happen for precondition failures.
let layer = "000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000053423C21-0000000053424D69";
let layer = LayerFileName::from_str(layer).unwrap();
let layer = LayerDescriptor::from(layer);
// same skeletan construction; see scenario below
let not_found: Arc<dyn Layer> = Arc::new(layer.clone());
let new_version: Arc<dyn Layer> = Arc::new(layer);
let mut map = LayerMap::default();
let res = map.batch_update().replace_historic(&not_found, new_version);
assert!(matches!(res, Ok(Replacement::NotFound)), "{res:?}");
}
fn l0_delta_layers_updated_scenario(layer_name: &str, expected_l0: bool) {
let name = LayerFileName::from_str(layer_name).unwrap();
let skeleton = LayerDescriptor::from(name);
@@ -813,7 +832,8 @@ mod tests {
let mut map = LayerMap::default();
// two disjoint Arcs in different lifecycle phases.
// two disjoint Arcs in different lifecycle phases. even if it seems they must be the
// same layer, we use LayerMap::compare_arced_layers as the identity of layers.
assert!(!LayerMap::compare_arced_layers(&remote, &downloaded));
let expected_in_counts = (1, usize::from(expected_l0));

View File

@@ -289,7 +289,7 @@ pub async fn set_new_tenant_config(
conf: &'static PageServerConf,
new_tenant_conf: TenantConfOpt,
tenant_id: TenantId,
) -> anyhow::Result<()> {
) -> Result<(), TenantStateError> {
info!("configuring tenant {tenant_id}");
let tenant = get_tenant(tenant_id, true).await?;
@@ -306,16 +306,16 @@ pub async fn set_new_tenant_config(
/// Gets the tenant from the in-memory data, erroring if it's absent or is not fitting to the query.
/// `active_only = true` allows to query only tenants that are ready for operations, erroring on other kinds of tenants.
pub async fn get_tenant(tenant_id: TenantId, active_only: bool) -> anyhow::Result<Arc<Tenant>> {
pub async fn get_tenant(
tenant_id: TenantId,
active_only: bool,
) -> Result<Arc<Tenant>, TenantStateError> {
let m = TENANTS.read().await;
let tenant = m
.get(&tenant_id)
.with_context(|| format!("Tenant {tenant_id} not found in the local state"))?;
.ok_or(TenantStateError::NotFound(tenant_id))?;
if active_only && !tenant.is_active() {
anyhow::bail!(
"Tenant {tenant_id} is not active. Current state: {:?}",
tenant.current_state()
)
Err(TenantStateError::NotActive(tenant_id))
} else {
Ok(Arc::clone(tenant))
}
@@ -325,31 +325,56 @@ pub async fn delete_timeline(
tenant_id: TenantId,
timeline_id: TimelineId,
ctx: &RequestContext,
) -> anyhow::Result<()> {
match get_tenant(tenant_id, true).await {
Ok(tenant) => {
tenant.delete_timeline(timeline_id, ctx).await?;
}
Err(e) => anyhow::bail!("Cannot access tenant {tenant_id} in local tenant state: {e:?}"),
}
) -> Result<(), TenantStateError> {
let tenant = get_tenant(tenant_id, true).await?;
tenant.delete_timeline(timeline_id, ctx).await?;
Ok(())
}
#[derive(Debug, thiserror::Error)]
pub enum TenantStateError {
#[error("Tenant {0} not found")]
NotFound(TenantId),
#[error("Tenant {0} is stopping")]
IsStopping(TenantId),
#[error("Tenant {0} is not active")]
NotActive(TenantId),
#[error(transparent)]
Other(#[from] anyhow::Error),
}
pub async fn detach_tenant(
conf: &'static PageServerConf,
tenant_id: TenantId,
) -> anyhow::Result<()> {
remove_tenant_from_memory(tenant_id, async {
let local_tenant_directory = conf.tenant_path(&tenant_id);
detach_ignored: bool,
) -> Result<(), TenantStateError> {
let local_files_cleanup_operation = |tenant_id_to_clean| async move {
let local_tenant_directory = conf.tenant_path(&tenant_id_to_clean);
fs::remove_dir_all(&local_tenant_directory)
.await
.with_context(|| {
format!("Failed to remove local tenant directory {local_tenant_directory:?}")
format!("local tenant directory {local_tenant_directory:?} removal")
})?;
Ok(())
})
.await
};
let removal_result =
remove_tenant_from_memory(tenant_id, local_files_cleanup_operation(tenant_id)).await;
// Ignored tenants are not present in memory and will bail the removal from memory operation.
// Before returning the error, check for ignored tenant removal case — we only need to clean its local files then.
if detach_ignored && matches!(removal_result, Err(TenantStateError::NotFound(_))) {
let tenant_ignore_mark = conf.tenant_ignore_mark_file_path(tenant_id);
if tenant_ignore_mark.exists() {
info!("Detaching an ignored tenant");
local_files_cleanup_operation(tenant_id)
.await
.with_context(|| format!("Ignored tenant {tenant_id} local files cleanup"))?;
return Ok(());
}
}
removal_result
}
pub async fn load_tenant(
@@ -379,7 +404,7 @@ pub async fn load_tenant(
pub async fn ignore_tenant(
conf: &'static PageServerConf,
tenant_id: TenantId,
) -> anyhow::Result<()> {
) -> Result<(), TenantStateError> {
remove_tenant_from_memory(tenant_id, async {
let ignore_mark_file = conf.tenant_ignore_mark_file_path(tenant_id);
fs::File::create(&ignore_mark_file)
@@ -489,7 +514,7 @@ where
async fn remove_tenant_from_memory<V, F>(
tenant_id: TenantId,
tenant_cleanup: F,
) -> anyhow::Result<V>
) -> Result<V, TenantStateError>
where
F: std::future::Future<Output = anyhow::Result<V>>,
{
@@ -505,11 +530,9 @@ where
| TenantState::Loading
| TenantState::Broken
| TenantState::Active => tenant.set_stopping(),
TenantState::Stopping => {
anyhow::bail!("Tenant {tenant_id} is stopping already")
}
TenantState::Stopping => return Err(TenantStateError::IsStopping(tenant_id)),
},
None => anyhow::bail!("Tenant not found for id {tenant_id}"),
None => return Err(TenantStateError::NotFound(tenant_id)),
}
}
@@ -532,10 +555,15 @@ where
Err(e) => {
let tenants_accessor = TENANTS.read().await;
match tenants_accessor.get(&tenant_id) {
Some(tenant) => tenant.set_broken(&e.to_string()),
None => warn!("Tenant {tenant_id} got removed from memory"),
Some(tenant) => {
tenant.set_broken(&e.to_string());
}
None => {
warn!("Tenant {tenant_id} got removed from memory");
return Err(TenantStateError::NotFound(tenant_id));
}
}
Err(e)
Err(TenantStateError::Other(e))
}
}
}
@@ -555,7 +583,7 @@ pub async fn immediate_gc(
let tenant = guard
.get(&tenant_id)
.map(Arc::clone)
.with_context(|| format!("Tenant {tenant_id} not found"))
.with_context(|| format!("tenant {tenant_id}"))
.map_err(ApiError::NotFound)?;
let gc_horizon = gc_req.gc_horizon.unwrap_or_else(|| tenant.get_gc_horizon());
@@ -605,7 +633,7 @@ pub async fn immediate_compact(
let tenant = guard
.get(&tenant_id)
.map(Arc::clone)
.with_context(|| format!("Tenant {tenant_id} not found"))
.with_context(|| format!("tenant {tenant_id}"))
.map_err(ApiError::NotFound)?;
let timeline = tenant

View File

@@ -210,7 +210,6 @@ pub use download::{is_temp_download_file, list_remote_timelines};
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Mutex};
use anyhow::ensure;
use remote_storage::{DownloadError, GenericRemoteStorage};
use std::ops::DerefMut;
use tokio::runtime::Runtime;
@@ -218,9 +217,10 @@ use tracing::{debug, info, warn};
use tracing::{info_span, Instrument};
use utils::lsn::Lsn;
use crate::metrics::RemoteOpFileKind;
use crate::metrics::RemoteOpKind;
use crate::metrics::{MeasureRemoteOp, RemoteTimelineClientMetrics};
use crate::metrics::{
MeasureRemoteOp, RemoteOpFileKind, RemoteOpKind, RemoteTimelineClientMetrics,
REMOTE_ONDEMAND_DOWNLOADED_BYTES, REMOTE_ONDEMAND_DOWNLOADED_LAYERS,
};
use crate::tenant::remote_timeline_client::index::LayerFileMetadata;
use crate::{
config::PageServerConf,
@@ -346,7 +346,7 @@ impl RemoteTimelineClient {
.layer_metadata
.values()
// If we don't have the file size for the layer, don't account for it in the metric.
.map(|ilmd| ilmd.file_size.unwrap_or(0))
.map(|ilmd| ilmd.file_size)
.sum()
} else {
0
@@ -419,33 +419,9 @@ impl RemoteTimelineClient {
.await?
};
// Update the metadata for given layer file. The remote index file
// might be missing some information for the file; this allows us
// to fill in the missing details.
if layer_metadata.file_size().is_none() {
let new_metadata = LayerFileMetadata::new(downloaded_size);
let mut guard = self.upload_queue.lock().unwrap();
let upload_queue = guard.initialized_mut()?;
if let Some(upgraded) = upload_queue.latest_files.get_mut(layer_file_name) {
if upgraded.merge(&new_metadata) {
upload_queue.latest_files_changes_since_metadata_upload_scheduled += 1;
}
// If we don't do an index file upload inbetween here and restart,
// the value will go back down after pageserver restart, since we will
// have lost this data point.
// But, we upload index part fairly frequently, and restart pageserver rarely.
// So, by accounting eagerly, we present a most-of-the-time-more-accurate value sooner.
self.metrics
.remote_physical_size_gauge()
.add(downloaded_size);
} else {
// The file should exist, since we just downloaded it.
warn!(
"downloaded file {:?} not found in local copy of the index file",
layer_file_name
);
}
}
REMOTE_ONDEMAND_DOWNLOADED_LAYERS.inc();
REMOTE_ONDEMAND_DOWNLOADED_BYTES.inc_by(downloaded_size);
Ok(downloaded_size)
}
@@ -545,13 +521,6 @@ impl RemoteTimelineClient {
let mut guard = self.upload_queue.lock().unwrap();
let upload_queue = guard.initialized_mut()?;
// The file size can be missing for files that were created before we tracked that
// in the metadata, but it should be present for any new files we create.
ensure!(
layer_metadata.file_size().is_some(),
"file size not initialized in metadata"
);
upload_queue
.latest_files
.insert(layer_file_name.clone(), layer_metadata.clone());

View File

@@ -21,7 +21,7 @@ use remote_storage::{DownloadError, GenericRemoteStorage};
use utils::crashsafe::path_with_suffix_extension;
use utils::id::{TenantId, TimelineId};
use super::index::{IndexPart, IndexPartUnclean, LayerFileMetadata};
use super::index::{IndexPart, LayerFileMetadata};
use super::{FAILED_DOWNLOAD_RETRIES, FAILED_DOWNLOAD_WARN_THRESHOLD};
async fn fsync_path(path: impl AsRef<std::path::Path>) -> Result<(), std::io::Error> {
@@ -113,16 +113,11 @@ pub async fn download_layer_file<'a>(
})
.map_err(DownloadError::Other)?;
match layer_metadata.file_size() {
Some(expected) if expected != bytes_amount => {
return Err(DownloadError::Other(anyhow!(
"According to layer file metadata should have downloaded {expected} bytes but downloaded {bytes_amount} bytes into file '{}'",
temp_file_path.display()
)));
}
Some(_) | None => {
// matches, or upgrading from an earlier IndexPart version
}
let expected = layer_metadata.file_size();
if expected != bytes_amount {
return Err(DownloadError::Other(anyhow!(
"According to layer file metadata should have downloaded {expected} bytes but downloaded {bytes_amount} bytes into file {temp_file_path:?}",
)));
}
// not using sync_data because it can lose file size update
@@ -261,14 +256,12 @@ pub(super) async fn download_index_part(
)
.await?;
let index_part: IndexPartUnclean = serde_json::from_slice(&index_part_bytes)
let index_part: IndexPart = serde_json::from_slice(&index_part_bytes)
.with_context(|| {
format!("Failed to deserialize index part file into file {index_part_path:?}")
})
.map_err(DownloadError::Other)?;
let index_part = index_part.remove_unclean_layer_file_names();
Ok(index_part)
}

View File

@@ -6,7 +6,6 @@ use std::collections::{HashMap, HashSet};
use serde::{Deserialize, Serialize};
use serde_with::{serde_as, DisplayFromStr};
use tracing::warn;
use crate::tenant::metadata::TimelineMetadata;
use crate::tenant::storage_layer::LayerFileName;
@@ -20,7 +19,7 @@ use utils::lsn::Lsn;
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
#[cfg_attr(test, derive(Default))]
pub struct LayerFileMetadata {
file_size: Option<u64>,
file_size: u64,
}
impl From<&'_ IndexLayerMetadata> for LayerFileMetadata {
@@ -33,36 +32,16 @@ impl From<&'_ IndexLayerMetadata> for LayerFileMetadata {
impl LayerFileMetadata {
pub fn new(file_size: u64) -> Self {
LayerFileMetadata {
file_size: Some(file_size),
}
LayerFileMetadata { file_size }
}
/// This is used to initialize the metadata for remote layers, for which
/// the metadata was missing from the index part file.
pub const MISSING: Self = LayerFileMetadata { file_size: None };
pub fn file_size(&self) -> Option<u64> {
pub fn file_size(&self) -> u64 {
self.file_size
}
/// Metadata has holes due to version upgrades. This method is called to upgrade self with the
/// other value.
///
/// This is called on the possibly outdated version. Returns true if any changes
/// were made.
pub fn merge(&mut self, other: &Self) -> bool {
let mut changed = false;
if self.file_size != other.file_size {
self.file_size = other.file_size.or(self.file_size);
changed = true;
}
changed
}
}
// TODO seems like another part of the remote storage file format
// compatibility issue, see https://github.com/neondatabase/neon/issues/3072
/// In-memory representation of an `index_part.json` file
///
/// Contains the data about all files in the timeline, present remotely and its metadata.
@@ -71,10 +50,7 @@ impl LayerFileMetadata {
/// remember to add a test case for the changed version.
#[serde_as]
#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize)]
pub struct IndexPartImpl<L>
where
L: std::hash::Hash + PartialEq + Eq,
{
pub struct IndexPart {
/// Debugging aid describing the version of this type.
#[serde(default)]
version: usize,
@@ -82,14 +58,13 @@ where
/// Layer names, which are stored on the remote storage.
///
/// Additional metadata can might exist in `layer_metadata`.
pub timeline_layers: HashSet<L>,
pub timeline_layers: HashSet<LayerFileName>,
/// Per layer file name metadata, which can be present for a present or missing layer file.
///
/// Older versions of `IndexPart` will not have this property or have only a part of metadata
/// that latest version stores.
#[serde(default = "HashMap::default")]
pub layer_metadata: HashMap<L, IndexLayerMetadata>,
pub layer_metadata: HashMap<LayerFileName, IndexLayerMetadata>,
// 'disk_consistent_lsn' is a copy of the 'disk_consistent_lsn' in the metadata.
// It's duplicated here for convenience.
@@ -98,101 +73,6 @@ where
metadata_bytes: Vec<u8>,
}
// TODO seems like another part of the remote storage file format
// compatibility issue, see https://github.com/neondatabase/neon/issues/3072
pub type IndexPart = IndexPartImpl<LayerFileName>;
pub type IndexPartUnclean = IndexPartImpl<UncleanLayerFileName>;
#[derive(Debug, PartialEq, Eq, Hash, Clone)]
pub enum UncleanLayerFileName {
Clean(LayerFileName),
BackupFile(String),
}
impl<'de> serde::Deserialize<'de> for UncleanLayerFileName {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
deserializer.deserialize_string(UncleanLayerFileNameVisitor)
}
}
struct UncleanLayerFileNameVisitor;
impl<'de> serde::de::Visitor<'de> for UncleanLayerFileNameVisitor {
type Value = UncleanLayerFileName;
fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(
formatter,
"a string that is a valid LayerFileName or '.old' backup file name"
)
}
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
where
E: serde::de::Error,
{
let maybe_clean: Result<LayerFileName, _> = v.parse();
match maybe_clean {
Ok(clean) => Ok(UncleanLayerFileName::Clean(clean)),
Err(e) => {
if v.ends_with(".old") || v == "metadata_backup" {
Ok(UncleanLayerFileName::BackupFile(v.to_owned()))
} else {
Err(E::custom(e))
}
}
}
}
}
impl UncleanLayerFileName {
fn into_clean(self) -> Option<LayerFileName> {
match self {
UncleanLayerFileName::Clean(clean) => Some(clean),
UncleanLayerFileName::BackupFile(_) => None,
}
}
}
impl IndexPartUnclean {
pub fn remove_unclean_layer_file_names(self) -> IndexPart {
let IndexPartUnclean {
version,
timeline_layers,
layer_metadata,
disk_consistent_lsn,
metadata_bytes,
} = self;
IndexPart {
version,
timeline_layers: timeline_layers
.into_iter()
.filter_map(|unclean_file_name| match unclean_file_name {
UncleanLayerFileName::Clean(clean_name) => Some(clean_name),
UncleanLayerFileName::BackupFile(backup_file_name) => {
// For details see https://github.com/neondatabase/neon/issues/3024
warn!(
"got backup file on the remote storage, ignoring it {backup_file_name}"
);
None
}
})
.collect(),
layer_metadata: layer_metadata
.into_iter()
.filter_map(|(l, m)| l.into_clean().map(|l| (l, m)))
.collect(),
disk_consistent_lsn,
metadata_bytes,
}
}
}
impl IndexPart {
/// When adding or modifying any parts of `IndexPart`, increment the version so that it can be
/// used to understand later versions.
@@ -232,7 +112,7 @@ impl IndexPart {
/// Serialized form of [`LayerFileMetadata`].
#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize, Default)]
pub struct IndexLayerMetadata {
pub(super) file_size: Option<u64>,
pub(super) file_size: u64,
}
impl From<&'_ LayerFileMetadata> for IndexLayerMetadata {
@@ -247,27 +127,6 @@ impl From<&'_ LayerFileMetadata> for IndexLayerMetadata {
mod tests {
use super::*;
#[test]
fn v0_indexpart_is_parsed() {
let example = r#"{
"timeline_layers":["000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000001696070-00000000016960E9"],
"disk_consistent_lsn":"0/16960E8",
"metadata_bytes":[113,11,159,210,0,54,0,4,0,0,0,0,1,105,96,232,1,0,0,0,0,1,105,96,112,0,0,0,0,0,0,0,0,0,0,0,0,0,1,105,96,112,0,0,0,0,1,105,96,112,0,0,0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
}"#;
let expected = IndexPart {
version: 0,
timeline_layers: HashSet::from(["000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000001696070-00000000016960E9".parse().unwrap()]),
layer_metadata: HashMap::default(),
disk_consistent_lsn: "0/16960E8".parse::<Lsn>().unwrap(),
metadata_bytes: [113,11,159,210,0,54,0,4,0,0,0,0,1,105,96,232,1,0,0,0,0,1,105,96,112,0,0,0,0,0,0,0,0,0,0,0,0,0,1,105,96,112,0,0,0,0,1,105,96,112,0,0,0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0].to_vec(),
};
let part: IndexPartUnclean = serde_json::from_str(example).unwrap();
let part = part.remove_unclean_layer_file_names();
assert_eq!(part, expected);
}
#[test]
fn v1_indexpart_is_parsed() {
let example = r#"{
@@ -287,21 +146,19 @@ mod tests {
timeline_layers: HashSet::from(["000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000001696070-00000000016960E9".parse().unwrap()]),
layer_metadata: HashMap::from([
("000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000001696070-00000000016960E9".parse().unwrap(), IndexLayerMetadata {
file_size: Some(25600000),
file_size: 25600000,
}),
("000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000016B59D8-00000000016B5A51".parse().unwrap(), IndexLayerMetadata {
// serde_json should always parse this but this might be a double with jq for
// example.
file_size: Some(9007199254741001),
file_size: 9007199254741001,
})
]),
disk_consistent_lsn: "0/16960E8".parse::<Lsn>().unwrap(),
metadata_bytes: [113,11,159,210,0,54,0,4,0,0,0,0,1,105,96,232,1,0,0,0,0,1,105,96,112,0,0,0,0,0,0,0,0,0,0,0,0,0,1,105,96,112,0,0,0,0,1,105,96,112,0,0,0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0].to_vec(),
};
let part = serde_json::from_str::<IndexPartUnclean>(example)
.unwrap()
.remove_unclean_layer_file_names();
let part = serde_json::from_str::<IndexPart>(example).unwrap();
assert_eq!(part, expected);
}
@@ -325,20 +182,64 @@ mod tests {
timeline_layers: HashSet::from(["000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000001696070-00000000016960E9".parse().unwrap()]),
layer_metadata: HashMap::from([
("000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000001696070-00000000016960E9".parse().unwrap(), IndexLayerMetadata {
file_size: Some(25600000),
file_size: 25600000,
}),
("000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000016B59D8-00000000016B5A51".parse().unwrap(), IndexLayerMetadata {
// serde_json should always parse this but this might be a double with jq for
// example.
file_size: Some(9007199254741001),
file_size: 9007199254741001,
})
]),
disk_consistent_lsn: "0/16960E8".parse::<Lsn>().unwrap(),
metadata_bytes: [112,11,159,210,0,54,0,4,0,0,0,0,1,105,96,232,1,0,0,0,0,1,105,96,112,0,0,0,0,0,0,0,0,0,0,0,0,0,1,105,96,112,0,0,0,0,1,105,96,112,0,0,0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0].to_vec(),
};
let part = serde_json::from_str::<IndexPartUnclean>(example).unwrap();
let part = part.remove_unclean_layer_file_names();
let part = serde_json::from_str::<IndexPart>(example).unwrap();
assert_eq!(part, expected);
}
#[test]
fn empty_layers_are_parsed() {
let empty_layers_json = r#"{
"version":1,
"timeline_layers":[],
"layer_metadata":{},
"disk_consistent_lsn":"0/2532648",
"metadata_bytes":[136,151,49,208,0,70,0,4,0,0,0,0,2,83,38,72,1,0,0,0,0,2,83,38,32,1,87,198,240,135,97,119,45,125,38,29,155,161,140,141,255,210,0,0,0,0,2,83,38,72,0,0,0,0,1,73,240,192,0,0,0,0,1,73,240,192,0,0,0,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
}"#;
let expected = IndexPart {
version: 1,
timeline_layers: HashSet::new(),
layer_metadata: HashMap::new(),
disk_consistent_lsn: "0/2532648".parse::<Lsn>().unwrap(),
metadata_bytes: [
136, 151, 49, 208, 0, 70, 0, 4, 0, 0, 0, 0, 2, 83, 38, 72, 1, 0, 0, 0, 0, 2, 83,
38, 32, 1, 87, 198, 240, 135, 97, 119, 45, 125, 38, 29, 155, 161, 140, 141, 255,
210, 0, 0, 0, 0, 2, 83, 38, 72, 0, 0, 0, 0, 1, 73, 240, 192, 0, 0, 0, 0, 1, 73,
240, 192, 0, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0,
]
.to_vec(),
};
let empty_layers_parsed = serde_json::from_str::<IndexPart>(empty_layers_json).unwrap();
assert_eq!(empty_layers_parsed, expected);
}
}

View File

@@ -64,13 +64,9 @@ pub(super) async fn upload_timeline_layer<'a>(
})?
.len();
// FIXME: this looks bad
if let Some(metadata_size) = known_metadata.file_size() {
if metadata_size != fs_size {
bail!("File {source_path:?} has its current FS size {fs_size} diferent from initially determined {metadata_size}");
}
} else {
// this is a silly state we would like to avoid
let metadata_size = known_metadata.file_size();
if metadata_size != fs_size {
bail!("File {source_path:?} has its current FS size {fs_size} diferent from initially determined {metadata_size}");
}
let fs_size = usize::try_from(fs_size).with_context(|| {

View File

@@ -378,7 +378,7 @@ pub trait PersistentLayer: Layer {
///
/// Should not change over the lifetime of the layer object because
/// current_physical_size is computed as the som of this value.
fn file_size(&self) -> Option<u64>;
fn file_size(&self) -> u64;
fn info(&self, reset: LayerAccessStatsReset) -> HistoricLayerInfo;

View File

@@ -444,8 +444,8 @@ impl PersistentLayer for DeltaLayer {
Ok(())
}
fn file_size(&self) -> Option<u64> {
Some(self.file_size)
fn file_size(&self) -> u64 {
self.file_size
}
fn info(&self, reset: LayerAccessStatsReset) -> HistoricLayerInfo {
@@ -456,7 +456,7 @@ impl PersistentLayer for DeltaLayer {
HistoricLayerInfo::Delta {
layer_file_name,
layer_file_size: Some(self.file_size),
layer_file_size: self.file_size,
lsn_start: lsn_range.start,
lsn_end: lsn_range.end,
remote: false,

View File

@@ -258,6 +258,15 @@ impl serde::Serialize for LayerFileName {
}
}
impl<'de> serde::Deserialize<'de> for LayerFileName {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
deserializer.deserialize_string(LayerFileNameVisitor)
}
}
struct LayerFileNameVisitor;
impl<'de> serde::de::Visitor<'de> for LayerFileNameVisitor {

View File

@@ -258,8 +258,8 @@ impl PersistentLayer for ImageLayer {
Ok(())
}
fn file_size(&self) -> Option<u64> {
Some(self.file_size)
fn file_size(&self) -> u64 {
self.file_size
}
fn info(&self, reset: LayerAccessStatsReset) -> HistoricLayerInfo {
@@ -268,7 +268,7 @@ impl PersistentLayer for ImageLayer {
HistoricLayerInfo::Image {
layer_file_name,
layer_file_size: Some(self.file_size),
layer_file_size: self.file_size,
lsn_start: lsn_range.start,
remote: false,
access_stats: self.access_stats.as_api_model(reset),

View File

@@ -167,7 +167,7 @@ impl PersistentLayer for RemoteLayer {
true
}
fn file_size(&self) -> Option<u64> {
fn file_size(&self) -> u64 {
self.layer_metadata.file_size()
}

View File

@@ -2,6 +2,7 @@
mod eviction_task;
mod walreceiver;
mod layer_trace;
use anyhow::{anyhow, bail, ensure, Context};
use bytes::Bytes;
@@ -19,8 +20,7 @@ use tracing::*;
use utils::id::TenantTimelineId;
use std::cmp::{max, min, Ordering};
use std::collections::BinaryHeap;
use std::collections::HashMap;
use std::collections::{BinaryHeap, HashMap};
use std::fs;
use std::ops::{Deref, Range};
use std::path::{Path, PathBuf};
@@ -115,6 +115,17 @@ pub struct Timeline {
pub(super) layers: RwLock<LayerMap<dyn PersistentLayer>>,
/// Set of key ranges which should be covered by image layers to
/// allow GC to remove old layers. This set is created by GC and its cutoff LSN is also stored.
/// It is used by compaction task when it checks if new image layer should be created.
/// Newly created image layer doesn't help to remove the delta layer, until the
/// newly created image layer falls off the PITR horizon. So on next GC cycle,
/// gc_timeline may still want the new image layer to be created. To avoid redundant
/// image layers creation we should check if image layer exists but beyond PITR horizon.
/// This is why we need remember GC cutoff LSN.
///
wanted_image_layers: Mutex<Option<(Lsn, KeySpace)>>,
last_freeze_at: AtomicLsn,
// Atomic would be more appropriate here.
last_freeze_ts: RwLock<Instant>,
@@ -216,6 +227,8 @@ pub struct Timeline {
download_all_remote_layers_task_info: RwLock<Option<DownloadRemoteLayersTaskInfo>>,
state: watch::Sender<TimelineState>,
layer_trace_file: Mutex<Option<std::fs::File>>,
}
/// Internal structure to hold all data needed for logical size calculation.
@@ -312,7 +325,7 @@ impl LogicalSize {
// we change the type.
match self.initial_logical_size.get() {
Some(initial_size) => {
initial_size.checked_add_signed(size_increment)
initial_size.checked_add(size_increment.try_into().unwrap())
.with_context(|| format!("Overflow during logical size calculation, initial_size: {initial_size}, size_increment: {size_increment}"))
.map(CurrentLogicalSize::Exact)
}
@@ -334,25 +347,6 @@ impl LogicalSize {
}
}
/// Returned by [`Timeline::layer_size_sum`]
pub enum LayerSizeSum {
/// The result is accurate.
Accurate(u64),
// We don't know the layer file size of one or more layers.
// They contribute to the sum with a value of 0.
// Hence, the sum is a lower bound for the actualy layer file size sum.
ApproximateLowerBound(u64),
}
impl LayerSizeSum {
pub fn approximate_is_ok(self) -> u64 {
match self {
LayerSizeSum::Accurate(v) => v,
LayerSizeSum::ApproximateLowerBound(v) => v,
}
}
}
pub struct WalReceiverInfo {
pub wal_source_connconf: PgConnectionConfig,
pub last_received_msg_lsn: Lsn,
@@ -550,20 +544,13 @@ impl Timeline {
/// The sum of the file size of all historic layers in the layer map.
/// This method makes no distinction between local and remote layers.
/// Hence, the result **does not represent local filesystem usage**.
pub fn layer_size_sum(&self) -> LayerSizeSum {
pub fn layer_size_sum(&self) -> u64 {
let layer_map = self.layers.read().unwrap();
let mut size = 0;
let mut no_size_cnt = 0;
for l in layer_map.iter_historic_layers() {
let (l_size, l_no_size) = l.file_size().map(|s| (s, 0)).unwrap_or((0, 1));
size += l_size;
no_size_cnt += l_no_size;
}
if no_size_cnt == 0 {
LayerSizeSum::Accurate(size)
} else {
LayerSizeSum::ApproximateLowerBound(size)
size += l.file_size();
}
size
}
pub fn get_resident_physical_size(&self) -> u64 {
@@ -883,6 +870,7 @@ impl Timeline {
}
pub fn activate(self: &Arc<Self>) {
self.start_layer_tracing();
self.set_state(TimelineState::Active);
self.launch_wal_receiver();
self.launch_eviction_task();
@@ -1047,9 +1035,23 @@ impl Timeline {
return Ok(false);
}
let layer_file_size = local_layer
.file_size()
.expect("Local layer should have a file size");
let layer_file_size = local_layer.file_size();
let local_layer_mtime = local_layer
.local_path()
.expect("local layer should have a local path")
.metadata()
.context("get local layer file stat")?
.modified()
.context("get mtime of layer file")?;
let local_layer_residence_duration =
match SystemTime::now().duration_since(local_layer_mtime) {
Err(e) => {
warn!("layer mtime is in the future: {}", e);
None
}
Ok(delta) => Some(delta),
};
let layer_metadata = LayerFileMetadata::new(layer_file_size);
@@ -1092,6 +1094,15 @@ impl Timeline {
self.metrics
.resident_physical_size_gauge
.sub(layer_file_size);
self.trace_layer_evict(&local_layer.filename());
self.metrics.evictions.inc();
if let Some(delta) = local_layer_residence_duration {
self.metrics
.evictions_with_low_residence_duration
.observe(delta);
}
true
}
@@ -1167,7 +1178,7 @@ impl Timeline {
pub(super) fn new(
conf: &'static PageServerConf,
tenant_conf: Arc<RwLock<TenantConfOpt>>,
metadata: TimelineMetadata,
metadata: &TimelineMetadata,
ancestor: Option<Arc<Timeline>>,
timeline_id: TimelineId,
tenant_id: TenantId,
@@ -1190,6 +1201,7 @@ impl Timeline {
tenant_id,
pg_version,
layers: RwLock::new(LayerMap::default()),
wanted_image_layers: Mutex::new(None),
walredo_mgr,
@@ -1208,7 +1220,14 @@ impl Timeline {
ancestor_timeline: ancestor,
ancestor_lsn: metadata.ancestor_lsn(),
metrics: TimelineMetrics::new(&tenant_id, &timeline_id),
metrics: TimelineMetrics::new(
&tenant_id,
&timeline_id,
crate::metrics::EvictionsWithLowResidenceDurationBuilder::new(
"mtime",
conf.evictions_low_residence_duration_metric_threshold,
),
),
flush_loop_state: Mutex::new(FlushLoopState::NotStarted),
@@ -1245,6 +1264,8 @@ impl Timeline {
download_all_remote_layers_task_info: RwLock::new(None),
state,
layer_trace_file: Mutex::new(None),
};
result.repartition_threshold = result.get_checkpoint_distance() / 10;
result
@@ -1335,6 +1356,7 @@ impl Timeline {
lagging_wal_timeout,
max_lsn_wal_lag,
crate::config::SAFEKEEPER_AUTH_TOKEN.get().cloned(),
self.conf.availability_zone.clone(),
background_ctx,
);
}
@@ -1482,7 +1504,12 @@ impl Timeline {
.layer_metadata
.get(remote_layer_name)
.map(LayerFileMetadata::from)
.unwrap_or(LayerFileMetadata::MISSING);
.with_context(|| {
format!(
"No remote layer metadata found for layer {}",
remote_layer_name.file_name()
)
})?;
// Is the local layer's size different from the size stored in the
// remote index file?
@@ -1498,34 +1525,27 @@ impl Timeline {
local_layer_path.display()
);
if let Some(remote_size) = remote_layer_metadata.file_size() {
let metadata = local_layer_path.metadata().with_context(|| {
format!(
"get file size of local layer {}",
local_layer_path.display()
)
})?;
let local_size = metadata.len();
if local_size != remote_size {
warn!("removing local file {local_layer_path:?} because it has unexpected length {local_size}; length in remote index is {remote_size}");
if let Err(err) = rename_to_backup(&local_layer_path) {
assert!(local_layer_path.exists(), "we would leave the local_layer without a file if this does not hold: {}", local_layer_path.display());
anyhow::bail!("could not rename file {local_layer_path:?}: {err:?}");
} else {
self.metrics.resident_physical_size_gauge.sub(local_size);
updates.remove_historic(local_layer);
// fall-through to adding the remote layer
}
let remote_size = remote_layer_metadata.file_size();
let metadata = local_layer_path.metadata().with_context(|| {
format!(
"get file size of local layer {}",
local_layer_path.display()
)
})?;
let local_size = metadata.len();
if local_size != remote_size {
warn!("removing local file {local_layer_path:?} because it has unexpected length {local_size}; length in remote index is {remote_size}");
if let Err(err) = rename_to_backup(&local_layer_path) {
assert!(local_layer_path.exists(), "we would leave the local_layer without a file if this does not hold: {}", local_layer_path.display());
anyhow::bail!("could not rename file {local_layer_path:?}: {err:?}");
} else {
debug!(
"layer is present locally and file size matches remote, using it: {}",
local_layer_path.display()
);
continue;
self.metrics.resident_physical_size_gauge.sub(local_size);
updates.remove_historic(local_layer);
// fall-through to adding the remote layer
}
} else {
debug!(
"layer is present locally and remote does not have file size, using it: {}",
"layer is present locally and file size matches remote, using it: {}",
local_layer_path.display()
);
continue;
@@ -1627,6 +1647,8 @@ impl Timeline {
.map(|l| (l.filename(), l))
.collect::<HashMap<_, _>>();
// If no writes happen, new branches do not have any layers, only the metadata file.
let has_local_layers = !local_layers.is_empty();
let local_only_layers = match index_part {
Some(index_part) => {
info!(
@@ -1644,21 +1666,40 @@ impl Timeline {
}
};
// Are there local files that don't exist remotely? Schedule uploads for them
for (layer_name, layer) in &local_only_layers {
// XXX solve this in the type system
let layer_path = layer
.local_path()
.expect("local_only_layers only contains local layers");
let layer_size = layer_path
.metadata()
.with_context(|| format!("failed to get file {layer_path:?} metadata"))?
.len();
info!("scheduling {layer_path:?} for upload");
remote_client
.schedule_layer_file_upload(layer_name, &LayerFileMetadata::new(layer_size))?;
if has_local_layers {
// Are there local files that don't exist remotely? Schedule uploads for them.
// Local timeline metadata will get uploaded to remove along witht he layers.
for (layer_name, layer) in &local_only_layers {
// XXX solve this in the type system
let layer_path = layer
.local_path()
.expect("local_only_layers only contains local layers");
let layer_size = layer_path
.metadata()
.with_context(|| format!("failed to get file {layer_path:?} metadata"))?
.len();
info!("scheduling {layer_path:?} for upload");
remote_client
.schedule_layer_file_upload(layer_name, &LayerFileMetadata::new(layer_size))?;
}
remote_client.schedule_index_upload_for_file_changes()?;
} else if index_part.is_none() {
// No data on the remote storage, no local layers, local metadata file.
//
// TODO https://github.com/neondatabase/neon/issues/3865
// Currently, console does not wait for the timeline data upload to the remote storage
// and considers the timeline created, expecting other pageserver nodes to work with it.
// Branch metadata upload could get interrupted (e.g pageserver got killed),
// hence any locally existing branch metadata with no remote counterpart should be uploaded,
// otherwise any other pageserver won't see the branch on `attach`.
//
// After the issue gets implemented, pageserver should rather remove the branch,
// since absence on S3 means we did not acknowledge the branch creation and console will have to retry,
// no need to keep the old files.
remote_client.schedule_index_upload_for_metadata_update(up_to_date_metadata)?;
} else {
// Local timeline has a metadata file, remote one too, both have no layers to sync.
}
remote_client.schedule_index_upload_for_file_changes()?;
info!("Done");
@@ -1724,7 +1765,7 @@ impl Timeline {
.size_added_after_initial
.load(AtomicOrdering::Relaxed);
let sum = calculated_size.saturating_add_signed(added);
let sum = calculated_size.saturating_add(added.try_into().unwrap());
// set the gauge value before it can be set in `update_current_logical_size`.
self_clone.metrics.current_logical_size_gauge.set(sum);
@@ -1952,9 +1993,7 @@ impl Timeline {
) -> anyhow::Result<()> {
if !layer.is_remote_layer() {
layer.delete_resident_layer_file()?;
let layer_file_size = layer
.file_size()
.expect("Local layer should have a file size");
let layer_file_size = layer.file_size();
self.metrics
.resident_physical_size_gauge
.sub(layer_file_size);
@@ -2607,6 +2646,8 @@ impl Timeline {
self.conf.timeline_path(&self.timeline_id, &self.tenant_id),
])?;
self.trace_layer_flush(&new_delta.filename());
// Add it to the layer map
self.layers
.write()
@@ -2662,6 +2703,30 @@ impl Timeline {
let layers = self.layers.read().unwrap();
let mut max_deltas = 0;
let wanted_image_layers = self.wanted_image_layers.lock().unwrap();
if let Some((cutoff_lsn, wanted)) = &*wanted_image_layers {
let img_range =
partition.ranges.first().unwrap().start..partition.ranges.last().unwrap().end;
if wanted.overlaps(&img_range) {
//
// gc_timeline only pays attention to image layers that are older than the GC cutoff,
// but create_image_layers creates image layers at last-record-lsn.
// So it's possible that gc_timeline decides that it wants new image layer to be created for a key range,
// and on next compcation create_image_layers creates the image layer.
// But on next GC cycle, gc_timeline still wantes the new image layer to be created,
// because the newly created image layer doesn't help to remove the delta layer,
// until the newly created image layer falls off the PITR horizon.
//
// So we should check if image layer beyond cutoff LSN already exists.
if !layers.image_layer_exists(&img_range, &(*cutoff_lsn..lsn))? {
debug!(
"Force generation of layer {}-{} wanted by GC)",
img_range.start, img_range.end
);
return Ok(true);
}
}
}
for part_range in &partition.ranges {
let image_coverage = layers.image_coverage(part_range, lsn)?;
@@ -2715,10 +2780,22 @@ impl Timeline {
) -> Result<HashMap<LayerFileName, LayerFileMetadata>, PageReconstructError> {
let timer = self.metrics.create_images_time_histo.start_timer();
let mut image_layers: Vec<ImageLayer> = Vec::new();
// We need to avoid holes between generated image layers.
// Otherwise LayerMap::image_layer_exists will return false if key range of some layer is covered by more than one
// image layer with hole between them. In this case such layer can not be utilized by GC.
//
// How such hole between partitions can appear?
// if we have relation with relid=1 and size 100 and relation with relid=2 with size 200 then result of
// KeySpace::partition may contain partitions <100000000..100000099> and <200000000..200000199>.
// If there is delta layer <100000000..300000000> then it never be garbage collected because
// image layers <100000000..100000099> and <200000000..200000199> are not completely covering it.
let mut start = Key::MIN;
for partition in partitioning.parts.iter() {
let img_range = start..partition.ranges.last().unwrap().end;
start = img_range.end;
if force || self.time_for_new_image_layer(partition, lsn)? {
let img_range =
partition.ranges.first().unwrap().start..partition.ranges.last().unwrap().end;
let mut image_layer_writer = ImageLayerWriter::new(
self.conf,
self.timeline_id,
@@ -2732,7 +2809,6 @@ impl Timeline {
"failpoint image-layer-writer-fail-before-finish"
)))
});
for range in &partition.ranges {
let mut key = range.start;
while key < range.end {
@@ -2770,6 +2846,11 @@ impl Timeline {
image_layers.push(image_layer);
}
}
// All wanted layers are taken in account by time_for_new_image_layer.
// The wanted_image_layers could get updated out of turn and we could
// clear something which hasn't been looked at all. This is fine, because
// next gc round any wanted would get added back in.
*self.wanted_image_layers.lock().unwrap() = None;
// Sync the new layer to disk before adding it to the layer map, to make sure
// we don't garbage collect something based on the new layer, before it has
@@ -2806,6 +2887,7 @@ impl Timeline {
self.metrics
.resident_physical_size_gauge
.add(metadata.len());
self.trace_layer_image_create(&l.filename());
updates.insert_historic(Arc::new(l));
}
updates.flush();
@@ -3147,9 +3229,7 @@ impl Timeline {
}
fail_point!("delta-layer-writer-fail-before-finish", |_| {
return Err(
anyhow::anyhow!("failpoint delta-layer-writer-fail-before-finish").into(),
);
Err(anyhow::anyhow!("failpoint delta-layer-writer-fail-before-finish").into())
});
writer.as_mut().unwrap().put_value(key, lsn, value)?;
@@ -3238,6 +3318,7 @@ impl Timeline {
self.metrics
.resident_physical_size_gauge
.add(metadata.len());
self.trace_layer_compact_create(&l.filename());
new_layer_paths.insert(new_delta_path, LayerFileMetadata::new(metadata.len()));
let x: Arc<dyn PersistentLayer + 'static> = Arc::new(l);
@@ -3248,6 +3329,7 @@ impl Timeline {
// delete the old ones
let mut layer_names_to_delete = Vec::with_capacity(deltas_to_compact.len());
for l in deltas_to_compact {
self.trace_layer_compact_delete(&l.filename());
layer_names_to_delete.push(l.filename());
self.delete_historic_layer(layer_removal_cs, l, &mut updates)?;
}
@@ -3444,6 +3526,8 @@ impl Timeline {
info!("GC starting");
self.trace_gc_start(new_gc_cutoff);
debug!("retain_lsns: {:?}", retain_lsns);
// Before deleting any layers, we need to wait for their upload ops to finish.
@@ -3458,6 +3542,7 @@ impl Timeline {
}
let mut layers_to_remove = Vec::new();
let mut wanted_image_layers = KeySpace::default();
// Scan all layers in the timeline (remote or on-disk).
//
@@ -3541,6 +3626,15 @@ impl Timeline {
"keeping {} because it is the latest layer",
l.filename().file_name()
);
// Collect delta key ranges that need image layers to allow garbage
// collecting the layers.
// It is not so obvious whether we need to propagate information only about
// delta layers. Image layers can form "stairs" preventing old image from been deleted.
// But image layers are in any case less sparse than delta layers. Also we need some
// protection from replacing recent image layers with new one after each GC iteration.
if l.is_incremental() && !LayerMap::is_l0(&*l) {
wanted_image_layers.add_range(l.get_key_range());
}
result.layers_not_updated += 1;
continue 'outer;
}
@@ -3553,6 +3647,10 @@ impl Timeline {
);
layers_to_remove.push(Arc::clone(&l));
}
self.wanted_image_layers
.lock()
.unwrap()
.replace((new_gc_cutoff, wanted_image_layers));
let mut updates = layers.batch_update();
if !layers_to_remove.is_empty() {
@@ -3567,6 +3665,7 @@ impl Timeline {
{
for doomed_layer in layers_to_remove {
layer_names_to_delete.push(doomed_layer.filename());
self.trace_layer_gc_delete(&doomed_layer.filename());
self.delete_historic_layer(layer_removal_cs, doomed_layer, &mut updates)?; // FIXME: schedule succeeded deletions before returning?
result.layers_removed += 1;
}

View File

@@ -1,5 +1,18 @@
//! The per-timeline layer eviction task.
//! The per-timeline layer eviction task, which evicts data which has not been accessed for more
//! than a given threshold.
//!
//! Data includes all kinds of caches, namely:
//! - (in-memory layers)
//! - on-demand downloaded layer files on disk
//! - (cached layer file pages)
//! - derived data from layer file contents, namely:
//! - initial logical size
//! - partitioning
//! - (other currently missing unknowns)
//!
//! Items with parentheses are not (yet) touched by this task.
//!
//! See write-up on restart on-demand download spike: <https://gist.github.com/problame/2265bf7b8dc398be834abfead36c76b5>
use std::{
ops::ControlFlow,
sync::Arc,
@@ -12,6 +25,7 @@ use tokio_util::sync::CancellationToken;
use tracing::{debug, error, info, instrument, warn};
use crate::{
context::{DownloadBehavior, RequestContext},
task_mgr::{self, TaskKind, BACKGROUND_RUNTIME},
tenant::{
config::{EvictionPolicy, EvictionPolicyLayerAccessThreshold},
@@ -54,9 +68,10 @@ impl Timeline {
}
}
let ctx = RequestContext::new(TaskKind::Eviction, DownloadBehavior::Warn);
loop {
let policy = self.get_eviction_policy();
let cf = self.eviction_iteration(&policy, cancel.clone()).await;
let cf = self.eviction_iteration(&policy, &cancel, &ctx).await;
match cf {
ControlFlow::Break(()) => break,
@@ -77,7 +92,8 @@ impl Timeline {
async fn eviction_iteration(
self: &Arc<Self>,
policy: &EvictionPolicy,
cancel: CancellationToken,
cancel: &CancellationToken,
ctx: &RequestContext,
) -> ControlFlow<(), Instant> {
debug!("eviction iteration: {policy:?}");
match policy {
@@ -87,7 +103,7 @@ impl Timeline {
}
EvictionPolicy::LayerAccessThreshold(p) => {
let start = Instant::now();
match self.eviction_iteration_threshold(p, cancel).await {
match self.eviction_iteration_threshold(p, cancel, ctx).await {
ControlFlow::Break(()) => return ControlFlow::Break(()),
ControlFlow::Continue(()) => (),
}
@@ -101,7 +117,8 @@ impl Timeline {
async fn eviction_iteration_threshold(
self: &Arc<Self>,
p: &EvictionPolicyLayerAccessThreshold,
cancel: CancellationToken,
cancel: &CancellationToken,
ctx: &RequestContext,
) -> ControlFlow<()> {
let now = SystemTime::now();
@@ -114,6 +131,20 @@ impl Timeline {
not_evictable: usize,
skipped_for_shutdown: usize,
}
// what we want is to invalidate any caches which haven't been accessed for `p.threshold`,
// but we cannot actually do it for current limitations except by restarting pageserver. we
// just recompute the values which would be recomputed on startup.
//
// for active tenants this will likely materialized page cache or in-memory layers. for
// inactive tenants it will refresh the last_access timestamps so that we will not evict
// and re-download on restart these layers.
self.refresh_layers_required_in_restart(cancel, ctx).await;
if cancel.is_cancelled() {
return ControlFlow::Break(());
}
let mut stats = EvictionStats::default();
// Gather layers for eviction.
// NB: all the checks can be invalidated as soon as we release the layer map lock.
@@ -174,7 +205,7 @@ impl Timeline {
};
let results = match self
.evict_layer_batch(remote_client, &candidates[..], cancel)
.evict_layer_batch(remote_client, &candidates[..], cancel.clone())
.await
{
Err(pre_err) => {
@@ -216,4 +247,40 @@ impl Timeline {
}
ControlFlow::Continue(())
}
/// Recompute the values which would cause on-demand downloads during restart.
async fn refresh_layers_required_in_restart(
&self,
cancel: &CancellationToken,
ctx: &RequestContext,
) {
let lsn = self.get_last_record_lsn();
// imitiate on-restart initial logical size
let size = self.calculate_logical_size(lsn, cancel.clone(), ctx).await;
match &size {
Ok(_size) => {
// good, don't log it to avoid confusion
}
Err(_) => {
// we have known issues for which we already log this on consumption metrics,
// gc, and compaction. leave logging out for now.
//
// https://github.com/neondatabase/neon/issues/2539
}
}
// imitiate repartiting on first compactation
if let Err(e) = self.collect_keyspace(lsn, ctx).await {
// if this failed, we probably failed logical size because these use the same keys
if size.is_err() {
// ignore, see above comment
} else {
warn!(
"failed to collect keyspace but succeeded in calculating logical size: {e:#}"
);
}
}
}
}

View File

@@ -0,0 +1,81 @@
use crate::tenant::timeline::LayerFileName;
use crate::tenant::Timeline;
use std::io::Write;
use std::time::UNIX_EPOCH;
use tracing::*;
use std::fs::File;
use utils::lsn::Lsn;
impl Timeline {
pub(super) fn start_layer_tracing(&self) {
let timeline_path = self.conf.timeline_path(&self.timeline_id, &self.tenant_id);
let path = timeline_path.join("layer_trace");
match File::options()
.create(true)
.append(true)
.open(&path)
{
Ok(file) => {
info!("enabled layer tracing");
self.layer_trace_file.lock().unwrap().replace(file);
},
Err(e) => {
warn!("could not open layer tracing file \"{}\": {}", path.display(), e);
}
}
}
fn trace_op(&self, op: &str, filename: &str) {
let opt_out = &self.layer_trace_file.lock().unwrap();
if let Some(mut out) = opt_out.as_ref() {
if let Ok(elapsed) = UNIX_EPOCH.elapsed() {
let time = elapsed.as_millis();
let _ = writeln!(out, "{{ \"time\": {time}, \"op\": \"{op}\", \"filename\": \"{filename}\"}}");
}
else {
warn!("could not get current timestamp");
}
}
}
pub(super) fn trace_layer_evict(&self, filename: &LayerFileName) {
self.trace_op("evict", &filename.file_name())
}
pub(super) fn trace_layer_flush(&self, filename: &LayerFileName) {
self.trace_op("flush", &filename.file_name())
}
pub(super) fn trace_layer_compact_create(&self, filename: &LayerFileName) {
self.trace_op("compact_create", &filename.file_name())
}
pub(super) fn trace_layer_compact_delete(&self, filename: &LayerFileName) {
self.trace_op("compact_delete", &filename.file_name())
}
pub(super) fn trace_layer_image_create(&self, filename: &LayerFileName) {
self.trace_op("image_create", &filename.file_name())
}
pub(super) fn trace_layer_gc_delete(&self, filename: &LayerFileName) {
self.trace_op("gc_delete", &filename.file_name())
}
// TODO: also report 'retain_lsns'
pub(super) fn trace_gc_start(&self, cutoff_lsn: Lsn) {
let opt_out = &self.layer_trace_file.lock().unwrap();
if let Some(mut out) = opt_out.as_ref() {
if let Ok(elapsed) = UNIX_EPOCH.elapsed() {
let time = elapsed.as_millis();
let _ = writeln!(out, "{{ \"time\": {time}, \"op\": \"gc_start\", \"cutoff\": \"{cutoff_lsn}\"}}");
}
else {
warn!("could not get current timestamp");
}
}
}
}

View File

@@ -45,6 +45,7 @@ pub fn spawn_connection_manager_task(
lagging_wal_timeout: Duration,
max_lsn_wal_lag: NonZeroU64,
auth_token: Option<Arc<String>>,
availability_zone: Option<String>,
ctx: RequestContext,
) {
let mut broker_client = get_broker_client().clone();
@@ -67,6 +68,7 @@ pub fn spawn_connection_manager_task(
lagging_wal_timeout,
max_lsn_wal_lag,
auth_token,
availability_zone,
);
loop {
select! {
@@ -334,6 +336,7 @@ struct WalreceiverState {
/// Data about all timelines, available for connection, fetched from storage broker, grouped by their corresponding safekeeper node id.
wal_stream_candidates: HashMap<NodeId, BrokerSkTimeline>,
auth_token: Option<Arc<String>>,
availability_zone: Option<String>,
}
/// Current connection data.
@@ -381,6 +384,7 @@ impl WalreceiverState {
lagging_wal_timeout: Duration,
max_lsn_wal_lag: NonZeroU64,
auth_token: Option<Arc<String>>,
availability_zone: Option<String>,
) -> Self {
let id = TenantTimelineId {
tenant_id: timeline.tenant_id,
@@ -396,6 +400,7 @@ impl WalreceiverState {
wal_stream_candidates: HashMap::new(),
wal_connection_retries: HashMap::new(),
auth_token,
availability_zone,
}
}
@@ -740,6 +745,7 @@ impl WalreceiverState {
None => None,
Some(x) => Some(x),
},
self.availability_zone.as_deref(),
) {
Ok(connstr) => Some((*sk_id, info, connstr)),
Err(e) => {
@@ -824,17 +830,24 @@ fn wal_stream_connection_config(
}: TenantTimelineId,
listen_pg_addr_str: &str,
auth_token: Option<&str>,
availability_zone: Option<&str>,
) -> anyhow::Result<PgConnectionConfig> {
let (host, port) =
parse_host_port(listen_pg_addr_str).context("Unable to parse listen_pg_addr_str")?;
let port = port.unwrap_or(5432);
Ok(PgConnectionConfig::new_host_port(host, port)
let mut connstr = PgConnectionConfig::new_host_port(host, port)
.extend_options([
"-c".to_owned(),
format!("timeline_id={}", timeline_id),
format!("tenant_id={}", tenant_id),
])
.set_password(auth_token.map(|s| s.to_owned())))
.set_password(auth_token.map(|s| s.to_owned()));
if let Some(availability_zone) = availability_zone {
connstr = connstr.extend_options([format!("availability_zone={}", availability_zone)]);
}
Ok(connstr)
}
#[cfg(test)]
@@ -1273,6 +1286,7 @@ mod tests {
wal_stream_candidates: HashMap::new(),
wal_connection_retries: HashMap::new(),
auth_token: None,
availability_zone: None,
}
}
}

View File

@@ -127,12 +127,21 @@ impl UploadQueue {
let mut files = HashMap::with_capacity(index_part.timeline_layers.len());
for layer_name in &index_part.timeline_layers {
let layer_metadata = index_part
match index_part
.layer_metadata
.get(layer_name)
.map(LayerFileMetadata::from)
.unwrap_or(LayerFileMetadata::MISSING);
files.insert(layer_name.to_owned(), layer_metadata);
{
Some(layer_metadata) => {
files.insert(layer_name.to_owned(), layer_metadata);
}
None => {
anyhow::bail!(
"No remote layer metadata found for layer {}",
layer_name.file_name()
);
}
}
}
let index_part_metadata = index_part.parse_metadata()?;

View File

@@ -46,8 +46,12 @@ PGconn *pageserver_conn = NULL;
*/
WaitEventSet *pageserver_conn_wes = NULL;
char *page_server_connstring_raw;
char *safekeeper_token_env;
/* GUCs */
char *neon_timeline;
char *neon_tenant;
int32 max_cluster_size;
char *page_server_connstring;
char *neon_auth_token;
int n_unflushed_requests = 0;
int flush_every_n_requests = 8;
@@ -60,10 +64,37 @@ pageserver_connect(int elevel)
{
char *query;
int ret;
const char *keywords[3];
const char *values[3];
int n;
Assert(!connected);
pageserver_conn = PQconnectdb(page_server_connstring);
/*
* Connect using the connection string we got from the
* neon.pageserver_connstring GUC. If the NEON_AUTH_TOKEN environment
* variable was set, use that as the password.
*
* The connection options are parsed in the order they're given, so
* when we set the password before the connection string, the
* connection string can override the password from the env variable.
* Seems useful, although we don't currently use that capability
* anywhere.
*/
n = 0;
if (neon_auth_token)
{
keywords[n] = "password";
values[n] = neon_auth_token;
n++;
}
keywords[n] = "dbname";
values[n] = page_server_connstring;
n++;
keywords[n] = NULL;
values[n] = NULL;
n++;
pageserver_conn = PQconnectdbParams(keywords, values, 1);
if (PQstatus(pageserver_conn) == CONNECTION_BAD)
{
@@ -125,7 +156,7 @@ pageserver_connect(int elevel)
}
}
neon_log(LOG, "libpagestore: connected to '%s'", page_server_connstring_raw);
neon_log(LOG, "libpagestore: connected to '%s'", page_server_connstring);
connected = true;
return true;
@@ -354,105 +385,6 @@ check_neon_id(char **newval, void **extra, GucSource source)
return **newval == '\0' || HexDecodeString(id, *newval, 16);
}
static char *
substitute_pageserver_password(const char *page_server_connstring_raw)
{
char *host = NULL;
char *port = NULL;
char *user = NULL;
char *auth_token = NULL;
char *err = NULL;
char *page_server_connstring = NULL;
PQconninfoOption *conn_options;
PQconninfoOption *conn_option;
MemoryContext oldcontext;
/*
* Here we substitute password in connection string with an environment
* variable. To simplify things we construct a connection string back with
* only known options. In particular: host port user and password. We do
* not currently use other options and constructing full connstring in an
* URI shape is quite messy.
*/
if (page_server_connstring_raw == NULL || page_server_connstring_raw[0] == '\0')
return NULL;
/* extract the auth token from the connection string */
conn_options = PQconninfoParse(page_server_connstring_raw, &err);
if (conn_options == NULL)
{
/* The error string is malloc'd, so we must free it explicitly */
char *errcopy = err ? pstrdup(err) : "out of memory";
PQfreemem(err);
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("invalid connection string syntax: %s", errcopy)));
}
/*
* Trying to populate pageserver connection string with auth token from
* environment. We are looking for password in with placeholder value like
* $ENV_VAR_NAME, so if password field is present and starts with $ we try
* to fetch environment variable value and fail loudly if it is not set.
*/
for (conn_option = conn_options; conn_option->keyword != NULL; conn_option++)
{
if (strcmp(conn_option->keyword, "host") == 0)
{
if (conn_option->val != NULL && conn_option->val[0] != '\0')
host = conn_option->val;
}
else if (strcmp(conn_option->keyword, "port") == 0)
{
if (conn_option->val != NULL && conn_option->val[0] != '\0')
port = conn_option->val;
}
else if (strcmp(conn_option->keyword, "user") == 0)
{
if (conn_option->val != NULL && conn_option->val[0] != '\0')
user = conn_option->val;
}
else if (strcmp(conn_option->keyword, "password") == 0)
{
if (conn_option->val != NULL && conn_option->val[0] != '\0')
{
/* ensure that this is a template */
if (strncmp(conn_option->val, "$", 1) != 0)
ereport(ERROR,
(errcode(ERRCODE_CONNECTION_EXCEPTION),
errmsg("expected placeholder value in pageserver password starting from $ but found: %s", &conn_option->val[1])));
neon_log(LOG, "found auth token placeholder in pageserver conn string '%s'", &conn_option->val[1]);
auth_token = getenv(&conn_option->val[1]);
if (!auth_token)
{
ereport(ERROR,
(errcode(ERRCODE_CONNECTION_EXCEPTION),
errmsg("cannot get auth token, environment variable %s is not set", &conn_option->val[1])));
}
else
{
neon_log(LOG, "using auth token from environment passed via env");
}
}
}
}
/*
* allocate connection string in TopMemoryContext to make sure it is not
* freed
*/
oldcontext = CurrentMemoryContext;
MemoryContextSwitchTo(TopMemoryContext);
page_server_connstring = psprintf("postgresql://%s:%s@%s:%s", user, auth_token ? auth_token : "", host, port);
MemoryContextSwitchTo(oldcontext);
PQconninfoFree(conn_options);
return page_server_connstring;
}
/*
* Module initialization function
*/
@@ -462,21 +394,12 @@ pg_init_libpagestore(void)
DefineCustomStringVariable("neon.pageserver_connstring",
"connection string to the page server",
NULL,
&page_server_connstring_raw,
&page_server_connstring,
"",
PGC_POSTMASTER,
0, /* no flags required */
NULL, NULL, NULL);
DefineCustomStringVariable("neon.safekeeper_token_env",
"the environment variable containing JWT token for authentication with Safekeepers, the convention is to either unset or set to $NEON_AUTH_TOKEN",
NULL,
&safekeeper_token_env,
NULL,
PGC_POSTMASTER,
0, /* no flags required */
NULL, NULL, NULL);
DefineCustomStringVariable("neon.timeline_id",
"Neon timeline_id the server is running on",
NULL,
@@ -533,30 +456,10 @@ pg_init_libpagestore(void)
neon_log(PageStoreTrace, "libpagestore already loaded");
page_server = &api;
/* substitute password in pageserver_connstring */
page_server_connstring = substitute_pageserver_password(page_server_connstring_raw);
/* Is there more correct way to pass CustomGUC to postgres code? */
neon_timeline_walproposer = neon_timeline;
neon_tenant_walproposer = neon_tenant;
/* retrieve the token for Safekeeper, if present */
if (safekeeper_token_env != NULL) {
if (safekeeper_token_env[0] != '$') {
ereport(ERROR,
(errcode(ERRCODE_CONNECTION_EXCEPTION),
errmsg("expected safekeeper auth token environment variable's name starting with $ but found: %s",
safekeeper_token_env)));
}
neon_safekeeper_token_walproposer = getenv(&safekeeper_token_env[1]);
if (!neon_safekeeper_token_walproposer) {
ereport(ERROR,
(errcode(ERRCODE_CONNECTION_EXCEPTION),
errmsg("cannot get safekeeper auth token, environment variable %s is not set",
&safekeeper_token_env[1])));
}
neon_log(LOG, "using safekeeper auth token from environment variable");
}
/* Retrieve the auth token to use when connecting to pageserver and safekeepers */
neon_auth_token = getenv("NEON_AUTH_TOKEN");
if (neon_auth_token)
neon_log(LOG, "using storage auth token from NEON_AUTH_TOKEN environment variable");
if (page_server_connstring && page_server_connstring[0])
{

View File

@@ -51,12 +51,39 @@ walprop_status(WalProposerConn *conn)
}
WalProposerConn *
walprop_connect_start(char *conninfo)
walprop_connect_start(char *conninfo, char *password)
{
WalProposerConn *conn;
PGconn *pg_conn;
const char *keywords[3];
const char *values[3];
int n;
pg_conn = PQconnectStart(conninfo);
/*
* Connect using the given connection string. If the
* NEON_AUTH_TOKEN environment variable was set, use that as
* the password.
*
* The connection options are parsed in the order they're given, so
* when we set the password before the connection string, the
* connection string can override the password from the env variable.
* Seems useful, although we don't currently use that capability
* anywhere.
*/
n = 0;
if (password)
{
keywords[n] = "password";
values[n] = neon_auth_token;
n++;
}
keywords[n] = "dbname";
values[n] = conninfo;
n++;
keywords[n] = NULL;
values[n] = NULL;
n++;
pg_conn = PQconnectStartParams(keywords, values, 1);
/*
* Allocation of a PQconn can fail, and will return NULL. We want to fully

View File

@@ -12,6 +12,11 @@
#ifndef NEON_H
#define NEON_H
/* GUCs */
extern char *neon_auth_token;
extern char *neon_timeline;
extern char *neon_tenant;
extern void pg_init_libpagestore(void);
extern void pg_init_walproposer(void);

View File

@@ -92,14 +92,6 @@ const int SmgrTrace = DEBUG5;
page_server_api *page_server;
/* GUCs */
char *page_server_connstring;
/*with substituted password*/
char *neon_timeline;
char *neon_tenant;
int32 max_cluster_size;
/* unlogged relation build states */
typedef enum
{

View File

@@ -78,10 +78,6 @@ int wal_acceptor_reconnect_timeout;
int wal_acceptor_connection_timeout;
bool am_wal_proposer;
char *neon_timeline_walproposer = NULL;
char *neon_tenant_walproposer = NULL;
char *neon_safekeeper_token_walproposer = NULL;
#define WAL_PROPOSER_SLOT_NAME "wal_proposer_slot"
static int n_safekeepers = 0;
@@ -514,17 +510,9 @@ WalProposerInit(XLogRecPtr flushRecPtr, uint64 systemId)
Safekeeper *sk = &safekeeper[n_safekeepers];
int written = 0;
if (neon_safekeeper_token_walproposer != NULL) {
written = snprintf((char *) &sk->conninfo, MAXCONNINFO,
"host=%s port=%s password=%s dbname=replication options='-c timeline_id=%s tenant_id=%s'",
sk->host, sk->port, neon_safekeeper_token_walproposer, neon_timeline_walproposer,
neon_tenant_walproposer);
} else {
written = snprintf((char *) &sk->conninfo, MAXCONNINFO,
"host=%s port=%s dbname=replication options='-c timeline_id=%s tenant_id=%s'",
sk->host, sk->port, neon_timeline_walproposer, neon_tenant_walproposer);
}
written = snprintf((char *) &sk->conninfo, MAXCONNINFO,
"host=%s port=%s dbname=replication options='-c timeline_id=%s tenant_id=%s'",
sk->host, sk->port, neon_timeline, neon_tenant);
if (written > MAXCONNINFO || written < 0)
elog(FATAL, "could not create connection string for safekeeper %s:%s", sk->host, sk->port);
}
@@ -550,16 +538,16 @@ WalProposerInit(XLogRecPtr flushRecPtr, uint64 systemId)
greetRequest.pgVersion = PG_VERSION_NUM;
pg_strong_random(&greetRequest.proposerId, sizeof(greetRequest.proposerId));
greetRequest.systemId = systemId;
if (!neon_timeline_walproposer)
if (!neon_timeline)
elog(FATAL, "neon.timeline_id is not provided");
if (*neon_timeline_walproposer != '\0' &&
!HexDecodeString(greetRequest.timeline_id, neon_timeline_walproposer, 16))
elog(FATAL, "Could not parse neon.timeline_id, %s", neon_timeline_walproposer);
if (!neon_tenant_walproposer)
if (*neon_timeline != '\0' &&
!HexDecodeString(greetRequest.timeline_id, neon_timeline, 16))
elog(FATAL, "Could not parse neon.timeline_id, %s", neon_timeline);
if (!neon_tenant)
elog(FATAL, "neon.tenant_id is not provided");
if (*neon_tenant_walproposer != '\0' &&
!HexDecodeString(greetRequest.tenant_id, neon_tenant_walproposer, 16))
elog(FATAL, "Could not parse neon.tenant_id, %s", neon_tenant_walproposer);
if (*neon_tenant != '\0' &&
!HexDecodeString(greetRequest.tenant_id, neon_tenant, 16))
elog(FATAL, "Could not parse neon.tenant_id, %s", neon_tenant);
#if PG_VERSION_NUM >= 150000
/* FIXME don't use hardcoded timeline id */
@@ -700,7 +688,7 @@ ResetConnection(Safekeeper *sk)
/*
* Try to establish new connection
*/
sk->conn = walprop_connect_start((char *) &sk->conninfo);
sk->conn = walprop_connect_start((char *) &sk->conninfo, neon_auth_token);
/*
* "If the result is null, then libpq has been unable to allocate a new

View File

@@ -39,10 +39,6 @@ typedef struct WalProposerConn WalProposerConn;
struct WalMessage;
typedef struct WalMessage WalMessage;
extern char *neon_timeline_walproposer;
extern char *neon_tenant_walproposer;
extern char *neon_safekeeper_token_walproposer;
/* Possible return values from ReadPGAsync */
typedef enum
{
@@ -458,7 +454,7 @@ extern char *walprop_error_message(WalProposerConn *conn);
extern WalProposerConnStatusType walprop_status(WalProposerConn *conn);
/* Re-exported PQconnectStart */
extern WalProposerConn * walprop_connect_start(char *conninfo);
extern WalProposerConn * walprop_connect_start(char *conninfo, char *password);
/* Re-exported PQconectPoll */
extern WalProposerConnectPollStatusType walprop_connect_poll(WalProposerConn *conn);

15
pgxn/neon_utils/Makefile Normal file
View File

@@ -0,0 +1,15 @@
# pgxs/neon_utils/Makefile
MODULE_big = neon_utils
OBJS = \
$(WIN32RES) \
neon_utils.o
EXTENSION = neon_utils
DATA = neon_utils--1.0.sql
PGFILEDESC = "neon_utils - small useful functions"
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

View File

@@ -0,0 +1,6 @@
CREATE FUNCTION num_cpus()
RETURNS int
AS 'MODULE_PATHNAME', 'num_cpus'
LANGUAGE C STRICT
PARALLEL UNSAFE
VOLATILE;

View File

@@ -0,0 +1,35 @@
/*-------------------------------------------------------------------------
*
* neon_utils.c
* neon_utils - small useful functions
*
* IDENTIFICATION
* contrib/neon_utils/neon_utils.c
*
*-------------------------------------------------------------------------
*/
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif
#include "postgres.h"
#include "fmgr.h"
PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(num_cpus);
Datum
num_cpus(PG_FUNCTION_ARGS)
{
#ifdef _WIN32
SYSTEM_INFO sysinfo;
GetSystemInfo(&sysinfo);
uint32 num_cpus = (uint32) sysinfo.dwNumberOfProcessors;
#else
uint32 num_cpus = (uint32) sysconf(_SC_NPROCESSORS_ONLN);
#endif
PG_RETURN_UINT32(num_cpus);
}

View File

@@ -0,0 +1,6 @@
# neon_utils extension
comment = 'neon_utils - small useful functions'
default_version = '1.0'
module_pathname = '$libdir/neon_utils'
relocatable = true
trusted = true

264
poetry.lock generated
View File

@@ -1,4 +1,4 @@
# This file is automatically @generated by Poetry and should not be changed by hand.
# This file is automatically @generated by Poetry 1.4.0 and should not be changed by hand.
[[package]]
name = "aiohttp"
@@ -253,43 +253,46 @@ files = [
[[package]]
name = "black"
version = "22.6.0"
version = "23.1.0"
description = "The uncompromising code formatter."
category = "dev"
optional = false
python-versions = ">=3.6.2"
python-versions = ">=3.7"
files = [
{file = "black-22.6.0-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:f586c26118bc6e714ec58c09df0157fe2d9ee195c764f630eb0d8e7ccce72e69"},
{file = "black-22.6.0-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:b270a168d69edb8b7ed32c193ef10fd27844e5c60852039599f9184460ce0807"},
{file = "black-22.6.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:6797f58943fceb1c461fb572edbe828d811e719c24e03375fd25170ada53825e"},
{file = "black-22.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c85928b9d5f83b23cee7d0efcb310172412fbf7cb9d9ce963bd67fd141781def"},
{file = "black-22.6.0-cp310-cp310-win_amd64.whl", hash = "sha256:f6fe02afde060bbeef044af7996f335fbe90b039ccf3f5eb8f16df8b20f77666"},
{file = "black-22.6.0-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:cfaf3895a9634e882bf9d2363fed5af8888802d670f58b279b0bece00e9a872d"},
{file = "black-22.6.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:94783f636bca89f11eb5d50437e8e17fbc6a929a628d82304c80fa9cd945f256"},
{file = "black-22.6.0-cp36-cp36m-win_amd64.whl", hash = "sha256:2ea29072e954a4d55a2ff58971b83365eba5d3d357352a07a7a4df0d95f51c78"},
{file = "black-22.6.0-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:e439798f819d49ba1c0bd9664427a05aab79bfba777a6db94fd4e56fae0cb849"},
{file = "black-22.6.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:187d96c5e713f441a5829e77120c269b6514418f4513a390b0499b0987f2ff1c"},
{file = "black-22.6.0-cp37-cp37m-win_amd64.whl", hash = "sha256:074458dc2f6e0d3dab7928d4417bb6957bb834434516f21514138437accdbe90"},
{file = "black-22.6.0-cp38-cp38-macosx_10_9_universal2.whl", hash = "sha256:a218d7e5856f91d20f04e931b6f16d15356db1c846ee55f01bac297a705ca24f"},
{file = "black-22.6.0-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:568ac3c465b1c8b34b61cd7a4e349e93f91abf0f9371eda1cf87194663ab684e"},
{file = "black-22.6.0-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:6c1734ab264b8f7929cef8ae5f900b85d579e6cbfde09d7387da8f04771b51c6"},
{file = "black-22.6.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c9a3ac16efe9ec7d7381ddebcc022119794872abce99475345c5a61aa18c45ad"},
{file = "black-22.6.0-cp38-cp38-win_amd64.whl", hash = "sha256:b9fd45787ba8aa3f5e0a0a98920c1012c884622c6c920dbe98dbd05bc7c70fbf"},
{file = "black-22.6.0-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:7ba9be198ecca5031cd78745780d65a3f75a34b2ff9be5837045dce55db83d1c"},
{file = "black-22.6.0-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:a3db5b6409b96d9bd543323b23ef32a1a2b06416d525d27e0f67e74f1446c8f2"},
{file = "black-22.6.0-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:560558527e52ce8afba936fcce93a7411ab40c7d5fe8c2463e279e843c0328ee"},
{file = "black-22.6.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b154e6bbde1e79ea3260c4b40c0b7b3109ffcdf7bc4ebf8859169a6af72cd70b"},
{file = "black-22.6.0-cp39-cp39-win_amd64.whl", hash = "sha256:4af5bc0e1f96be5ae9bd7aaec219c901a94d6caa2484c21983d043371c733fc4"},
{file = "black-22.6.0-py3-none-any.whl", hash = "sha256:ac609cf8ef5e7115ddd07d85d988d074ed00e10fbc3445aee393e70164a2219c"},
{file = "black-22.6.0.tar.gz", hash = "sha256:6c6d39e28aed379aec40da1c65434c77d75e65bb59a1e1c283de545fb4e7c6c9"},
{file = "black-23.1.0-cp310-cp310-macosx_10_16_arm64.whl", hash = "sha256:b6a92a41ee34b883b359998f0c8e6eb8e99803aa8bf3123bf2b2e6fec505a221"},
{file = "black-23.1.0-cp310-cp310-macosx_10_16_universal2.whl", hash = "sha256:57c18c5165c1dbe291d5306e53fb3988122890e57bd9b3dcb75f967f13411a26"},
{file = "black-23.1.0-cp310-cp310-macosx_10_16_x86_64.whl", hash = "sha256:9880d7d419bb7e709b37e28deb5e68a49227713b623c72b2b931028ea65f619b"},
{file = "black-23.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e6663f91b6feca5d06f2ccd49a10f254f9298cc1f7f49c46e498a0771b507104"},
{file = "black-23.1.0-cp310-cp310-win_amd64.whl", hash = "sha256:9afd3f493666a0cd8f8df9a0200c6359ac53940cbde049dcb1a7eb6ee2dd7074"},
{file = "black-23.1.0-cp311-cp311-macosx_10_16_arm64.whl", hash = "sha256:bfffba28dc52a58f04492181392ee380e95262af14ee01d4bc7bb1b1c6ca8d27"},
{file = "black-23.1.0-cp311-cp311-macosx_10_16_universal2.whl", hash = "sha256:c1c476bc7b7d021321e7d93dc2cbd78ce103b84d5a4cf97ed535fbc0d6660648"},
{file = "black-23.1.0-cp311-cp311-macosx_10_16_x86_64.whl", hash = "sha256:382998821f58e5c8238d3166c492139573325287820963d2f7de4d518bd76958"},
{file = "black-23.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2bf649fda611c8550ca9d7592b69f0637218c2369b7744694c5e4902873b2f3a"},
{file = "black-23.1.0-cp311-cp311-win_amd64.whl", hash = "sha256:121ca7f10b4a01fd99951234abdbd97728e1240be89fde18480ffac16503d481"},
{file = "black-23.1.0-cp37-cp37m-macosx_10_16_x86_64.whl", hash = "sha256:a8471939da5e824b891b25751955be52ee7f8a30a916d570a5ba8e0f2eb2ecad"},
{file = "black-23.1.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8178318cb74f98bc571eef19068f6ab5613b3e59d4f47771582f04e175570ed8"},
{file = "black-23.1.0-cp37-cp37m-win_amd64.whl", hash = "sha256:a436e7881d33acaf2536c46a454bb964a50eff59b21b51c6ccf5a40601fbef24"},
{file = "black-23.1.0-cp38-cp38-macosx_10_16_arm64.whl", hash = "sha256:a59db0a2094d2259c554676403fa2fac3473ccf1354c1c63eccf7ae65aac8ab6"},
{file = "black-23.1.0-cp38-cp38-macosx_10_16_universal2.whl", hash = "sha256:0052dba51dec07ed029ed61b18183942043e00008ec65d5028814afaab9a22fd"},
{file = "black-23.1.0-cp38-cp38-macosx_10_16_x86_64.whl", hash = "sha256:49f7b39e30f326a34b5c9a4213213a6b221d7ae9d58ec70df1c4a307cf2a1580"},
{file = "black-23.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:162e37d49e93bd6eb6f1afc3e17a3d23a823042530c37c3c42eeeaf026f38468"},
{file = "black-23.1.0-cp38-cp38-win_amd64.whl", hash = "sha256:8b70eb40a78dfac24842458476135f9b99ab952dd3f2dab738c1881a9b38b753"},
{file = "black-23.1.0-cp39-cp39-macosx_10_16_arm64.whl", hash = "sha256:a29650759a6a0944e7cca036674655c2f0f63806ddecc45ed40b7b8aa314b651"},
{file = "black-23.1.0-cp39-cp39-macosx_10_16_universal2.whl", hash = "sha256:bb460c8561c8c1bec7824ecbc3ce085eb50005883a6203dcfb0122e95797ee06"},
{file = "black-23.1.0-cp39-cp39-macosx_10_16_x86_64.whl", hash = "sha256:c91dfc2c2a4e50df0026f88d2215e166616e0c80e86004d0003ece0488db2739"},
{file = "black-23.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2a951cc83ab535d248c89f300eccbd625e80ab880fbcfb5ac8afb5f01a258ac9"},
{file = "black-23.1.0-cp39-cp39-win_amd64.whl", hash = "sha256:0680d4380db3719ebcfb2613f34e86c8e6d15ffeabcf8ec59355c5e7b85bb555"},
{file = "black-23.1.0-py3-none-any.whl", hash = "sha256:7a0f701d314cfa0896b9001df70a530eb2472babb76086344e688829efd97d32"},
{file = "black-23.1.0.tar.gz", hash = "sha256:b0bd97bea8903f5a2ba7219257a44e3f1f9d00073d6cc1add68f0beec69692ac"},
]
[package.dependencies]
click = ">=8.0.0"
mypy-extensions = ">=0.4.3"
packaging = ">=22.0"
pathspec = ">=0.9.0"
platformdirs = ">=2"
tomli = {version = ">=1.1.0", markers = "python_full_version < \"3.11.0a7\""}
tomli = {version = ">=1.1.0", markers = "python_version < \"3.11\""}
typing-extensions = {version = ">=3.10.0.0", markers = "python_version < \"3.10\""}
[package.extras]
@@ -884,6 +887,8 @@ files = [
{file = "cryptography-39.0.1-cp36-abi3-win32.whl", hash = "sha256:fe913f20024eb2cb2f323e42a64bdf2911bb9738a15dba7d3cce48151034e3a8"},
{file = "cryptography-39.0.1-cp36-abi3-win_amd64.whl", hash = "sha256:ced4e447ae29ca194449a3f1ce132ded8fcab06971ef5f618605aacaa612beac"},
{file = "cryptography-39.0.1-pp38-pypy38_pp73-macosx_10_12_x86_64.whl", hash = "sha256:807ce09d4434881ca3a7594733669bd834f5b2c6d5c7e36f8c00f691887042ad"},
{file = "cryptography-39.0.1-pp38-pypy38_pp73-manylinux_2_28_aarch64.whl", hash = "sha256:c5caeb8188c24888c90b5108a441c106f7faa4c4c075a2bcae438c6e8ca73cef"},
{file = "cryptography-39.0.1-pp38-pypy38_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:4789d1e3e257965e960232345002262ede4d094d1a19f4d3b52e48d4d8f3b885"},
{file = "cryptography-39.0.1-pp38-pypy38_pp73-win_amd64.whl", hash = "sha256:96f1157a7c08b5b189b16b47bc9db2332269d6680a196341bf30046330d15388"},
{file = "cryptography-39.0.1-pp39-pypy39_pp73-macosx_10_12_x86_64.whl", hash = "sha256:e422abdec8b5fa8462aa016786680720d78bdce7a30c652b7fadf83a4ba35336"},
{file = "cryptography-39.0.1-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.manylinux_2_24_aarch64.whl", hash = "sha256:b0afd054cd42f3d213bf82c629efb1ee5f22eba35bf0eec88ea9ea7304f511a2"},
@@ -963,23 +968,6 @@ files = [
[package.extras]
testing = ["pre-commit"]
[[package]]
name = "flake8"
version = "5.0.4"
description = "the modular source code checker: pep8 pyflakes and co"
category = "dev"
optional = false
python-versions = ">=3.6.1"
files = [
{file = "flake8-5.0.4-py2.py3-none-any.whl", hash = "sha256:7a1cf6b73744f5806ab95e526f6f0d8c01c66d7bbe349562d22dfca20610b248"},
{file = "flake8-5.0.4.tar.gz", hash = "sha256:6fbe320aad8d6b95cec8b8e47bc933004678dc63095be98528b7bdd2a9f510db"},
]
[package.dependencies]
mccabe = ">=0.7.0,<0.8.0"
pycodestyle = ">=2.9.0,<2.10.0"
pyflakes = ">=2.5.0,<2.6.0"
[[package]]
name = "flask"
version = "2.1.3"
@@ -1075,24 +1063,6 @@ files = [
{file = "iniconfig-1.1.1.tar.gz", hash = "sha256:bc3af051d7d14b2ee5ef9969666def0cd1a000e121eaea580d4a313df4b37f32"},
]
[[package]]
name = "isort"
version = "5.10.1"
description = "A Python utility / library to sort Python imports."
category = "dev"
optional = false
python-versions = ">=3.6.1,<4.0"
files = [
{file = "isort-5.10.1-py3-none-any.whl", hash = "sha256:6f62d78e2f89b4500b080fe3a81690850cd254227f27f75c3a0c491a1f351ba7"},
{file = "isort-5.10.1.tar.gz", hash = "sha256:e8443a5e7a020e9d7f97f1d7d9cd17c88bcb3bc7e218bf9cf5095fe550be2951"},
]
[package.extras]
colors = ["colorama (>=0.4.3,<0.5.0)"]
pipfile-deprecated-finder = ["pipreqs", "requirementslib"]
plugins = ["setuptools"]
requirements-deprecated-finder = ["pip-api", "pipreqs"]
[[package]]
name = "itsdangerous"
version = "2.1.2"
@@ -1238,6 +1208,7 @@ category = "main"
optional = false
python-versions = "*"
files = [
{file = "junit-xml-1.9.tar.gz", hash = "sha256:de16a051990d4e25a3982b2dd9e89d671067548718866416faec14d9de56db9f"},
{file = "junit_xml-1.9-py2.py3-none-any.whl", hash = "sha256:ec5ca1a55aefdd76d28fcc0b135251d156c7106fa979686a4b48d62b761b4732"},
]
@@ -1294,18 +1265,6 @@ files = [
{file = "MarkupSafe-2.1.1.tar.gz", hash = "sha256:7f91197cc9e48f989d12e4e6fbc46495c446636dfc81b9ccf50bb0ec74b91d4b"},
]
[[package]]
name = "mccabe"
version = "0.7.0"
description = "McCabe checker, plugin for flake8"
category = "dev"
optional = false
python-versions = ">=3.6"
files = [
{file = "mccabe-0.7.0-py2.py3-none-any.whl", hash = "sha256:6c2d30ab6be0e4a46919781807b4f0d834ebdd6c6e3dca0bda5a15f863427b6e"},
{file = "mccabe-0.7.0.tar.gz", hash = "sha256:348e0240c33b60bbdf4e523192ef919f28cb2c3d7d5c7794f74009290f236325"},
]
[[package]]
name = "moto"
version = "4.1.2"
@@ -1453,46 +1412,42 @@ files = [
[[package]]
name = "mypy"
version = "0.991"
version = "1.1.1"
description = "Optional static typing for Python"
category = "dev"
optional = false
python-versions = ">=3.7"
files = [
{file = "mypy-0.991-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:7d17e0a9707d0772f4a7b878f04b4fd11f6f5bcb9b3813975a9b13c9332153ab"},
{file = "mypy-0.991-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:0714258640194d75677e86c786e80ccf294972cc76885d3ebbb560f11db0003d"},
{file = "mypy-0.991-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:0c8f3be99e8a8bd403caa8c03be619544bc2c77a7093685dcf308c6b109426c6"},
{file = "mypy-0.991-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bc9ec663ed6c8f15f4ae9d3c04c989b744436c16d26580eaa760ae9dd5d662eb"},
{file = "mypy-0.991-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:4307270436fd7694b41f913eb09210faff27ea4979ecbcd849e57d2da2f65305"},
{file = "mypy-0.991-cp310-cp310-win_amd64.whl", hash = "sha256:901c2c269c616e6cb0998b33d4adbb4a6af0ac4ce5cd078afd7bc95830e62c1c"},
{file = "mypy-0.991-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:d13674f3fb73805ba0c45eb6c0c3053d218aa1f7abead6e446d474529aafc372"},
{file = "mypy-0.991-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:1c8cd4fb70e8584ca1ed5805cbc7c017a3d1a29fb450621089ffed3e99d1857f"},
{file = "mypy-0.991-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:209ee89fbb0deed518605edddd234af80506aec932ad28d73c08f1400ef80a33"},
{file = "mypy-0.991-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:37bd02ebf9d10e05b00d71302d2c2e6ca333e6c2a8584a98c00e038db8121f05"},
{file = "mypy-0.991-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:26efb2fcc6b67e4d5a55561f39176821d2adf88f2745ddc72751b7890f3194ad"},
{file = "mypy-0.991-cp311-cp311-win_amd64.whl", hash = "sha256:3a700330b567114b673cf8ee7388e949f843b356a73b5ab22dd7cff4742a5297"},
{file = "mypy-0.991-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:1f7d1a520373e2272b10796c3ff721ea1a0712288cafaa95931e66aa15798813"},
{file = "mypy-0.991-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:641411733b127c3e0dab94c45af15fea99e4468f99ac88b39efb1ad677da5711"},
{file = "mypy-0.991-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:3d80e36b7d7a9259b740be6d8d906221789b0d836201af4234093cae89ced0cd"},
{file = "mypy-0.991-cp37-cp37m-win_amd64.whl", hash = "sha256:e62ebaad93be3ad1a828a11e90f0e76f15449371ffeecca4a0a0b9adc99abcef"},
{file = "mypy-0.991-cp38-cp38-macosx_10_9_universal2.whl", hash = "sha256:b86ce2c1866a748c0f6faca5232059f881cda6dda2a893b9a8373353cfe3715a"},
{file = "mypy-0.991-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:ac6e503823143464538efda0e8e356d871557ef60ccd38f8824a4257acc18d93"},
{file = "mypy-0.991-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:0cca5adf694af539aeaa6ac633a7afe9bbd760df9d31be55ab780b77ab5ae8bf"},
{file = "mypy-0.991-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a12c56bf73cdab116df96e4ff39610b92a348cc99a1307e1da3c3768bbb5b135"},
{file = "mypy-0.991-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:652b651d42f155033a1967739788c436491b577b6a44e4c39fb340d0ee7f0d70"},
{file = "mypy-0.991-cp38-cp38-win_amd64.whl", hash = "sha256:4175593dc25d9da12f7de8de873a33f9b2b8bdb4e827a7cae952e5b1a342e243"},
{file = "mypy-0.991-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:98e781cd35c0acf33eb0295e8b9c55cdbef64fcb35f6d3aa2186f289bed6e80d"},
{file = "mypy-0.991-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:6d7464bac72a85cb3491c7e92b5b62f3dcccb8af26826257760a552a5e244aa5"},
{file = "mypy-0.991-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:c9166b3f81a10cdf9b49f2d594b21b31adadb3d5e9db9b834866c3258b695be3"},
{file = "mypy-0.991-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b8472f736a5bfb159a5e36740847808f6f5b659960115ff29c7cecec1741c648"},
{file = "mypy-0.991-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:5e80e758243b97b618cdf22004beb09e8a2de1af481382e4d84bc52152d1c476"},
{file = "mypy-0.991-cp39-cp39-win_amd64.whl", hash = "sha256:74e259b5c19f70d35fcc1ad3d56499065c601dfe94ff67ae48b85596b9ec1461"},
{file = "mypy-0.991-py3-none-any.whl", hash = "sha256:de32edc9b0a7e67c2775e574cb061a537660e51210fbf6006b0b36ea695ae9bb"},
{file = "mypy-0.991.tar.gz", hash = "sha256:3c0165ba8f354a6d9881809ef29f1a9318a236a6d81c690094c5df32107bde06"},
{file = "mypy-1.1.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:39c7119335be05630611ee798cc982623b9e8f0cff04a0b48dfc26100e0b97af"},
{file = "mypy-1.1.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:61bf08362e93b6b12fad3eab68c4ea903a077b87c90ac06c11e3d7a09b56b9c1"},
{file = "mypy-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:dbb19c9f662e41e474e0cff502b7064a7edc6764f5262b6cd91d698163196799"},
{file = "mypy-1.1.1-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:315ac73cc1cce4771c27d426b7ea558fb4e2836f89cb0296cbe056894e3a1f78"},
{file = "mypy-1.1.1-cp310-cp310-win_amd64.whl", hash = "sha256:5cb14ff9919b7df3538590fc4d4c49a0f84392237cbf5f7a816b4161c061829e"},
{file = "mypy-1.1.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:26cdd6a22b9b40b2fd71881a8a4f34b4d7914c679f154f43385ca878a8297389"},
{file = "mypy-1.1.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:5b5f81b40d94c785f288948c16e1f2da37203c6006546c5d947aab6f90aefef2"},
{file = "mypy-1.1.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:21b437be1c02712a605591e1ed1d858aba681757a1e55fe678a15c2244cd68a5"},
{file = "mypy-1.1.1-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:d809f88734f44a0d44959d795b1e6f64b2bbe0ea4d9cc4776aa588bb4229fc1c"},
{file = "mypy-1.1.1-cp311-cp311-win_amd64.whl", hash = "sha256:a380c041db500e1410bb5b16b3c1c35e61e773a5c3517926b81dfdab7582be54"},
{file = "mypy-1.1.1-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:b7c7b708fe9a871a96626d61912e3f4ddd365bf7f39128362bc50cbd74a634d5"},
{file = "mypy-1.1.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c1c10fa12df1232c936830839e2e935d090fc9ee315744ac33b8a32216b93707"},
{file = "mypy-1.1.1-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:0a28a76785bf57655a8ea5eb0540a15b0e781c807b5aa798bd463779988fa1d5"},
{file = "mypy-1.1.1-cp37-cp37m-win_amd64.whl", hash = "sha256:ef6a01e563ec6a4940784c574d33f6ac1943864634517984471642908b30b6f7"},
{file = "mypy-1.1.1-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:d64c28e03ce40d5303450f547e07418c64c241669ab20610f273c9e6290b4b0b"},
{file = "mypy-1.1.1-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:64cc3afb3e9e71a79d06e3ed24bb508a6d66f782aff7e56f628bf35ba2e0ba51"},
{file = "mypy-1.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ce61663faf7a8e5ec6f456857bfbcec2901fbdb3ad958b778403f63b9e606a1b"},
{file = "mypy-1.1.1-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:2b0c373d071593deefbcdd87ec8db91ea13bd8f1328d44947e88beae21e8d5e9"},
{file = "mypy-1.1.1-cp38-cp38-win_amd64.whl", hash = "sha256:2888ce4fe5aae5a673386fa232473014056967f3904f5abfcf6367b5af1f612a"},
{file = "mypy-1.1.1-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:19ba15f9627a5723e522d007fe708007bae52b93faab00f95d72f03e1afa9598"},
{file = "mypy-1.1.1-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:59bbd71e5c58eed2e992ce6523180e03c221dcd92b52f0e792f291d67b15a71c"},
{file = "mypy-1.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9401e33814cec6aec8c03a9548e9385e0e228fc1b8b0a37b9ea21038e64cdd8a"},
{file = "mypy-1.1.1-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:4b398d8b1f4fba0e3c6463e02f8ad3346f71956b92287af22c9b12c3ec965a9f"},
{file = "mypy-1.1.1-cp39-cp39-win_amd64.whl", hash = "sha256:69b35d1dcb5707382810765ed34da9db47e7f95b3528334a3c999b0c90fe523f"},
{file = "mypy-1.1.1-py3-none-any.whl", hash = "sha256:4e4e8b362cdf99ba00c2b218036002bdcdf1e0de085cdb296a49df03fb31dfc4"},
{file = "mypy-1.1.1.tar.gz", hash = "sha256:ae9ceae0f5b9059f33dbc62dea087e942c0ccab4b7a003719cb70f9b8abfa32f"},
]
[package.dependencies]
mypy-extensions = ">=0.4.3"
mypy-extensions = ">=1.0.0"
tomli = {version = ">=1.1.0", markers = "python_version < \"3.11\""}
typing-extensions = ">=3.10"
@@ -1519,14 +1474,14 @@ typing-extensions = ">=4.1.0"
[[package]]
name = "mypy-extensions"
version = "0.4.3"
description = "Experimental type system extensions for programs checked with the mypy typechecker."
version = "1.0.0"
description = "Type system extensions for programs checked with the mypy type checker."
category = "dev"
optional = false
python-versions = "*"
python-versions = ">=3.5"
files = [
{file = "mypy_extensions-0.4.3-py2.py3-none-any.whl", hash = "sha256:090fedd75945a69ae91ce1303b5824f428daf5a028d2f6ab8a299250a846f15d"},
{file = "mypy_extensions-0.4.3.tar.gz", hash = "sha256:2d82818f5bb3e369420cb3c4060a7970edba416647068eb4c5343488a6c604a8"},
{file = "mypy_extensions-1.0.0-py3-none-any.whl", hash = "sha256:4392f6c0eb8a5668a69e23d168ffa70f0be9ccfd32b5cc2d26a34ae5b844552d"},
{file = "mypy_extensions-1.0.0.tar.gz", hash = "sha256:75dbf8955dc00442a438fc4d0666508a9a97b6bd41aa2f0ffe9d2f2725af0782"},
]
[[package]]
@@ -1591,19 +1546,16 @@ requests = ["requests"]
[[package]]
name = "packaging"
version = "21.3"
version = "23.0"
description = "Core utilities for Python packages"
category = "main"
optional = false
python-versions = ">=3.6"
python-versions = ">=3.7"
files = [
{file = "packaging-21.3-py3-none-any.whl", hash = "sha256:ef103e05f519cdc783ae24ea4e2e0f508a9c99b2d4969652eed6a2e1ea5bd522"},
{file = "packaging-21.3.tar.gz", hash = "sha256:dd47c42927d89ab911e606518907cc2d3a1f38bbd026385970643f9c5b8ecfeb"},
{file = "packaging-23.0-py3-none-any.whl", hash = "sha256:714ac14496c3e68c99c29b00845f7a2b85f3bb6f1078fd9f72fd20f0570002b2"},
{file = "packaging-23.0.tar.gz", hash = "sha256:b6ad297f8907de0fa2fe1ccbd26fdaf387f5f47c7275fedf8cce89f99446cf97"},
]
[package.dependencies]
pyparsing = ">=2.0.2,<3.0.5 || >3.0.5"
[[package]]
name = "pathspec"
version = "0.9.0"
@@ -1712,6 +1664,7 @@ python-versions = ">=3.6"
files = [
{file = "psycopg2-binary-2.9.3.tar.gz", hash = "sha256:761df5313dc15da1502b21453642d7599d26be88bff659382f8f9747c7ebea4e"},
{file = "psycopg2_binary-2.9.3-cp310-cp310-macosx_10_14_x86_64.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl", hash = "sha256:539b28661b71da7c0e428692438efbcd048ca21ea81af618d845e06ebfd29478"},
{file = "psycopg2_binary-2.9.3-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:2f2534ab7dc7e776a263b463a16e189eb30e85ec9bbe1bff9e78dae802608932"},
{file = "psycopg2_binary-2.9.3-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6e82d38390a03da28c7985b394ec3f56873174e2c88130e6966cb1c946508e65"},
{file = "psycopg2_binary-2.9.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:57804fc02ca3ce0dbfbef35c4b3a4a774da66d66ea20f4bda601294ad2ea6092"},
{file = "psycopg2_binary-2.9.3-cp310-cp310-manylinux_2_24_aarch64.whl", hash = "sha256:083a55275f09a62b8ca4902dd11f4b33075b743cf0d360419e2051a8a5d5ff76"},
@@ -1745,6 +1698,7 @@ files = [
{file = "psycopg2_binary-2.9.3-cp37-cp37m-win32.whl", hash = "sha256:adf20d9a67e0b6393eac162eb81fb10bc9130a80540f4df7e7355c2dd4af9fba"},
{file = "psycopg2_binary-2.9.3-cp37-cp37m-win_amd64.whl", hash = "sha256:2f9ffd643bc7349eeb664eba8864d9e01f057880f510e4681ba40a6532f93c71"},
{file = "psycopg2_binary-2.9.3-cp38-cp38-macosx_10_14_x86_64.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl", hash = "sha256:def68d7c21984b0f8218e8a15d514f714d96904265164f75f8d3a70f9c295667"},
{file = "psycopg2_binary-2.9.3-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:e6aa71ae45f952a2205377773e76f4e3f27951df38e69a4c95440c779e013560"},
{file = "psycopg2_binary-2.9.3-cp38-cp38-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:dffc08ca91c9ac09008870c9eb77b00a46b3378719584059c034b8945e26b272"},
{file = "psycopg2_binary-2.9.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:280b0bb5cbfe8039205c7981cceb006156a675362a00fe29b16fbc264e242834"},
{file = "psycopg2_binary-2.9.3-cp38-cp38-manylinux_2_24_aarch64.whl", hash = "sha256:af9813db73395fb1fc211bac696faea4ca9ef53f32dc0cfa27e4e7cf766dcf24"},
@@ -1756,6 +1710,7 @@ files = [
{file = "psycopg2_binary-2.9.3-cp38-cp38-win32.whl", hash = "sha256:6472a178e291b59e7f16ab49ec8b4f3bdada0a879c68d3817ff0963e722a82ce"},
{file = "psycopg2_binary-2.9.3-cp38-cp38-win_amd64.whl", hash = "sha256:35168209c9d51b145e459e05c31a9eaeffa9a6b0fd61689b48e07464ffd1a83e"},
{file = "psycopg2_binary-2.9.3-cp39-cp39-macosx_10_14_x86_64.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl", hash = "sha256:47133f3f872faf28c1e87d4357220e809dfd3fa7c64295a4a148bcd1e6e34ec9"},
{file = "psycopg2_binary-2.9.3-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:b3a24a1982ae56461cc24f6680604fffa2c1b818e9dc55680da038792e004d18"},
{file = "psycopg2_binary-2.9.3-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:91920527dea30175cc02a1099f331aa8c1ba39bf8b7762b7b56cbf54bc5cce42"},
{file = "psycopg2_binary-2.9.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:887dd9aac71765ac0d0bac1d0d4b4f2c99d5f5c1382d8b770404f0f3d0ce8a39"},
{file = "psycopg2_binary-2.9.3-cp39-cp39-manylinux_2_24_aarch64.whl", hash = "sha256:1f14c8b0942714eb3c74e1e71700cbbcb415acbc311c730370e70c578a44a25c"},
@@ -1788,33 +1743,10 @@ category = "main"
optional = false
python-versions = "*"
files = [
{file = "pyasn1-0.4.8-py2.4.egg", hash = "sha256:fec3e9d8e36808a28efb59b489e4528c10ad0f480e57dcc32b4de5c9d8c9fdf3"},
{file = "pyasn1-0.4.8-py2.5.egg", hash = "sha256:0458773cfe65b153891ac249bcf1b5f8f320b7c2ce462151f8fa74de8934becf"},
{file = "pyasn1-0.4.8-py2.6.egg", hash = "sha256:5c9414dcfede6e441f7e8f81b43b34e834731003427e5b09e4e00e3172a10f00"},
{file = "pyasn1-0.4.8-py2.7.egg", hash = "sha256:6e7545f1a61025a4e58bb336952c5061697da694db1cae97b116e9c46abcf7c8"},
{file = "pyasn1-0.4.8-py2.py3-none-any.whl", hash = "sha256:39c7e2ec30515947ff4e87fb6f456dfc6e84857d34be479c9d4a4ba4bf46aa5d"},
{file = "pyasn1-0.4.8-py3.1.egg", hash = "sha256:78fa6da68ed2727915c4767bb386ab32cdba863caa7dbe473eaae45f9959da86"},
{file = "pyasn1-0.4.8-py3.2.egg", hash = "sha256:08c3c53b75eaa48d71cf8c710312316392ed40899cb34710d092e96745a358b7"},
{file = "pyasn1-0.4.8-py3.3.egg", hash = "sha256:03840c999ba71680a131cfaee6fab142e1ed9bbd9c693e285cc6aca0d555e576"},
{file = "pyasn1-0.4.8-py3.4.egg", hash = "sha256:7ab8a544af125fb704feadb008c99a88805126fb525280b2270bb25cc1d78a12"},
{file = "pyasn1-0.4.8-py3.5.egg", hash = "sha256:e89bf84b5437b532b0803ba5c9a5e054d21fec423a89952a74f87fa2c9b7bce2"},
{file = "pyasn1-0.4.8-py3.6.egg", hash = "sha256:014c0e9976956a08139dc0712ae195324a75e142284d5f87f1a87ee1b068a359"},
{file = "pyasn1-0.4.8-py3.7.egg", hash = "sha256:99fcc3c8d804d1bc6d9a099921e39d827026409a58f2a720dcdb89374ea0c776"},
{file = "pyasn1-0.4.8.tar.gz", hash = "sha256:aef77c9fb94a3ac588e87841208bdec464471d9871bd5050a287cc9a475cd0ba"},
]
[[package]]
name = "pycodestyle"
version = "2.9.1"
description = "Python style guide checker"
category = "dev"
optional = false
python-versions = ">=3.6"
files = [
{file = "pycodestyle-2.9.1-py2.py3-none-any.whl", hash = "sha256:d1735fc58b418fd7c5f658d28d943854f8a849b01a5d0a1e6f3f3fdd0166804b"},
{file = "pycodestyle-2.9.1.tar.gz", hash = "sha256:2c9607871d58c76354b697b42f5d57e1ada7d261c261efac224b664affdc5785"},
]
[[package]]
name = "pycparser"
version = "2.21"
@@ -1827,18 +1759,6 @@ files = [
{file = "pycparser-2.21.tar.gz", hash = "sha256:e644fdec12f7872f86c58ff790da456218b10f863970249516d60a5eaca77206"},
]
[[package]]
name = "pyflakes"
version = "2.5.0"
description = "passive checker of Python programs"
category = "dev"
optional = false
python-versions = ">=3.6"
files = [
{file = "pyflakes-2.5.0-py2.py3-none-any.whl", hash = "sha256:4579f67d887f804e67edb544428f264b7b24f435b263c4614f384135cea553d2"},
{file = "pyflakes-2.5.0.tar.gz", hash = "sha256:491feb020dca48ccc562a8c0cbe8df07ee13078df59813b83959cbdada312ea3"},
]
[[package]]
name = "pyjwt"
version = "2.4.0"
@@ -2008,8 +1928,8 @@ files = [
[package.dependencies]
pytest = [
{version = ">=6.2.4", markers = "python_version >= \"3.10\""},
{version = ">=5.0", markers = "python_version < \"3.10\""},
{version = ">=6.2.4", markers = "python_version >= \"3.10\""},
]
[[package]]
@@ -2121,6 +2041,13 @@ files = [
{file = "PyYAML-6.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:f84fbc98b019fef2ee9a1cb3ce93e3187a6df0b2538a651bfb890254ba9f90b5"},
{file = "PyYAML-6.0-cp310-cp310-win32.whl", hash = "sha256:2cd5df3de48857ed0544b34e2d40e9fac445930039f3cfe4bcc592a1f836d513"},
{file = "PyYAML-6.0-cp310-cp310-win_amd64.whl", hash = "sha256:daf496c58a8c52083df09b80c860005194014c3698698d1a57cbcfa182142a3a"},
{file = "PyYAML-6.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:d4b0ba9512519522b118090257be113b9468d804b19d63c71dbcf4a48fa32358"},
{file = "PyYAML-6.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:81957921f441d50af23654aa6c5e5eaf9b06aba7f0a19c18a538dc7ef291c5a1"},
{file = "PyYAML-6.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:afa17f5bc4d1b10afd4466fd3a44dc0e245382deca5b3c353d8b757f9e3ecb8d"},
{file = "PyYAML-6.0-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:dbad0e9d368bb989f4515da330b88a057617d16b6a8245084f1b05400f24609f"},
{file = "PyYAML-6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:432557aa2c09802be39460360ddffd48156e30721f5e8d917f01d31694216782"},
{file = "PyYAML-6.0-cp311-cp311-win32.whl", hash = "sha256:bfaef573a63ba8923503d27530362590ff4f576c626d86a9fed95822a8255fd7"},
{file = "PyYAML-6.0-cp311-cp311-win_amd64.whl", hash = "sha256:01b45c0191e6d66c470b6cf1b9531a771a83c1c4208272ead47a3ae4f2f603bf"},
{file = "PyYAML-6.0-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:897b80890765f037df3403d22bab41627ca8811ae55e9a722fd0392850ec4d86"},
{file = "PyYAML-6.0-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:50602afada6d6cbfad699b0c7bb50d5ccffa7e46a3d738092afddc1f9758427f"},
{file = "PyYAML-6.0-cp36-cp36m-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:48c346915c114f5fdb3ead70312bd042a953a8ce5c7106d5bfb1a5254e47da92"},
@@ -2205,6 +2132,33 @@ files = [
[package.dependencies]
pyasn1 = ">=0.1.3"
[[package]]
name = "ruff"
version = "0.0.255"
description = "An extremely fast Python linter, written in Rust."
category = "dev"
optional = false
python-versions = ">=3.7"
files = [
{file = "ruff-0.0.255-py3-none-macosx_10_7_x86_64.whl", hash = "sha256:b2d71fb6a7e50501a2473864acffc85dee6b750c25db198f7e71fe1dbbff1aad"},
{file = "ruff-0.0.255-py3-none-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl", hash = "sha256:6c97d746861a6010f941179e84bba9feb8a871815667471d9ed6beb98d45c252"},
{file = "ruff-0.0.255-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9a7fa60085079b91a298b963361be9b1b1c724582af6c84be954cbabdbd9309a"},
{file = "ruff-0.0.255-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:c089f7141496334ab5a127b54ce55e41f0d6714e68a4453a1e09d2204cdea8c3"},
{file = "ruff-0.0.255-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:0423908caa7d437a416b853214565b9c33bbd1106c4f88147982216dddcbbd96"},
{file = "ruff-0.0.255-py3-none-manylinux_2_17_ppc64.manylinux2014_ppc64.whl", hash = "sha256:981493e92547cacbb8e0874904ec049fe744507ee890dc8736caf89a8864f9a7"},
{file = "ruff-0.0.255-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:59d5193d2aedb35db180824462b374dbcfc306b2e76076245088afa6e5837df2"},
{file = "ruff-0.0.255-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:dd5e00733c9d160c8a34a22e62b390da9d1e9f326676402421cb8c1236beefc3"},
{file = "ruff-0.0.255-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:694418cf41838bd19c6229e4e1b2d04505b1e6b86fe3ab81165484fc96d36f01"},
{file = "ruff-0.0.255-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:5d0408985c9777369daebb5d3340a99e9f7294bdd7120642239261508185cf89"},
{file = "ruff-0.0.255-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:abd6376ef9d12f370d95a8c7c98682fbb9bfedfba59f40e84a816fef8ddcb8de"},
{file = "ruff-0.0.255-py3-none-musllinux_1_2_i686.whl", hash = "sha256:f9b1a5df0bc09193cbef58a6f78e4a9a0b058a4f9733c0442866d078006d1bb9"},
{file = "ruff-0.0.255-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:6a25c5f4ff087445b2e1bbcb9963f2ae7c868d65e4a8d5f84c36c12f71571179"},
{file = "ruff-0.0.255-py3-none-win32.whl", hash = "sha256:1ff87a8310354f9f1a099625e54a27fdd6756d9cd2a40b45922f2e943daf982d"},
{file = "ruff-0.0.255-py3-none-win_amd64.whl", hash = "sha256:f3d8416be618f023f93ec4fd6ee3048585ef85dba9563b2a7e38fc7e5131d5b1"},
{file = "ruff-0.0.255-py3-none-win_arm64.whl", hash = "sha256:8ba124819624145d7b6b53add40c367c44318893215ffc1bfe3d72e0225a1c9c"},
{file = "ruff-0.0.255.tar.gz", hash = "sha256:f9eb1d3b2eecbeedae419fa494c4e2a5e4484baf93a1ce0f81eddb005e1919c5"},
]
[[package]]
name = "s3transfer"
version = "0.6.0"
@@ -2643,4 +2597,4 @@ testing = ["func-timeout", "jaraco.itertools", "pytest (>=6)", "pytest-black (>=
[metadata]
lock-version = "2.0"
python-versions = "^3.9"
content-hash = "3038940781ef59d1ed28cedf46120ad6623e21e602c38ad3c359428d79fa1efd"
content-hash = "2515a9320c2960076012fbc036fb33c4f6a23515c8d143785931dc18c6722d91"

View File

@@ -43,17 +43,13 @@ def black(fix_inplace: bool) -> str:
return cmd
def isort(fix_inplace: bool) -> str:
cmd = "poetry run isort"
if not fix_inplace:
cmd += " --diff --check"
def ruff(fix_inplace: bool) -> str:
cmd = "poetry run ruff"
if fix_inplace:
cmd += " --fix"
return cmd
def flake8() -> str:
return "poetry run flake8"
def mypy() -> str:
return "poetry run mypy"
@@ -112,13 +108,6 @@ if __name__ == "__main__":
changed_files=files,
no_color=args.no_color,
)
check(
name="isort",
suffix=".py",
cmd=isort(fix_inplace=args.fix_inplace),
changed_files=files,
no_color=args.no_color,
)
check(
name="black",
suffix=".py",
@@ -127,9 +116,9 @@ if __name__ == "__main__":
no_color=args.no_color,
)
check(
name="flake8",
name="ruff",
suffix=".py",
cmd=flake8(),
cmd=ruff(fix_inplace=args.fix_inplace),
changed_files=files,
no_color=args.no_color,
)

View File

@@ -6,9 +6,9 @@ use std::{io, net::SocketAddr};
use thiserror::Error;
use tokio::net::TcpStream;
use tokio_postgres::NoTls;
use tracing::{error, info};
use tracing::{error, info, warn};
const COULD_NOT_CONNECT: &str = "Could not connect to compute node";
const COULD_NOT_CONNECT: &str = "Couldn't connect to compute node";
#[derive(Debug, Error)]
pub enum ConnectionError {
@@ -131,7 +131,7 @@ impl ConnCfg {
use tokio_postgres::config::Host;
let connect_once = |host, port| {
info!("trying to connect to a compute node at {host}:{port}");
info!("trying to connect to compute node at {host}:{port}");
TcpStream::connect((host, port)).and_then(|socket| async {
let socket_addr = socket.peer_addr()?;
// This prevents load balancer from severing the connection.
@@ -151,7 +151,7 @@ impl ConnCfg {
return Err(io::Error::new(
io::ErrorKind::Other,
format!(
"couldn't connect: bad compute config, \
"bad compute config, \
ports and hosts entries' count does not match: {:?}",
self.0
),
@@ -170,7 +170,7 @@ impl ConnCfg {
Ok(socket) => return Ok(socket),
Err(err) => {
// We can't throw an error here, as there might be more hosts to try.
error!("failed to connect to a compute node at {host}:{port}: {err}");
warn!("couldn't connect to compute node at {host}:{port}: {err}");
connection_error = Some(err);
}
}
@@ -179,7 +179,7 @@ impl ConnCfg {
Err(connection_error.unwrap_or_else(|| {
io::Error::new(
io::ErrorKind::Other,
format!("couldn't connect: bad compute config: {:?}", self.0),
format!("bad compute config: {:?}", self.0),
)
}))
}
@@ -195,12 +195,11 @@ pub struct PostgresConnection {
}
impl ConnCfg {
/// Connect to a corresponding compute node.
pub async fn connect(&self) -> Result<PostgresConnection, ConnectionError> {
async fn do_connect(&self) -> Result<PostgresConnection, ConnectionError> {
// TODO: establish a secure connection to the DB.
let (socket_addr, mut stream) = self.connect_raw().await?;
let (client, connection) = self.0.connect_raw(&mut stream, NoTls).await?;
info!("connected to user's compute node at {socket_addr}");
info!("connected to compute node at {socket_addr}");
// This is very ugly but as of now there's no better way to
// extract the connection parameters from tokio-postgres' connection.
@@ -219,6 +218,16 @@ impl ConnCfg {
Ok(connection)
}
/// Connect to a corresponding compute node.
pub async fn connect(&self) -> Result<PostgresConnection, ConnectionError> {
self.do_connect()
.inspect_err(|err| {
// Immediately log the error we have at our disposal.
error!("couldn't connect to compute node: {err}");
})
.await
}
}
/// Retrieve `options` from a startup message, dropping all proxy-secific flags.

View File

@@ -4,7 +4,7 @@ use crate::{
};
use anyhow::Context;
use once_cell::sync::Lazy;
use postgres_backend::{self, AuthType, PostgresBackend, QueryError};
use postgres_backend::{self, AuthType, PostgresBackend, PostgresBackendTCP, QueryError};
use pq_proto::{BeMessage, SINGLE_COL_ROWDESC};
use std::future;
use tokio::net::{TcpListener, TcpStream};
@@ -71,10 +71,10 @@ pub type ComputeReady = Result<DatabaseInfo, String>;
// TODO: replace with an http-based protocol.
struct MgmtHandler;
#[async_trait::async_trait]
impl postgres_backend::Handler for MgmtHandler {
impl postgres_backend::Handler<tokio::net::TcpStream> for MgmtHandler {
async fn process_query(
&mut self,
pgb: &mut PostgresBackend,
pgb: &mut PostgresBackendTCP,
query: &str,
) -> Result<(), QueryError> {
try_process_query(pgb, query).await.map_err(|e| {
@@ -84,7 +84,7 @@ impl postgres_backend::Handler for MgmtHandler {
}
}
async fn try_process_query(pgb: &mut PostgresBackend, query: &str) -> Result<(), QueryError> {
async fn try_process_query(pgb: &mut PostgresBackendTCP, query: &str) -> Result<(), QueryError> {
let resp: KickSession = serde_json::from_str(query).context("Failed to parse query as json")?;
let span = info_span!("event", session_id = resp.session_id);

View File

@@ -21,6 +21,7 @@ const PROXY_IO_BYTES_PER_CLIENT: &str = "proxy_io_bytes_per_client";
#[derive(Eq, Hash, PartialEq, Serialize, Debug)]
pub struct Ids {
pub endpoint_id: String,
pub branch_id: String,
}
pub async fn task_main(config: &MetricCollectionConfig) -> anyhow::Result<()> {
@@ -74,12 +75,23 @@ fn gather_proxy_io_bytes_per_client() -> Vec<(Ids, (u64, DateTime<Utc>))> {
.find(|l| l.get_name() == "endpoint_id")
.unwrap()
.get_value();
let branch_id = ms
.get_label()
.iter()
.find(|l| l.get_name() == "branch_id")
.unwrap()
.get_value();
let value = ms.get_counter().get_value() as u64;
debug!("endpoint_id:val - {}: {}", endpoint_id, value);
debug!(
"branch_id {} endpoint_id {} val: {}",
branch_id, endpoint_id, value
);
current_metrics.push((
Ids {
endpoint_id: endpoint_id.to_string(),
branch_id: "".to_string(),
},
(value, Utc::now()),
));
@@ -131,6 +143,7 @@ async fn collect_metrics_iteration(
value,
extra: Ids {
endpoint_id: curr_key.endpoint_id.clone(),
branch_id: curr_key.branch_id.clone(),
},
})
})
@@ -172,6 +185,7 @@ async fn collect_metrics_iteration(
cached_metrics
.entry(Ids {
endpoint_id: send_metric.extra.endpoint_id.clone(),
branch_id: send_metric.extra.branch_id.clone(),
})
// update cached value (add delta) and time
.and_modify(|e| {

View File

@@ -8,7 +8,7 @@ use crate::{
config::{ProxyConfig, TlsConfig},
console::{self, messages::MetricsAuxInfo},
error::io_error,
stream::{MeasuredStream, PqStream, Stream},
stream::{PqStream, Stream},
};
use anyhow::{bail, Context};
use futures::TryFutureExt;
@@ -16,8 +16,9 @@ use metrics::{register_int_counter, register_int_counter_vec, IntCounter, IntCou
use once_cell::sync::Lazy;
use pq_proto::{BeMessage as Be, FeStartupPacket, StartupMessageParams};
use std::sync::Arc;
use tokio::io::{AsyncRead, AsyncWrite};
use tokio::io::{AsyncRead, AsyncWrite, AsyncWriteExt};
use tracing::{error, info, warn};
use utils::measured_stream::MeasuredStream;
/// Number of times we should retry the `/proxy_wake_compute` http request.
const NUM_RETRIES_WAKE_COMPUTE: usize = 1;
@@ -208,9 +209,18 @@ async fn handshake<S: AsyncRead + AsyncWrite + Unpin>(
if let Some(tls) = tls.take() {
// Upgrade raw stream into a secure TLS-backed stream.
// NOTE: We've consumed `tls`; this fact will be used later.
stream = PqStream::new(
stream.into_inner().upgrade(tls.to_server_config()).await?,
);
let (raw, read_buf) = stream.into_inner();
// TODO: Normally, client doesn't send any data before
// server says TLS handshake is ok and read_buf is empy.
// However, you could imagine pipelining of postgres
// SSLRequest + TLS ClientHello in one hunk similar to
// pipelining in our node js driver. We should probably
// support that by chaining read_buf with the stream.
if !read_buf.is_empty() {
bail!("data is sent before server replied with EncryptionResponse");
}
stream = PqStream::new(raw.upgrade(tls.to_server_config()).await?);
}
}
_ => bail!(ERR_PROTO_VIOLATION),
@@ -353,16 +363,24 @@ async fn proxy_pass(
aux: &MetricsAuxInfo,
) -> anyhow::Result<()> {
let m_sent = NUM_BYTES_PROXIED_COUNTER.with_label_values(&aux.traffic_labels("tx"));
let mut client = MeasuredStream::new(client, |cnt| {
// Number of bytes we sent to the client (outbound).
m_sent.inc_by(cnt as u64);
});
let mut client = MeasuredStream::new(
client,
|_| {},
|cnt| {
// Number of bytes we sent to the client (outbound).
m_sent.inc_by(cnt as u64);
},
);
let m_recv = NUM_BYTES_PROXIED_COUNTER.with_label_values(&aux.traffic_labels("rx"));
let mut compute = MeasuredStream::new(compute, |cnt| {
// Number of bytes the client sent to the compute node (inbound).
m_recv.inc_by(cnt as u64);
});
let mut compute = MeasuredStream::new(
compute,
|_| {},
|cnt| {
// Number of bytes the client sent to the compute node (inbound).
m_recv.inc_by(cnt as u64);
},
);
// Starting from here we only proxy the client's traffic.
info!("performing the proxy pass...");
@@ -434,11 +452,17 @@ impl<S: AsyncRead + AsyncWrite + Unpin> Client<'_, S> {
value: mut node_info,
} = auth_result;
let node = connect_to_compute(&mut node_info, params, &extra, &creds)
let mut node = connect_to_compute(&mut node_info, params, &extra, &creds)
.or_else(|e| stream.throw_error(e))
.await?;
prepare_client_connection(&node, reported_auth_ok, session, &mut stream).await?;
proxy_pass(stream.into_inner(), node.stream, &node_info.aux).await
// Before proxy passing, forward to compute whatever data is left in the
// PqStream input buffer. Normally there is none, but our serverless npm
// driver in pipeline mode sends startup, password and first query
// immediately after opening the connection.
let (stream, read_buf) = stream.into_inner();
node.stream.write_all(&read_buf).await?;
proxy_pass(stream, node.stream, &node_info.aux).await
}
}

View File

@@ -14,7 +14,7 @@ pub const SCRAM_RAW_NONCE_LEN: usize = 18;
fn validate_sasl_extensions<'a>(parts: impl Iterator<Item = &'a str>) -> Option<()> {
for mut chars in parts.map(|s| s.chars()) {
let attr = chars.next()?;
if !('a'..='z').contains(&attr) && !('A'..='Z').contains(&attr) {
if !attr.is_ascii_alphabetic() {
return None;
}
let eq = chars.next()?;

View File

@@ -1,5 +1,6 @@
use crate::error::UserFacingError;
use anyhow::bail;
use bytes::BytesMut;
use pin_project_lite::pin_project;
use pq_proto::framed::{ConnectionError, Framed};
use pq_proto::{BeMessage, FeMessage, FeStartupPacket, ProtocolError};
@@ -27,8 +28,8 @@ impl<S> PqStream<S> {
}
}
/// Extract the underlying stream.
pub fn into_inner(self) -> S {
/// Extract the underlying stream and read buffer.
pub fn into_inner(self) -> (S, BytesMut) {
self.framed.into_inner()
}
@@ -217,68 +218,3 @@ impl<S: AsyncRead + AsyncWrite + Unpin> AsyncWrite for Stream<S> {
}
}
}
pin_project! {
/// This stream tracks all writes and calls user provided
/// callback when the underlying stream is flushed.
pub struct MeasuredStream<S, W> {
#[pin]
stream: S,
write_count: usize,
inc_write_count: W,
}
}
impl<S, W> MeasuredStream<S, W> {
pub fn new(stream: S, inc_write_count: W) -> Self {
Self {
stream,
write_count: 0,
inc_write_count,
}
}
}
impl<S: AsyncRead + Unpin, W> AsyncRead for MeasuredStream<S, W> {
fn poll_read(
self: Pin<&mut Self>,
context: &mut task::Context<'_>,
buf: &mut ReadBuf<'_>,
) -> task::Poll<io::Result<()>> {
self.project().stream.poll_read(context, buf)
}
}
impl<S: AsyncWrite + Unpin, W: FnMut(usize)> AsyncWrite for MeasuredStream<S, W> {
fn poll_write(
self: Pin<&mut Self>,
context: &mut task::Context<'_>,
buf: &[u8],
) -> task::Poll<io::Result<usize>> {
let this = self.project();
this.stream.poll_write(context, buf).map_ok(|cnt| {
// Increment the write count.
*this.write_count += cnt;
cnt
})
}
fn poll_flush(
self: Pin<&mut Self>,
context: &mut task::Context<'_>,
) -> task::Poll<io::Result<()>> {
let this = self.project();
this.stream.poll_flush(context).map_ok(|()| {
// Call the user provided callback and reset the write count.
(this.inc_write_count)(*this.write_count);
*this.write_count = 0;
})
}
fn poll_shutdown(
self: Pin<&mut Self>,
context: &mut task::Context<'_>,
) -> task::Poll<io::Result<()>> {
self.project().stream.poll_shutdown(context)
}
}

View File

@@ -35,11 +35,10 @@ types-toml = "^0.10.8"
pytest-httpserver = "^1.0.6"
aiohttp = "3.7.4"
[tool.poetry.dev-dependencies]
flake8 = "^5.0.4"
mypy = "==0.991"
black = "^22.6.0"
isort = "^5.10.1"
[tool.poetry.group.dev.dependencies]
black = "^23.1.0"
mypy = "==1.1.1"
ruff = "^0.0.255"
[build-system]
requires = ["poetry-core>=1.0.0"]
@@ -53,14 +52,6 @@ extend-exclude = '''
)/
'''
[tool.isort]
profile = "black"
line_length = 100
skip_gitignore = true
skip = [
"vendor",
]
[tool.mypy]
exclude = "^vendor/"
check_untyped_defs = true
@@ -80,3 +71,13 @@ module = [
"pg8000.*",
]
ignore_missing_imports = true
[tool.ruff]
extend-exclude = ["vendor/"]
ignore = ["E501"]
select = [
"E", # pycodestyle
"F", # Pyflakes
"I", # isort
"W", # pycodestyle
]

View File

@@ -8,21 +8,13 @@
# warnings and errors right in the editor.
# In vscode, this setting is Rust-analyzer>Check On Save:Command
# Not every feature is supported in macOS builds. Avoid running regular linting
# script that checks every feature.
#
# manual-range-contains wants
# !(4..=MAX_STARTUP_PACKET_LENGTH).contains(&len)
# instead of
# len < 4 || len > MAX_STARTUP_PACKET_LENGTH
# , let's disagree.
if [[ "$OSTYPE" == "darwin"* ]]; then
# no extra features to test currently, add more here when needed
cargo clippy --locked --all --all-targets --features testing -- -A unknown_lints -A clippy::manual-range-contains -D warnings
else
# * `-A unknown_lints` do not warn about unknown lint suppressions
# that people with newer toolchains might use
# * `-D warnings` - fail on any warnings (`cargo` returns non-zero exit status)
cargo clippy --locked --all --all-targets --all-features -- -A unknown_lints -A clippy::manual-range-contains -D warnings
fi
# * `-A unknown_lints` do not warn about unknown lint suppressions
# that people with newer toolchains might use
# * `-D warnings` - fail on any warnings (`cargo` returns non-zero exit status)
cargo clippy --locked --all --all-targets --all-features -- -A unknown_lints -A clippy::manual-range-contains -D warnings

View File

@@ -19,7 +19,6 @@ git-version.workspace = true
hex.workspace = true
humantime.workspace = true
hyper.workspace = true
nix.workspace = true
once_cell.workspace = true
parking_lot.workspace = true
postgres.workspace = true

View File

@@ -71,6 +71,9 @@ struct Args {
/// Listen http endpoint for management and metrics in the form host:port.
#[arg(long, default_value = DEFAULT_HTTP_LISTEN_ADDR)]
listen_http: String,
/// Availability zone of the safekeeper.
#[arg(long)]
availability_zone: Option<String>,
/// Do not wait for changes to be written safely to disk. Unsafe.
#[arg(short, long)]
no_sync: bool,
@@ -108,7 +111,7 @@ struct Args {
/// WAL backup horizon.
#[arg(long)]
disable_wal_backup: bool,
/// Path to an RSA .pem public key which is used to check JWT tokens.
/// Path to a .pem public key which is used to check JWT tokens.
#[arg(long)]
auth_validation_public_key_path: Option<PathBuf>,
/// Format for logging, either 'plain' or 'json'.
@@ -166,6 +169,7 @@ fn main() -> anyhow::Result<()> {
my_id: id,
listen_pg_addr: args.listen_pg,
listen_http_addr: args.listen_http,
availability_zone: args.availability_zone,
no_sync: args.no_sync,
broker_endpoint: args.broker_endpoint,
broker_keepalive_interval: args.broker_keepalive_interval,

View File

@@ -3,11 +3,14 @@
use anyhow::Context;
use std::str;
use tokio::io::{AsyncRead, AsyncWrite};
use tracing::{info, info_span, Instrument};
use crate::auth::check_permission;
use crate::json_ctrl::{handle_json_ctrl, AppendLogicalMessage};
use crate::metrics::TrafficMetrics;
use crate::wal_service::ConnectionId;
use crate::{GlobalTimelines, SafeKeeperConf};
use postgres_backend::QueryError;
use postgres_backend::{self, PostgresBackend};
@@ -28,7 +31,10 @@ pub struct SafekeeperPostgresHandler {
pub tenant_id: Option<TenantId>,
pub timeline_id: Option<TimelineId>,
pub ttid: TenantTimelineId,
/// Unique connection id is logged in spans for observability.
pub conn_id: ConnectionId,
claims: Option<Claims>,
io_metrics: Option<TrafficMetrics>,
}
/// Parsed Postgres command.
@@ -64,11 +70,13 @@ fn parse_cmd(cmd: &str) -> anyhow::Result<SafekeeperPostgresCommand> {
}
#[async_trait::async_trait]
impl postgres_backend::Handler for SafekeeperPostgresHandler {
impl<IO: AsyncRead + AsyncWrite + Unpin + Send> postgres_backend::Handler<IO>
for SafekeeperPostgresHandler
{
// tenant_id and timeline_id are passed in connection string params
fn startup(
&mut self,
_pgb: &mut PostgresBackend,
_pgb: &mut PostgresBackend<IO>,
sm: &FeStartupPacket,
) -> Result<(), QueryError> {
if let FeStartupPacket::StartupMessage { params, .. } = sm {
@@ -88,6 +96,11 @@ impl postgres_backend::Handler for SafekeeperPostgresHandler {
format!("Failed to parse {value} as timeline id")
})?);
}
Some(("availability_zone", client_az)) => {
if let Some(metrics) = self.io_metrics.as_ref() {
metrics.set_client_az(client_az)
}
}
_ => continue,
}
}
@@ -95,6 +108,9 @@ impl postgres_backend::Handler for SafekeeperPostgresHandler {
if let Some(app_name) = params.get("application_name") {
self.appname = Some(app_name.to_owned());
if let Some(metrics) = self.io_metrics.as_ref() {
metrics.set_app_name(app_name)
}
}
Ok(())
@@ -107,7 +123,7 @@ impl postgres_backend::Handler for SafekeeperPostgresHandler {
fn check_auth_jwt(
&mut self,
_pgb: &mut PostgresBackend,
_pgb: &mut PostgresBackend<IO>,
jwt_response: &[u8],
) -> Result<(), QueryError> {
// this unwrap is never triggered, because check_auth_jwt only called when auth_type is NeonJWT
@@ -136,7 +152,7 @@ impl postgres_backend::Handler for SafekeeperPostgresHandler {
async fn process_query(
&mut self,
pgb: &mut PostgresBackend,
pgb: &mut PostgresBackend<IO>,
query_string: &str,
) -> Result<(), QueryError> {
if query_string
@@ -181,14 +197,16 @@ impl postgres_backend::Handler for SafekeeperPostgresHandler {
}
impl SafekeeperPostgresHandler {
pub fn new(conf: SafeKeeperConf) -> Self {
pub fn new(conf: SafeKeeperConf, conn_id: u32, io_metrics: Option<TrafficMetrics>) -> Self {
SafekeeperPostgresHandler {
conf,
appname: None,
tenant_id: None,
timeline_id: None,
ttid: TenantTimelineId::empty(),
conn_id,
claims: None,
io_metrics,
}
}
@@ -212,9 +230,9 @@ impl SafekeeperPostgresHandler {
///
/// Handle IDENTIFY_SYSTEM replication command
///
async fn handle_identify_system(
async fn handle_identify_system<IO: AsyncRead + AsyncWrite + Unpin>(
&mut self,
pgb: &mut PostgresBackend,
pgb: &mut PostgresBackend<IO>,
) -> Result<(), QueryError> {
let tli = GlobalTimelines::get(self.ttid).map_err(|e| QueryError::Other(e.into()))?;

View File

@@ -12,6 +12,7 @@ use anyhow::Context;
use bytes::Bytes;
use postgres_backend::QueryError;
use serde::{Deserialize, Serialize};
use tokio::io::{AsyncRead, AsyncWrite};
use tracing::*;
use utils::id::TenantTimelineId;
@@ -60,9 +61,9 @@ struct AppendResult {
/// Handles command to craft logical message WAL record with given
/// content, and then append it with specified term and lsn. This
/// function is used to test safekeepers in different scenarios.
pub async fn handle_json_ctrl(
pub async fn handle_json_ctrl<IO: AsyncRead + AsyncWrite + Unpin>(
spg: &SafekeeperPostgresHandler,
pgb: &mut PostgresBackend,
pgb: &mut PostgresBackend<IO>,
append_request: &AppendLogicalMessage,
) -> Result<(), QueryError> {
info!("JSON_CTRL request: {append_request:?}");

Some files were not shown because too many files have changed in this diff Show More