Compare commits

...

230 Commits

Author SHA1 Message Date
Shany Pozin
4430d0ae7d Merge pull request #5876 from neondatabase/releases/2023-11-17
Release 2023-11-17
2023-11-20 09:11:58 +02:00
Joonas Koivunen
6e183aa0de Merge branch 'main' into releases/2023-11-17 2023-11-19 15:25:47 +00:00
Joonas Koivunen
ac08072d2e fix(layer): VirtualFile opening and read errors can be caused by contention (#5880)
A very low number of layer loads have been marked wrongly as permanent,
as I did not remember that `VirtualFile::open` or reading could fail
transiently for contention. Return separate errors for transient and
persistent errors from `{Delta,Image}LayerInner::load`.

Includes drive-by comment changes.

The implementation looks quite ugly because having the same type be both
the inner (operation error) and outer (critical error), but with the
alternatives I tried I did not find a better way.
2023-11-19 14:57:39 +00:00
John Spray
d22dce2e31 pageserver: shut down idle walredo processes (#5877)
The longer a pageserver runs, the more walredo processes it accumulates
from tenants that are touched intermittently (e.g. by availability
checks). This can lead to getting OOM killed.

Changes:
- Add an Instant recording the last use of the walredo process for a
tenant
- After compaction iteration in the background task, check for idleness
and stop the walredo process if idle for more than 10x compaction
period.

Cc: #3620

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Shany Pozin <shany@neon.tech>
2023-11-19 14:21:16 +00:00
Joonas Koivunen
3b3f040be3 fix(background_tasks): first backoff, compaction error stacktraces (#5881)
First compaction/gc error backoff starts from 0 which is less than 2s
what it was before #5672. This is now fixed to be the intended 2**n.

Additionally noticed the `compaction_iteration` creating an
`anyhow::Error` via `into()` always captures a stacktrace even if we had
a stacktraceful anyhow error within the CompactionError because there is
no stable api for querying that.
2023-11-19 14:16:31 +00:00
Em Sharnoff
cad0dca4b8 compute_ctl: Remove deprecated flag --file-cache-on-disk (#5622)
See neondatabase/cloud#7516 for more.
2023-11-18 12:43:54 +01:00
Vadim Kharitonov
fd6d0b7635 Merge branch 'release' into releases/2023-11-17 2023-11-17 10:51:45 +01:00
Em Sharnoff
5d13a2e426 Improve error message when neon.max_cluster_size reached (#4173)
Changes the error message encountered when the `neon.max_cluster_size`
limit is reached. Reasoning is that this is user-visible, and so should
*probably* use language that's closer to what users are familiar with.
2023-11-16 21:51:26 +00:00
khanova
0c243faf96 Proxy log pid hack (#5869)
## Problem

Improve observability for the compute node.

## Summary of changes

Log pid from the compute node. Doesn't work with pgbouncer.
2023-11-16 20:46:23 +00:00
Em Sharnoff
d0a842a509 Update vm-builder to v0.19.0 and move its customization here (#5783)
ref neondatabase/autoscaling#600 for more
2023-11-16 18:17:42 +01:00
khanova
6b82f22ada Collect number of connections by sni type (#5867)
## Problem

We don't know the number of users with the different kind of
authentication: ["sni", "endpoint in options" (A and B from
[here](https://neon.tech/docs/connect/connection-errors)),
"password_hack"]

## Summary of changes

Collect metrics by sni kind.
2023-11-16 12:19:13 +00:00
John Spray
ab631e6792 pageserver: make TenantsMap shard-aware (#5819)
## Problem

When using TenantId as the key, we are unable to handle multiple tenant
shards attached to the same pageserver for the same tenant ID. This is
an expected scenario if we have e.g. 8 shards and 5 pageservers.

## Summary of changes

- TenantsMap is now a BTreeMap instead of a HashMap: this enables
looking up by range. In future, we will need this for page_service, as
incoming requests will just specify the Key, and we'll have to figure
out which shard to route it to.
- A new key type TenantShardId is introduced, to act as the key in
TenantsMap, and as the id type in external APIs. Its human readable
serialization is backward compatible with TenantId, and also
forward-compatible as long as sharding is not actually used (when we
construct a TenantShardId with ShardCount(0), it serializes to an
old-fashioned TenantId).
- Essential tenant APIs are updated to accept TenantShardIds:
tenant/timeline create, tenant delete, and /location_conf. These are the
APIs that will enable driving sharded tenants. Other apis like /attach
/detach /load /ignore will not work with sharding: those will soon be
deprecated and replaced with /location_conf as part of the live
migration work.

Closes: #5787
2023-11-15 23:20:21 +02:00
Alexander Bayandin
f84ac2b98d Fix baseline commit and branch for code coverage (#5769)
## Problem

`HEAD` commit for a PR is a phantom merge commit which skews the baseline
commit for coverage reports.

See
https://github.com/neondatabase/neon/pull/5751#issuecomment-1790717867

## Summary of changes
- Use commit hash instead of `HEAD` for finding baseline commits for
code coverage
- Use the base branch for PRs or the current branch for pushes
2023-11-15 12:40:21 +01:00
dependabot[bot]
5cd5b93066 build(deps): bump aiohttp from 3.8.5 to 3.8.6 (#5864) 2023-11-15 11:08:49 +00:00
khanova
2f0d245c2a Proxy control plane rate limiter (#5785)
## Problem

Proxy might overload the control plane.

## Summary of changes

Implement rate limiter for proxy<->control plane connection.
Resolves https://github.com/neondatabase/neon/issues/5707

Used implementation ideas from https://github.com/conradludgate/squeeze/
2023-11-15 09:15:59 +00:00
Joonas Koivunen
462f04d377 Smaller test addition and change (#5858)
- trivial serialization roundtrip test for
`pageserver::repository::Value`
- add missing `start_paused = true` to 15s test making it <0s test
- completely unrelated future clippy lint avoidance (helps beta channel
users)
2023-11-14 18:04:34 +01:00
Arpad Müller
31a54d663c Migrate links from wiki to notion (#5862)
See the slack discussion:
https://neondb.slack.com/archives/C033A2WE6BZ/p1696429688621489?thread_ts=1695647103.117499
2023-11-14 15:36:47 +00:00
John Spray
7709c91fe5 neon_local: use remote storage by default, add cargo neon tenant migrate (#5760)
## Problem

Currently the only way to exercise tenant migration is via python test
code. We need a convenient way for developers to do it directly in a
neon local environment.

## Summary of changes

- Add a `--num-pageservers` argument to `cargo neon init` so that it's
easy to run with multiple pageservers
- Modify default pageserver overrides in neon_local to set up `LocalFs`
remote storage, as any migration/attach/detach stuff doesn't work in the
legacy local storage mode. This also unblocks removing the pageserver's
support for the legacy local mode.
- Add a new `cargo neon tenant migrate` command that orchestrates tenant
migration, including endpoints.
2023-11-14 09:51:51 +00:00
Arpad Müller
f7249b9018 Fix comment in find_lsn_for_timestamp (#5855)
We still subtract 1 from low to compute `commit_lsn`. the comment
moved/added by #5844 should point this out.
2023-11-11 00:32:00 +00:00
Joonas Koivunen
74d150ba45 build: upgrade ahash (#5851)
`cargo deny` was complaining the version 0.8.3 was yanked (for possible
DoS attack [wiki]), but the latest version (0.8.5) also includes aarch64
fixes which may or may not be relevant. Our usage of ahash limits to
proxy, but I don't think we are at any risk.

[wiki]: https://github.com/tkaitchuck/aHash/wiki/Yanked-versions
2023-11-10 19:10:54 +00:00
Joonas Koivunen
b7f45204a2 build: deny async-std and friends (#5849)
rationale: some crates pull these in as default; hopefully these hints
will require less cleanup-after and Cargo.lock file watching.

follow-up to #5848.
2023-11-10 18:02:22 +01:00
Joonas Koivunen
a05f104cce build: remove async-std dependency (#5848)
Introduced by accident (missing `default-features = false`) in
e09d5ada6a. We directly need only `http_types::StatusCode`.
2023-11-10 16:05:21 +02:00
John Spray
d672e44eee pageserver: error type for collect_keyspace (#5846)
## Problem

This is a log hygiene fix, for an occasional test failure.

warn-level logging in imitate_timeline_cached_layer_accesses can't
distinguish actual errors from shutdown cases.

## Summary of changes

Replaced anyhow::Error with an explicit CollectKeySpaceError type, that
includes conversion from PageReconstructError::Cancelled.
2023-11-10 13:58:18 +00:00
Rahul Modpur
a6f892e200 metric: add started and killed walredo processes counter (#5809)
In OOM situations, knowing exactly how many walredo processes there were
at a time would help afterwards to understand why was pageserver OOM
killed. Add `pageserver_wal_redo_process_total` metric to keep track of
total wal redo process started, shutdown and killed since pageserver
start.

Closes #5722

---------

Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Christian Schwarz <me@cschwarz.com>
2023-11-10 15:05:22 +02:00
Alexander Bayandin
71b380f90a Set BUILD_TAG for build-neon job (#5847)
## Problem

I've added `BUILD_TAG` to docker images.
(https://github.com/neondatabase/neon/pull/5812), but forgot to add it
to services that we build for tests

## Summary of changes
- Set `BUILD_TAG` in `build-neon` job
2023-11-10 12:49:52 +00:00
Alexander Bayandin
6e145a44fa workflows/neon_extra_builds: run check-codestyle-rust & build-neon on arm64 (#5832)
## Problem

Some developers use workstations with arm CPUs, and sometimes x86-64
code is not fully compatible with it (for example,
https://github.com/neondatabase/neon/pull/5827).
Although we don't have arm CPUs in the prod (yet?), it is worth having
some basic checks for this architecture to have a better developer
experience.

Closes https://github.com/neondatabase/neon/issues/5829

## Summary of changes
- Run `check-codestyle-rust`-like & `build-neon`-like jobs on Arm runner
- Add `run-extra-build-*` label to run all available extra builds
2023-11-10 12:45:41 +00:00
Arpad Müller
8e5e3971ba find_lsn_for_timestamp fixes (#5844)
Includes the changes of #3689 that address point 1 of #3689, plus some
further improvements. In particular, this PR does:

* set `min_lsn` to a safe value to create branches from (and verify it
in tests)
* return `min_lsn` instead of `max_lsn` for `NoData` and `Past` (verify
it in test for `Past`, `NoData` is harder and not as important)
* return `commit_lsn` instead of `max_lsn` for Future (and verify it in
the tests)
* add some comments

Split out of #5686 to get something more minimal out to users.
2023-11-10 13:38:44 +01:00
Joonas Koivunen
8dd29f1e27 fix(pageserver): spawn all kinds of tenant shutdowns (#5841)
Minor bugfix, something noticed while manual code-review. Use the same
joinset for inprogress tenants so we can get the benefit of the
buffering logging just as we get for attached tenants, and no single
inprogress task can hold up shutdown of other tenants.
2023-11-09 21:36:57 +00:00
Joonas Koivunen
f5344fb85a temp: log all layer loading errors while we lose them (#5816)
Temporary workaround while some errors are not being logged.

Cc: #5815.
2023-11-09 21:31:53 +00:00
Vadim Kharitonov
3710c32aae Merge pull request #5778 from neondatabase/releases/2023-11-03
Release 2023-11-03
2023-11-03 16:06:58 +01:00
Vadim Kharitonov
be83bee49d Merge branch 'release' into releases/2023-11-03 2023-11-03 11:18:15 +01:00
Alexander Bayandin
cf28e5922a Merge pull request #5685 from neondatabase/releases/2023-10-26
Release 2023-10-26
2023-10-27 10:42:12 +01:00
Em Sharnoff
7d384d6953 Bump vm-builder v0.18.2 -> v0.18.4 (#5666)
Only applicable change was neondatabase/autoscaling#584, setting
pgbouncer auth_dbname=postgres in order to fix superuser connections
from preventing dropping databases.
2023-10-26 20:15:45 +01:00
Em Sharnoff
4b3b37b912 Bump vm-builder v0.18.1 -> v0.18.2 (#5646)
Only applicable change was neondatabase/autoscaling#571, removing the
postgres_exporter flags `--auto-discover-databases` and
`--exclude-databases=...`
2023-10-26 20:15:29 +01:00
Shany Pozin
1d8d200f4d Merge pull request #5668 from neondatabase/sp/aux_files_cherry_pick
Cherry pick: Ignore missed AUX_FILES_KEY when generating image layer (#5660)
2023-10-26 10:08:16 +03:00
Konstantin Knizhnik
0d80d6ce18 Ignore missed AUX_FILES_KEY when generating image layer (#5660)
## Problem

Logical replication requires new AUX_FILES_KEY which is definitely
absent in existed database.
We do not have function to check if key exists in our KV storage.
So I have to handle the error in `list_aux_files` method.
But this key is also included in key space range and accessed y
`create_image_layer` method.

## Summary of changes

Check if AUX_FILES_KEY  exists before including it in keyspace.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Shany Pozin <shany@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2023-10-26 09:30:28 +03:00
Shany Pozin
f653ee039f Merge pull request #5638 from neondatabase/releases/2023-10-24
Release 2023-10-24
2023-10-24 12:10:52 +03:00
Em Sharnoff
e614a95853 Merge pull request #5610 from neondatabase/sharnoff/rc-2023-10-20-vm-monitor-fixes
Release 2023-10-20: vm-monitor memory.high throttling fixes
2023-10-20 00:11:06 -07:00
Em Sharnoff
850db4cc13 vm-monitor: Deny not fail downscale if no memory stats yet (#5606)
Fixes an issue we observed on staging that happens when the
autoscaler-agent attempts to immediately downscale the VM after binding,
which is typical for pooled computes.

The issue was occurring because the autoscaler-agent was requesting
downscaling before the vm-monitor had gathered sufficient cgroup memory
stats to be confident in approving it. When the vm-monitor returned an
internal error instead of denying downscaling, the autoscaler-agent
retried the connection and immediately hit the same issue (in part
because cgroup stats are collected per-connection, rather than
globally).
2023-10-19 21:56:55 -07:00
Em Sharnoff
8a316b1277 vm-monitor: Log full error on message handling failure (#5604)
There's currently an issue with the vm-monitor on staging that's not
really feasible to debug because the current display impl gives no
context to the errors (just says "failed to downscale").

Logging the full error should help.

For communications with the autoscaler-agent, it's ok to only provide
the outermost cause, because we can cross-reference with the VM logs.
At some point in the future, we may want to change that.
2023-10-19 21:56:50 -07:00
Em Sharnoff
4d13bae449 vm-monitor: Switch from memory.high to polling memory.stat (#5524)
tl;dr it's really hard to avoid throttling from memory.high, and it
counts tmpfs & page cache usage, so it's also hard to make sense of.

In the interest of fixing things quickly with something that should be
*good enough*, this PR switches to instead periodically fetch memory
statistics from the cgroup's memory.stat and use that data to determine
if and when we should upscale.

This PR fixes #5444, which has a lot more detail on the difficulties
we've hit with memory.high. This PR also supersedes #5488.
2023-10-19 21:56:36 -07:00
Vadim Kharitonov
49377abd98 Merge pull request #5577 from neondatabase/releases/2023-10-17
Release 2023-10-17
2023-10-17 12:21:20 +02:00
Christian Schwarz
a6b2f4e54e limit imitate accesses concurrency, using same semaphore as compactions (#5578)
Before this PR, when we restarted pageserver, we'd see a rush of
`$number_of_tenants` concurrent eviction tasks starting to do imitate
accesses building up in the period of `[init_order allows activations,
$random_access_delay + EvictionPolicyLayerAccessThreshold::period]`.

We simply cannot handle that degree of concurrent IO.

We already solved the problem for compactions by adding a semaphore.
So, this PR shares that semaphore for use by evictions.

Part of https://github.com/neondatabase/neon/issues/5479

Which is again part of https://github.com/neondatabase/neon/issues/4743

Risks / Changes In System Behavior
==================================

* we don't do evictions as timely as we currently do
* we log a bunch of warnings about eviction taking too long
* imitate accesses and compactions compete for the same concurrency
limit, so, they'll slow each other down through this shares semaphore

Changes
=======

- Move the `CONCURRENT_COMPACTIONS` semaphore into `tasks.rs`
- Rename it to `CONCURRENT_BACKGROUND_TASKS`
- Use it also for the eviction imitate accesses:
    - Imitate acceses are both per-TIMELINE and per-TENANT
    - The per-TENANT is done through coalescing all the per-TIMELINE
      tasks via a tokio mutex `eviction_task_tenant_state`.
    - We acquire the CONCURRENT_BACKGROUND_TASKS permit early, at the
      beginning of the eviction iteration, much before the imitate
      acesses start (and they may not even start at all in the given
      iteration, as they happen only every $threshold).
    - Acquiring early is **sub-optimal** because when the per-timline
      tasks coalesce on the `eviction_task_tenant_state` mutex,
      they are already holding a CONCURRENT_BACKGROUND_TASKS permit.
    - It's also unfair because tenants with many timelines win
      the CONCURRENT_BACKGROUND_TASKS more often.
    - I don't think there's another way though, without refactoring
      more of the imitate accesses logic, e.g, making it all per-tenant.
- Add metrics for queue depth behind the semaphore.
I found these very useful to understand what work is queued in the
system.

    - The metrics are tagged by the new `BackgroundLoopKind`.
    - On a green slate, I would have used `TaskKind`, but we already had
      pre-existing labels whose names didn't map exactly to task kind.
      Also the task kind is kind of a lower-level detail, so, I think
it's fine to have a separate enum to identify background work kinds.

Future Work
===========

I guess I could move the eviction tasks from a ticker to "sleep for
$period".
The benefit would be that the semaphore automatically "smears" the
eviction task scheduling over time, so, we only have the rush on restart
but a smeared-out rush afterward.

The downside is that this perverts the meaning of "$period", as we'd
actually not run the eviction at a fixed period. It also means the the
"took to long" warning & metric becomes meaningless.

Then again, that is already the case for the compaction and gc tasks,
which do sleep for `$period` instead of using a ticker.

(cherry picked from commit 9256788273)
2023-10-17 12:16:26 +02:00
Shany Pozin
face60d50b Merge pull request #5526 from neondatabase/releases/2023-10-11
Release 2023-10-11
2023-10-11 11:16:39 +03:00
Shany Pozin
9768aa27f2 Merge pull request #5516 from neondatabase/releases/2023-10-10
Release 2023-10-10
2023-10-10 14:16:47 +03:00
Shany Pozin
96b2e575e1 Merge pull request #5445 from neondatabase/releases/2023-10-03
Release 2023-10-03
2023-10-04 13:53:37 +03:00
Alexander Bayandin
7222777784 Update checksums for pg_jsonschema & pg_graphql (#5455)
## Problem

Folks have re-taged releases for `pg_jsonschema` and `pg_graphql` (to
increase timeouts on their CI), for us, these are a noop changes, 
but unfortunately, this will cause our builds to fail due to checksums 
mismatch (this might not strike right away because of the build cache).
- 8ba7c7be9d
- aa7509370a

## Summary of changes
- `pg_jsonschema` update checksum
- `pg_graphql` update checksum
2023-10-03 18:44:30 +01:00
Em Sharnoff
5469fdede0 Merge pull request #5422 from neondatabase/sharnoff/rc-2023-09-28-fix-restart-on-postmaster-SIGKILL
Release 2023-09-28: Fix (lack of) restart on neonvm postmaster SIGKILL
2023-09-28 10:48:51 -07:00
MMeent
72aa6b9fdd Fix neon_zeroextend's WAL logging (#5387)
When you log more than a few blocks, you need to reserve the space in
advance. We didn't do that, so we got errors. Now we do that, and
shouldn't get errors.
2023-09-28 09:37:28 -07:00
Em Sharnoff
ae0634b7be Bump vm-builder v0.17.11 -> v0.17.12 (#5407)
Only relevant change is neondatabase/autoscaling#534 - refer there for
more details.
2023-09-28 09:28:04 -07:00
Shany Pozin
70711f32fa Merge pull request #5375 from neondatabase/releases/2023-09-26
Release 2023-09-26
2023-09-26 15:19:45 +03:00
Vadim Kharitonov
52a88af0aa Merge pull request #5336 from neondatabase/releases/2023-09-19
Release 2023-09-19
2023-09-19 11:16:43 +02:00
Alexander Bayandin
b7a43bf817 Merge branch 'release' into releases/2023-09-19 2023-09-19 09:07:20 +01:00
Alexander Bayandin
dce91b33a4 Merge pull request #5318 from neondatabase/releases/2023-09-15-1
Postgres 14/15: Use previous extensions versions
2023-09-15 16:30:44 +01:00
Alexander Bayandin
23ee4f3050 Revert plv8 only 2023-09-15 15:45:23 +01:00
Alexander Bayandin
46857e8282 Postgres 14/15: Use previous extensions versions 2023-09-15 15:27:00 +01:00
Alexander Bayandin
368ab0ce54 Merge pull request #5313 from neondatabase/releases/2023-09-15
Release 2023-09-15
2023-09-15 10:39:56 +01:00
Konstantin Knizhnik
a5987eebfd References to old and new blocks were mixed in xlog_heap_update handler (#5312)
## Problem

See https://neondb.slack.com/archives/C05L7D1JAUS/p1694614585955029

https://www.notion.so/neondatabase/Duplicate-key-issue-651627ce843c45188fbdcb2d30fd2178

## Summary of changes

Swap old/new block references

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-09-15 10:11:41 +01:00
Alexander Bayandin
6686ede30f Update checksum for pg_hint_plan (#5309)
## Problem

The checksum for `pg_hint_plan` doesn't match:
```
sha256sum: WARNING: 1 computed checksum did NOT match
```

Ref
https://github.com/neondatabase/neon/actions/runs/6185715461/job/16793609251?pr=5307

It seems that the release was retagged yesterday:
https://github.com/ossc-db/pg_hint_plan/releases/tag/REL16_1_6_0

I don't see any malicious changes from 15_1.5.1:
https://github.com/ossc-db/pg_hint_plan/compare/REL15_1_5_1...REL16_1_6_0,
so it should be ok to update.

## Summary of changes
- Update checksum for `pg_hint_plan` 16_1.6.0
2023-09-15 09:54:42 +01:00
Em Sharnoff
373c7057cc vm-monitor: Fix cgroup throttling (#5303)
I believe this (not actual IO problems) is the cause of the "disk speed
issue" that we've had for VMs recently. See e.g.:

1. https://neondb.slack.com/archives/C03H1K0PGKH/p1694287808046179?thread_ts=1694271790.580099&cid=C03H1K0PGKH
2. https://neondb.slack.com/archives/C03H1K0PGKH/p1694511932560659

The vm-informant (and now, the vm-monitor, its replacement) is supposed
to gradually increase the `neon-postgres` cgroup's memory.high value,
because otherwise the kernel will throttle all the processes in the
cgroup.

This PR fixes a bug with the vm-monitor's implementation of this
behavior.

---

Other references, for the vm-informant's implementation:

- Original issue: neondatabase/autoscaling#44
- Original PR: neondatabase/autoscaling#223
2023-09-15 09:54:42 +01:00
Shany Pozin
7d6ec16166 Merge pull request #5296 from neondatabase/releases/2023-09-13
Release 2023-09-13
2023-09-13 13:49:14 +03:00
Shany Pozin
0e6fdc8a58 Merge pull request #5283 from neondatabase/releases/2023-09-12
Release 2023-09-12
2023-09-12 14:56:47 +03:00
Christian Schwarz
521438a5c6 fix deadlock around TENANTS (#5285)
The sequence that can lead to a deadlock:

1. DELETE request gets all the way to `tenant.shutdown(progress,
false).await.is_err() ` , while holding TENANTS.read()
2. POST request for tenant creation comes in, calls `tenant_map_insert`,
it does `let mut guard = TENANTS.write().await;`
3. Something that `tenant.shutdown()` needs to wait for needs a
`TENANTS.read().await`.
The only case identified in exhaustive manual scanning of the code base
is this one:
Imitate size access does `get_tenant().await`, which does
`TENANTS.read().await` under the hood.

In the above case (1) waits for (3), (3)'s read-lock request is queued
behind (2)'s write-lock, and (2) waits for (1).
Deadlock.

I made a reproducer/proof-that-above-hypothesis-holds in
https://github.com/neondatabase/neon/pull/5281 , but, it's not ready for
merge yet and we want the fix _now_.

fixes https://github.com/neondatabase/neon/issues/5284
2023-09-12 14:13:13 +03:00
Vadim Kharitonov
07d7874bc8 Merge pull request #5202 from neondatabase/releases/2023-09-05
Release 2023-09-05
2023-09-05 12:16:06 +02:00
Anastasia Lubennikova
1804111a02 Merge pull request #5161 from neondatabase/rc-2023-08-31
Release 2023-08-31
2023-08-31 16:53:17 +03:00
Arthur Petukhovsky
cd0178efed Merge pull request #5150 from neondatabase/release-sk-fix-active-timeline
Release 2023-08-30
2023-08-30 11:43:39 +02:00
Shany Pozin
333574be57 Merge pull request #5133 from neondatabase/releases/2023-08-29
Release 2023-08-29
2023-08-29 14:02:58 +03:00
Alexander Bayandin
79a799a143 Merge branch 'release' into releases/2023-08-29 2023-08-29 11:17:57 +01:00
Conrad Ludgate
9da06af6c9 Merge pull request #5113 from neondatabase/release-http-connection-fix
Release 2023-08-25
2023-08-25 17:21:35 +01:00
Conrad Ludgate
ce1753d036 proxy: dont return connection pending (#5107)
## Problem

We were returning Pending when a connection had a notice/notification
(introduced recently in #5020). When returning pending, the runtime
assumes you will call `cx.waker().wake()` in order to continue
processing.

We weren't doing that, so the connection task would get stuck

## Summary of changes

Don't return pending. Loop instead
2023-08-25 16:42:30 +01:00
Alek Westover
67db8432b4 Fix cargo deny errors (#5068)
## Problem
cargo deny lint broken

Links to the CVEs:

[rustsec.org/advisories/RUSTSEC-2023-0052](https://rustsec.org/advisories/RUSTSEC-2023-0052)

[rustsec.org/advisories/RUSTSEC-2023-0053](https://rustsec.org/advisories/RUSTSEC-2023-0053)
One is fixed, the other one isn't so we allow it (for now), to unbreak
CI. Then later we'll try to get rid of webpki in favour of the rustls
fork.

## Summary of changes
```
+ignore = ["RUSTSEC-2023-0052"]
```
2023-08-25 16:42:30 +01:00
Vadim Kharitonov
4e2e44e524 Enable neon-pool-opt-in (#5062) 2023-08-22 09:06:14 +01:00
Vadim Kharitonov
ed786104f3 Merge pull request #5060 from neondatabase/releases/2023-08-22
Release 2023-08-22
2023-08-22 09:41:02 +02:00
Stas Kelvich
84b74f2bd1 Merge pull request #4997 from neondatabase/sk/proxy-release-23-07-15
Fix lint
2023-08-15 18:54:20 +03:00
Arthur Petukhovsky
fec2ad6283 Fix lint 2023-08-15 18:49:02 +03:00
Stas Kelvich
98eebd4682 Merge pull request #4996 from neondatabase/sk/proxy_release
Disable neon-pool-opt-in
2023-08-15 18:37:50 +03:00
Arthur Petukhovsky
2f74287c9b Disable neon-pool-opt-in 2023-08-15 18:34:17 +03:00
Shany Pozin
aee1bf95e3 Merge pull request #4990 from neondatabase/releases/2023-08-15
Release 2023-08-15
2023-08-15 15:34:38 +03:00
Shany Pozin
b9de9d75ff Merge branch 'release' into releases/2023-08-15 2023-08-15 14:35:00 +03:00
Stas Kelvich
7943b709e6 Merge pull request #4940 from neondatabase/sk/release-23-05-25-proxy-fixup
Release: proxy retry fixup
2023-08-09 13:53:19 +03:00
Conrad Ludgate
d7d066d493 proxy: delay auth on retry (#4929)
## Problem

When an endpoint is shutting down, it can take a few seconds. Currently
when starting a new compute, this causes an "endpoint is in transition"
error. We need to add delays before retrying to ensure that we allow
time for the endpoint to shutdown properly.

## Summary of changes

Adds a delay before retrying in auth. connect_to_compute already has
this delay
2023-08-09 12:54:24 +03:00
Felix Prasanna
e78ac22107 release fix: revert vm builder bump from 0.13.1 -> 0.15.0-alpha1 (#4932)
This reverts commit 682dfb3a31.

hotfix for a CLI arg issue in the monitor
2023-08-08 21:08:46 +03:00
Vadim Kharitonov
76a8f2bb44 Merge pull request #4923 from neondatabase/releases/2023-08-08
Release 2023-08-08
2023-08-08 11:44:38 +02:00
Vadim Kharitonov
8d59a8581f Merge branch 'release' into releases/2023-08-08 2023-08-08 10:54:34 +02:00
Vadim Kharitonov
b1ddd01289 Define NEON_SMGR to make it possible for extensions to use Neon SMG API (#4889)
Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru>
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-08-03 16:28:31 +03:00
Alexander Bayandin
6eae4fc9aa Release 2023-08-02: update pg_embedding (#4877)
Cherry-picking ca4d71a954 from `main` into
the `release`

Co-authored-by: Vadim Kharitonov <vadim2404@users.noreply.github.com>
2023-08-03 08:48:09 +02:00
Christian Schwarz
765455bca2 Merge pull request #4861 from neondatabase/releases/2023-08-01--2-fix-pipeline
ci: fix upload-postgres-extensions-to-s3 job
2023-08-01 13:22:07 +02:00
Christian Schwarz
4204960942 ci: fix upload-postgres-extensions-to-s3 job
commit

	commit 5f8fd640bf
	Author: Alek Westover <alek.westover@gmail.com>
	Date:   Wed Jul 26 08:24:03 2023 -0400

	    Upload Test Remote Extensions (#4792)

switched to using the release tag instead of `latest`, but,
the `promote-images` job only uploads `latest` to the prod ECR.

The switch to using release tag was good in principle, but,
reverting that part to make the release pipeine work.

Note that a proper fix should abandon use of `:latest` tag
at all: currently, if a `main` pipeline runs concurrently
with a `release` pipeline, the `release` pipeline may end
up using the `main` pipeline's images.
2023-08-01 12:01:45 +02:00
Christian Schwarz
67345d66ea Merge pull request #4858 from neondatabase/releases/2023-08-01
Release 2023-08-01
2023-08-01 10:44:01 +02:00
Shany Pozin
2266ee5971 Merge pull request #4803 from neondatabase/releases/2023-07-25
Release 2023-07-25
2023-07-25 14:21:07 +03:00
Shany Pozin
b58445d855 Merge pull request #4746 from neondatabase/releases/2023-07-18
Release 2023-07-18
2023-07-18 14:45:39 +03:00
Conrad Ludgate
36050e7f3d Merge branch 'release' into releases/2023-07-18 2023-07-18 12:00:09 +01:00
Alexander Bayandin
33360ed96d Merge pull request #4705 from neondatabase/release-2023-07-12
Release 2023-07-12 (only proxy)
2023-07-12 19:44:36 +01:00
Conrad Ludgate
39a28d1108 proxy wake_compute loop (#4675)
## Problem

If we fail to wake up the compute node, a subsequent connect attempt
will definitely fail. However, kubernetes won't fail the connection
immediately, instead it hangs until we timeout (10s).

## Summary of changes

Refactor the loop to allow fast retries of compute_wake and to skip a
connect attempt.
2023-07-12 18:40:11 +01:00
Conrad Ludgate
efa6aa134f allow repeated IO errors from compute node (#4624)
## Problem

#4598 compute nodes are not accessible some time after wake up due to
kubernetes DNS not being fully propagated.

## Summary of changes

Update connect retry mechanism to support handling IO errors and
sleeping for 100ms

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-07-12 18:40:06 +01:00
Alexander Bayandin
2c724e56e2 Merge pull request #4646 from neondatabase/releases/2023-07-06-hotfix
Release 2023-07-06 (add pg_embedding extension only)
2023-07-06 12:19:52 +01:00
Alexander Bayandin
feff887c6f Compile pg_embedding extension (#4634)
```
CREATE EXTENSION embedding;
CREATE TABLE t (val real[]);
INSERT INTO t (val) VALUES ('{0,0,0}'), ('{1,2,3}'), ('{1,1,1}'), (NULL);
CREATE INDEX ON t USING hnsw (val) WITH (maxelements = 10, dims=3, m=3);
INSERT INTO t (val) VALUES (array[1,2,4]);

SELECT * FROM t ORDER BY val <-> array[3,3,3];
   val   
---------
 {1,2,3}
 {1,2,4}
 {1,1,1}
 {0,0,0}
 
(5 rows)
```
2023-07-06 09:39:41 +01:00
Vadim Kharitonov
353d915fcf Merge pull request #4633 from neondatabase/releases/2023-07-05
Release 2023-07-05
2023-07-05 15:10:47 +02:00
Vadim Kharitonov
2e38098cbc Merge branch 'release' into releases/2023-07-05 2023-07-05 12:41:48 +02:00
Vadim Kharitonov
a6fe5ea1ac Merge pull request #4571 from neondatabase/releases/2023-06-27
Release 2023-06-27
2023-06-27 12:55:33 +02:00
Vadim Kharitonov
05b0aed0c1 Merge branch 'release' into releases/2023-06-27 2023-06-27 12:22:12 +02:00
Alex Chi Z
cd1705357d Merge pull request #4561 from neondatabase/releases/2023-06-23-hotfix
Release 2023-06-23 (pageserver-only)
2023-06-23 15:38:50 -04:00
Christian Schwarz
6bc7561290 don't use MGMT_REQUEST_RUNTIME for consumption metrics synthetic size worker
The consumption metrics synthetic size worker does logical size calculation.
Logical size calculation currently does synchronous disk IO.
This blocks the MGMT_REQUEST_RUNTIME's executor threads, starving other futures.

While there's work on the way to move the synchronous disk IO into spawn_blocking,
the quickfix here is to use the BACKGROUND_RUNTIME instead of MGMT_REQUEST_RUNTIME.

Actually it's not just a quickfix. We simply shouldn't be blocking MGMT_REQUEST_RUNTIME
executor threads on CPU or sync disk IO.
That work isn't done yet, as many of the mgmt tasks still _do_ disk IO.
But it's not as intensive as the logical size calculations that we're fixing here.

While we're at it, fix disk-usage-based eviction in a similar way.
It wasn't the culprit here, according to prod logs, but it can theoretically be
a little CPU-intensive.

More context, including graphs from Prod:
https://neondb.slack.com/archives/C03F5SM1N02/p1687541681336949

(cherry picked from commit d6e35222ea)
2023-06-23 20:54:07 +02:00
Christian Schwarz
fbd3ac14b5 Merge pull request #4544 from neondatabase/releases/2023-06-21-hotfix
Release 2023-06-21 (fixup for post-merge failed 2023-06-20)
2023-06-21 16:54:34 +03:00
Christian Schwarz
e437787c8f cargo update -p openssl (#4542)
To unblock release
https://github.com/neondatabase/neon/pull/4536#issuecomment-1600678054

Context: https://rustsec.org/advisories/RUSTSEC-2023-0044
2023-06-21 15:52:56 +03:00
Christian Schwarz
3460dbf90b Merge pull request #4536 from neondatabase/releases/2023-06-20
Release 2023-06-20 (actually 2023-06-21)
2023-06-21 14:19:14 +03:00
Vadim Kharitonov
6b89d99677 Merge pull request #4521 from neondatabase/release_2023-06-15
Release 2023 06 15
2023-06-15 17:40:01 +02:00
Vadim Kharitonov
6cc8ea86e4 Merge branch 'main' into release_2023-06-15 2023-06-15 16:50:44 +02:00
Shany Pozin
e62a492d6f Merge pull request #4486 from neondatabase/releases/2023-06-13
Release 2023-06-13
2023-06-13 15:21:35 +03:00
Alexey Kondratov
a475cdf642 [compute_ctl] Fix logging if catalog updates are skipped (#4480)
Otherwise, it wasn't clear from the log when Postgres started up
completely if catalog updates were skipped.

Follow-up for 4936ab6
2023-06-13 13:37:24 +02:00
Stas Kelvich
7002c79a47 Merge pull request #4447 from neondatabase/release_proxy_08-06-2023
Release proxy 08 06 2023
2023-06-08 21:02:54 +03:00
Vadim Kharitonov
ee6cf357b4 Merge pull request #4427 from neondatabase/releases/2023-06-06
Release 2023-06-06
2023-06-06 14:42:21 +02:00
Vadim Kharitonov
e5c2086b5f Merge branch 'release' into releases/2023-06-06 2023-06-06 12:33:56 +02:00
Shany Pozin
5f1208296a Merge pull request #4395 from neondatabase/releases/2023-06-01
Release 2023-06-01
2023-06-01 10:58:00 +03:00
Stas Kelvich
88e8e473cd Merge pull request #4345 from neondatabase/release-23-05-25-proxy
Release 23-05-25, take 3
2023-05-25 19:40:43 +03:00
Stas Kelvich
b0a77844f6 Add SQL-over-HTTP endpoint to Proxy
This commit introduces an SQL-over-HTTP endpoint in the proxy, with a JSON
response structure resembling that of the node-postgres driver. This method,
using HTTP POST, achieves smaller amortized latencies in edge setups due to
fewer round trips and an enhanced open connection reuse by the v8 engine.

This update involves several intricacies:
1. SQL injection protection: We employed the extended query protocol, modifying
   the rust-postgres driver to send queries in one roundtrip using a text
   protocol rather than binary, bypassing potential issues like those identified
   in https://github.com/sfackler/rust-postgres/issues/1030.

2. Postgres type compatibility: As not all postgres types have binary
   representations (e.g., acl's in pg_class), we adjusted rust-postgres to
   respond with text protocol, simplifying serialization and fixing queries with
   text-only types in response.

3. Data type conversion: Considering JSON supports fewer data types than
   Postgres, we perform conversions where possible, passing all other types as
   strings. Key conversions include:
   - postgres int2, int4, float4, float8 -> json number (NaN and Inf remain
     text)
   - postgres bool, null, text -> json bool, null, string
   - postgres array -> json array
   - postgres json and jsonb -> json object

4. Alignment with node-postgres: To facilitate integration with js libraries,
   we've matched the response structure of node-postgres, returning command tags
   and column oids. Command tag capturing was added to the rust-postgres
   functionality as part of this change.
2023-05-25 17:59:17 +03:00
Vadim Kharitonov
1baf464307 Merge pull request #4309 from neondatabase/releases/2023-05-23
Release 2023-05-23
2023-05-24 11:56:54 +02:00
Alexander Bayandin
e9b8e81cea Merge branch 'release' into releases/2023-05-23 2023-05-23 12:54:08 +01:00
Alexander Bayandin
85d6194aa4 Fix regress-tests job for Postgres 15 on release branch (#4254)
## Problem

Compatibility tests don't support Postgres 15 yet, but we're still
trying to upload compatibility snapshot (which we do not collect).

Ref
https://github.com/neondatabase/neon/actions/runs/4991394158/jobs/8940369368#step:4:38129

## Summary of changes

Add `pg_version` parameter to `run-python-test-set` actions and do not
upload compatibility snapshot for Postgres 15
2023-05-16 17:19:12 +01:00
Vadim Kharitonov
333a7a68ef Merge pull request #4245 from neondatabase/releases/2023-05-16
Release 2023-05-16
2023-05-16 13:38:40 +02:00
Vadim Kharitonov
6aa4e41bee Merge branch 'release' into releases/2023-05-16 2023-05-16 12:48:23 +02:00
Joonas Koivunen
840183e51f try: higher page_service timeouts to isolate an issue 2023-05-11 16:24:53 +03:00
Shany Pozin
cbccc94b03 Merge pull request #4184 from neondatabase/releases/2023-05-09
Release 2023-05-09
2023-05-09 15:30:36 +03:00
Stas Kelvich
fce227df22 Merge pull request #4163 from neondatabase/main
Release 23-05-05
2023-05-05 15:56:23 +03:00
Stas Kelvich
bd787e800f Merge pull request #4133 from neondatabase/main
Release 23-04-01
2023-05-01 18:52:46 +03:00
Shany Pozin
4a7704b4a3 Merge pull request #4131 from neondatabase/sp/hotfix_adding_sks_us_west
Hotfix: Adding 4 new pageservers and two sets of safekeepers to us west 2
2023-05-01 15:17:38 +03:00
Shany Pozin
ff1119da66 Add 2 new sets of safekeepers to us-west2 2023-05-01 14:35:31 +03:00
Shany Pozin
4c3ba1627b Add 4 new Pageservers for retool launch 2023-05-01 14:34:38 +03:00
Vadim Kharitonov
1407174fb2 Merge pull request #4110 from neondatabase/vk/release_2023-04-28
Release 2023 04 28
2023-04-28 17:43:16 +02:00
Vadim Kharitonov
ec9dcb1889 Merge branch 'release' into vk/release_2023-04-28 2023-04-28 16:32:26 +02:00
Joonas Koivunen
d11d781afc revert: "Add check for duplicates of generated image layers" (#4104)
This reverts commit 732acc5.

Reverted PR: #3869

As noted in PR #4094, we do in fact try to insert duplicates to the
layer map, if L0->L1 compaction is interrupted. We do not have a proper
fix for that right now, and we are in a hurry to make a release to
production, so revert the changes related to this to the state that we
have in production currently. We know that we have a bug here, but
better to live with the bug that we've had in production for a long
time, than rush a fix to production without testing it in staging first.

Cc: #4094, #4088
2023-04-28 16:31:35 +02:00
Anastasia Lubennikova
4e44565b71 Merge pull request #4000 from neondatabase/releases/2023-04-11
Release 2023-04-11
2023-04-11 17:47:41 +03:00
Stas Kelvich
4ed51ad33b Add more proxy cnames 2023-04-11 15:59:35 +03:00
Arseny Sher
1c1ebe5537 Merge pull request #3946 from neondatabase/releases/2023-04-04
Release 2023-04-04
2023-04-04 14:38:40 +04:00
Christian Schwarz
c19cb7f386 Merge pull request #3935 from neondatabase/releases/2023-04-03
Release 2023-04-03
2023-04-03 16:19:49 +02:00
Vadim Kharitonov
4b97d31b16 Merge pull request #3896 from neondatabase/releases/2023-03-28
Release 2023-03-28
2023-03-28 17:58:06 +04:00
Shany Pozin
923ade3dd7 Merge pull request #3855 from neondatabase/releases/2023-03-21
Release 2023-03-21
2023-03-21 13:12:32 +02:00
Arseny Sher
b04e711975 Merge pull request #3825 from neondatabase/release-2023-03-15
Release 2023.03.15
2023-03-15 15:38:00 +03:00
Arseny Sher
afd0a6b39a Forward framed read buf contents to compute before proxy pass.
Otherwise they get lost. Normally buffer is empty before proxy pass, but this is
not the case with pipeline mode of out npm driver; fixes connection hangup
introduced by b80fe41af3 for it.

fixes https://github.com/neondatabase/neon/issues/3822
2023-03-15 15:36:06 +04:00
Lassi Pölönen
99752286d8 Use RollingUpdate strategy also for legacy proxy (#3814)
## Describe your changes
We have previously changed the neon-proxy to use RollingUpdate. This
should be enabled in legacy proxy too in order to avoid breaking
connections for the clients and allow for example backups to run even
during deployment. (https://github.com/neondatabase/neon/pull/3683)

## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3333
2023-03-15 15:35:51 +04:00
Arseny Sher
15df93363c Merge pull request #3804 from neondatabase/release-2023-03-13
Release 2023.03.13
2023-03-13 20:25:40 +03:00
Vadim Kharitonov
bc0ab741af Merge pull request #3758 from neondatabase/releases/2023-03-07
Release 2023-03-07
2023-03-07 12:38:47 +01:00
Christian Schwarz
51d9dfeaa3 Merge pull request #3743 from neondatabase/releases/2023-03-03
Release 2023-03-03
2023-03-03 19:20:21 +01:00
Shany Pozin
f63cb18155 Merge pull request #3713 from neondatabase/releases/2023-02-28
Release 2023-02-28
2023-02-28 12:52:24 +02:00
Arseny Sher
0de603d88e Merge pull request #3707 from neondatabase/release-2023-02-24
Release 2023-02-24

Hotfix for UNLOGGED tables. Contains #3706
Also contains rebase on 14.7 and 15.2 #3581
2023-02-25 00:32:11 +04:00
Heikki Linnakangas
240913912a Fix UNLOGGED tables.
Instead of trying to create missing files on the way, send init fork contents as
main fork from pageserver during basebackup. Add test for that. Call
put_rel_drop for init forks; previously they weren't removed. Bump
vendor/postgres to revert previous approach on Postgres side.

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>

ref https://github.com/neondatabase/postgres/pull/264
ref https://github.com/neondatabase/postgres/pull/259
ref https://github.com/neondatabase/neon/issues/1222
2023-02-24 23:54:53 +04:00
MMeent
91a4ea0de2 Update vendored PostgreSQL versions to 14.7 and 15.2 (#3581)
## Describe your changes
Rebase vendored PostgreSQL onto 14.7 and 15.2

## Issue ticket number and link

#3579

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
    ```
The version of PostgreSQL that we use is updated to 14.7 for PostgreSQL
14 and 15.2 for PostgreSQL 15.
    ```
2023-02-24 23:54:42 +04:00
Arseny Sher
8608704f49 Merge pull request #3691 from neondatabase/release-2023-02-23
Release 2023-02-23

Hotfix for the unlogged tables with indexes issue.

neondatabase/postgres#259
neondatabase/postgres#262
2023-02-23 13:39:33 +04:00
Arseny Sher
efef68ce99 Bump vendor/postgres to include hotfix for unlogged tables with indexes.
https://github.com/neondatabase/postgres/pull/259
https://github.com/neondatabase/postgres/pull/262
2023-02-23 08:49:43 +04:00
Joonas Koivunen
8daefd24da Merge pull request #3679 from neondatabase/releases/2023-02-22
Releases/2023-02-22
2023-02-22 15:56:55 +02:00
Arthur Petukhovsky
46cc8b7982 Remove safekeeper-1.ap-southeast-1.aws.neon.tech (#3671)
We migrated all timelines to
`safekeeper-3.ap-southeast-1.aws.neon.tech`, now old instance can be
removed.
2023-02-22 15:07:57 +02:00
Sergey Melnikov
38cd90dd0c Add -v to ansible invocations (#3670)
To get more debug output on failures
2023-02-22 15:07:57 +02:00
Joonas Koivunen
a51b269f15 fix: hold permit until GetObject eof (#3663)
previously we applied the ratelimiting only up to receiving the headers
from s3, or somewhere near it. the commit adds an adapter which carries
the permit until the AsyncRead has been disposed.

fixes #3662.
2023-02-22 15:07:57 +02:00
Joonas Koivunen
43bf6d0a0f calculate_logical_size: no longer use spawn_blocking (#3664)
Calculation of logical size is now async because of layer downloads, so
we shouldn't use spawn_blocking for it. Use of `spawn_blocking`
exhausted resources which are needed by `tokio::io::copy` when copying
from a stream to a file which lead to deadlock.

Fixes: #3657
2023-02-22 15:07:57 +02:00
Joonas Koivunen
15273a9b66 chore: ignore all compaction inactive tenant errors (#3665)
these are happening in tests because of #3655 but they sure took some
time to appear.

makes the `Compaction failed, retrying in 2s: Cannot run compaction
iteration on inactive tenant` into a globally allowed error, because it
has been seen failing on different test cases.
2023-02-22 15:07:57 +02:00
Joonas Koivunen
78aca668d0 fix: log download failed error (#3661)
Fixes #3659
2023-02-22 15:07:57 +02:00
Vadim Kharitonov
acbf4148ea Merge pull request #3656 from neondatabase/releases/2023-02-21
Release 2023-02-21
2023-02-21 16:03:48 +01:00
Vadim Kharitonov
6508540561 Merge branch 'release' into releases/2023-02-21 2023-02-21 15:31:16 +01:00
Arthur Petukhovsky
a41b5244a8 Add new safekeeper to ap-southeast-1 prod (#3645) (#3646)
To trigger deployment of #3645 to production.
2023-02-20 15:22:49 +00:00
Shany Pozin
2b3189be95 Merge pull request #3600 from neondatabase/releases/2023-02-14
Release 2023-02-14
2023-02-15 13:31:30 +02:00
Vadim Kharitonov
248563c595 Merge pull request #3553 from neondatabase/releases/2023-02-07
Release 2023-02-07
2023-02-07 14:07:44 +01:00
Vadim Kharitonov
14cd6ca933 Merge branch 'release' into releases/2023-02-07 2023-02-07 12:11:56 +01:00
Vadim Kharitonov
eb36403e71 Release 2023 01 31 (#3497)
Co-authored-by: Kirill Bulatov <kirill@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
Co-authored-by: Christian Schwarz <christian@neon.tech>
Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru>
Co-authored-by: Shany Pozin <shany@neon.tech>
Co-authored-by: Sergey Melnikov <sergey@neon.tech>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Rory de Zoete <33318916+zoete@users.noreply.github.com>
Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
Co-authored-by: Lassi Pölönen <lassi.polonen@iki.fi>
2023-01-31 15:06:35 +02:00
Anastasia Lubennikova
3c6f779698 Merge pull request #3411 from neondatabase/release_2023_01_23
Fix Release 2023 01 23
2023-01-23 20:10:03 +02:00
Joonas Koivunen
f67f0c1c11 More tenant size fixes (#3410)
Small changes, but hopefully this will help with the panic detected in
staging, for which we cannot get the debugging information right now
(end-of-branch before branch-point).
2023-01-23 17:46:13 +02:00
Shany Pozin
edb02d3299 Adding pageserver3 to staging (#3403) 2023-01-23 17:46:13 +02:00
Konstantin Knizhnik
664a69e65b Fix slru_segment_key_range function: segno was assigned to incorrect Key field (#3354) 2023-01-23 17:46:13 +02:00
Anastasia Lubennikova
478322ebf9 Fix tenant size orphans (#3377)
Before only the timelines which have passed the `gc_horizon` were
processed which failed with orphans at the tree_sort phase. Example
input in added `test_branched_empty_timeline_size` test case.

The PR changes iteration to happen through all timelines, and in
addition to that, any learned branch points will be calculated as they
would had been in the original implementation if the ancestor branch had
been over the `gc_horizon`.

This also changes how tenants where all timelines are below `gc_horizon`
are handled. Previously tenant_size 0 was returned, but now they will
have approximately `initdb_lsn` worth of tenant_size.

The PR also adds several new tenant size tests that describe various corner
cases of branching structure and `gc_horizon` setting.
They are currently disabled to not consume time during CI.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2023-01-23 17:46:13 +02:00
Joonas Koivunen
802f174072 fix: dont stop pageserver if we fail to calculate synthetic size 2023-01-23 17:46:13 +02:00
Alexey Kondratov
47f9890bae [compute_ctl] Make role deletion spec processing idempotent (#3380)
Previously, we were trying to re-assign owned objects of the already
deleted role. This were causing a crash loop in the case when compute
was restarted with a spec that includes delta operation for role
deletion. To avoid such cases, check that role is still present before
calling `reassign_owned_objects`.

Resolves neondatabase/cloud#3553
2023-01-23 17:46:13 +02:00
Christian Schwarz
262265daad Revert "Use actual temporary dir for pageserver unit tests"
This reverts commit 826e89b9ce.

The problem with that commit was that it deletes the TempDir while
there are still EphemeralFile instances open.

At first I thought this could be fixed by simply adding

  Handle::current().block_on(task_mgr::shutdown(None, Some(tenant_id), None))

to TenantHarness::drop, but it turned out to be insufficient.

So, reverting the commit until we find a proper solution.

refs https://github.com/neondatabase/neon/issues/3385
2023-01-23 17:46:13 +02:00
bojanserafimov
300da5b872 Improve layer map docstrings (#3382) 2023-01-23 17:46:13 +02:00
Heikki Linnakangas
7b22b5c433 Switch to 'tracing' for logging, restructure code to make use of spans.
Refactors Compute::prepare_and_run. It's split into subroutines
differently, to make it easier to attach tracing spans to the
different stages. The high-level logic for waiting for Postgres to
exit is moved to the caller.

Replace 'env_logger' with 'tracing', and add `#instrument` directives
to different stages fo the startup process. This is a fairly
mechanical change, except for the changes in 'spec.rs'. 'spec.rs'
contained some complicated formatting, where parts of log messages
were printed directly to stdout with `print`s. That was a bit messed
up because the log normally goes to stderr, but those lines were
printed to stdout. In our docker images, stderr and stdout both go to
the same place so you wouldn't notice, but I don't think it was
intentional.

This changes the log format to the default
'tracing_subscriber::format' format. It's different from the Postgres
log format, however, and because both compute_tools and Postgres print
to the same log, it's now a mix of two different formats.  I'm not
sure how the Grafana log parsing pipeline can handle that. If it's a
problem, we can build custom formatter to change the compute_tools log
format to be the same as Postgres's, like it was before this commit,
or we can change the Postgres log format to match tracing_formatter's,
or we can start printing compute_tool's log output to a different
destination than Postgres
2023-01-23 17:46:12 +02:00
Kirill Bulatov
ffca97bc1e Enable logs in unit tests 2023-01-23 17:46:12 +02:00
Kirill Bulatov
cb356f3259 Use actual temporary dir for pageserver unit tests 2023-01-23 17:46:12 +02:00
Vadim Kharitonov
c85374295f Change SENTRY_ENVIRONMENT from "development" to "staging" 2023-01-23 17:46:12 +02:00
Anastasia Lubennikova
4992160677 Fix metric_collection_endpoint for prod.
It was incorrectly set to staging url
2023-01-23 17:46:12 +02:00
Heikki Linnakangas
bd535b3371 If an error happens while checking for core dumps, don't panic.
If we panic, we skip the 30s wait in 'main', and don't give the
console a chance to observe the error. Which is not nice.

Spotted by @ololobus at
https://github.com/neondatabase/neon/pull/3352#discussion_r1072806981
2023-01-23 17:46:12 +02:00
Kirill Bulatov
d90c5a03af Add more io::Error context when fail to operate on a path (#3254)
I have a test failure that shows 

```
Caused by:
    0: Failed to reconstruct a page image:
    1: Directory not empty (os error 39)
```

but does not really show where exactly that happens.

https://neon-github-public-dev.s3.amazonaws.com/reports/pr-3227/release/3823785365/index.html#categories/c0057473fc9ec8fb70876fd29a171ce8/7088dab272f2c7b7/?attachment=60fe6ed2add4d82d

The PR aims to add more context in debugging that issue.
2023-01-23 17:46:12 +02:00
Anastasia Lubennikova
2d02cc9079 Merge pull request #3365 from neondatabase/main
Release 2023-01-17
2023-01-17 16:41:34 +02:00
Christian Schwarz
49ad94b99f Merge pull request #3301 from neondatabase/release-2023-01-10
Release 2023-01-10
2023-01-10 16:42:26 +01:00
Christian Schwarz
948a217398 Merge commit '95bf19b85a06b27a7fc3118dee03d48648efab15' into release-2023-01-10
Conflicts:
        .github/helm-values/neon-stress.proxy-scram.yaml
        .github/helm-values/neon-stress.proxy.yaml
        .github/helm-values/staging.proxy-scram.yaml
        .github/helm-values/staging.proxy.yaml
        All of the above were deleted in `main` after we hotfixed them
        in `release. Deleting them here
        storage_broker/src/bin/storage_broker.rs
        Hotfix toned down logging, but `main` has sinced implemented
        a proper fix. Taken `main`'s side, see
        https://neondb.slack.com/archives/C033RQ5SPDH/p1673354385387479?thread_ts=1673354306.474729&cid=C033RQ5SPDH

closes https://github.com/neondatabase/neon/issues/3287
2023-01-10 15:40:14 +01:00
Dmitry Rodionov
125381eae7 Merge pull request #3236 from neondatabase/dkr/retrofit-sk4-sk4-change
Move zenith-1-sk-3 to zenith-1-sk-4 (#3164)
2022-12-30 14:13:50 +03:00
Arthur Petukhovsky
cd01bbc715 Move zenith-1-sk-3 to zenith-1-sk-4 (#3164) 2022-12-30 12:32:52 +02:00
Dmitry Rodionov
d8b5e3b88d Merge pull request #3229 from neondatabase/dkr/add-pageserver-for-release
add pageserver to new region see https://github.com/neondatabase/aws/pull/116

decrease log volume for pageserver
2022-12-30 12:34:04 +03:00
Dmitry Rodionov
06d25f2186 switch to debug from info to produce less noise 2022-12-29 17:48:47 +02:00
Dmitry Rodionov
f759b561f3 add pageserver to new region see https://github.com/neondatabase/aws/pull/116 2022-12-29 17:17:35 +02:00
Sergey Melnikov
ece0555600 Push proxy metrics to Victoria Metrics (#3106) 2022-12-16 14:44:49 +02:00
Joonas Koivunen
73ea0a0b01 fix(remote_storage): use cached credentials (#3128)
IMDSv2 has limits, and if we query it on every s3 interaction we are
going to go over those limits. Changes the s3_bucket client
configuration to use:
- ChainCredentialsProvider to handle env variables or imds usage
- LazyCachingCredentialsProvider to actually cache any credentials

Related: https://github.com/awslabs/aws-sdk-rust/issues/629
Possibly related: https://github.com/neondatabase/neon/issues/3118
2022-12-16 14:44:49 +02:00
Arseny Sher
d8f6d6fd6f Merge pull request #3126 from neondatabase/broker-lb-release
Deploy broker with L4 LB in new env.
2022-12-16 01:25:28 +03:00
Arseny Sher
d24de169a7 Deploy broker with L4 LB in new env.
Seems to be fixing issue with missing keepalives.
2022-12-16 01:45:32 +04:00
Arseny Sher
0816168296 Hotfix: terminate subscription if channel is full.
Might help as a hotfix, but need to understand root better.
2022-12-15 12:23:56 +03:00
Dmitry Rodionov
277b44d57a Merge pull request #3102 from neondatabase/main
Hotfix. See commits for details
2022-12-14 19:38:43 +03:00
MMeent
68c2c3880e Merge pull request #3038 from neondatabase/main
Release 22-12-14
2022-12-14 14:35:47 +01:00
Arthur Petukhovsky
49da498f65 Merge pull request #2833 from neondatabase/main
Release 2022-11-16
2022-11-17 08:44:10 +01:00
Stas Kelvich
2c76ba3dd7 Merge pull request #2718 from neondatabase/main-rc-22-10-28
Release 22-10-28
2022-10-28 20:33:56 +03:00
Arseny Sher
dbe3dc69ad Merge branch 'main' into main-rc-22-10-28
Release 22-10-28.
2022-10-28 19:10:11 +04:00
Arseny Sher
8e5bb3ed49 Enable etcd compaction in neon_local. 2022-10-27 12:53:20 +03:00
Stas Kelvich
ab0be7b8da Avoid debian-testing packages in compute Dockerfiles
plv8 can only be built with a fairly new gold linker version. We used to install
it via binutils packages from testing, but it also updates libc and that causes
troubles in the resulting image as different extensions were built against
different libc versions. We could either use libc from debian-testing everywhere
or restrain from using testing packages and install necessary programs manually.
This patch uses the latter approach: gold for plv8 and cmake for h3 are
installed manually.

In a passing declare h3_postgis as a safe extension (previous omission).
2022-10-27 12:53:20 +03:00
bojanserafimov
b4c55f5d24 Move pagestream api to libs/pageserver_api (#2698) 2022-10-27 12:53:20 +03:00
mikecaat
ede70d833c Add a docker-compose example file (#1943) (#2666)
Co-authored-by: Masahiro Ikeda <masahiro.ikeda.us@hco.ntt.co.jp>
2022-10-27 12:53:20 +03:00
Sergey Melnikov
70c3d18bb0 Do not release to new staging proxies on release (#2685) 2022-10-27 12:53:20 +03:00
bojanserafimov
7a491f52c4 Add draw_timeline binary (#2688) 2022-10-27 12:53:20 +03:00
Alexander Bayandin
323c4ecb4f Add data format backward compatibility tests (#2626) 2022-10-27 12:53:20 +03:00
Anastasia Lubennikova
3d2466607e Merge pull request #2692 from neondatabase/main-rc
Release 2022-10-25
2022-10-25 18:18:58 +03:00
Anastasia Lubennikova
ed478b39f4 Merge branch 'release' into main-rc 2022-10-25 17:06:33 +03:00
Stas Kelvich
91585a558d Merge pull request #2678 from neondatabase/stas/hotfix_schema
Hotfix to disable grant create on public schema
2022-10-22 02:54:31 +03:00
Stas Kelvich
93467eae1f Hotfix to disable grant create on public schema
`GRANT CREATE ON SCHEMA public` fails if there is no schema `public`.
Disable it in release for now and make a better fix later (it is
needed for v15 support).
2022-10-22 02:26:28 +03:00
Stas Kelvich
f3aac81d19 Merge pull request #2668 from neondatabase/main
Release 2022-10-21
2022-10-21 15:21:42 +03:00
Stas Kelvich
979ad60c19 Merge pull request #2581 from neondatabase/main
Release 2022-10-07
2022-10-07 16:50:55 +03:00
Stas Kelvich
9316cb1b1f Merge pull request #2573 from neondatabase/main
Release 2022-10-06
2022-10-07 11:07:06 +03:00
Anastasia Lubennikova
e7939a527a Merge pull request #2377 from neondatabase/main
Release 2022-09-01
2022-09-01 20:20:44 +03:00
Arthur Petukhovsky
36d26665e1 Merge pull request #2299 from neondatabase/main
* Check for entire range during sasl validation (#2281)

* Gen2 GH runner (#2128)

* Re-add rustup override

* Try s3 bucket

* Set git version

* Use v4 cache key to prevent problems

* Switch to v5 for key

* Add second rustup fix

* Rebase

* Add kaniko steps

* Fix typo and set compress level

* Disable global run default

* Specify shell for step

* Change approach with kaniko

* Try less verbose shell spec

* Add submodule pull

* Add promote step

* Adjust dependency chain

* Try default swap again

* Use env

* Don't override aws key

* Make kaniko build conditional

* Specify runs on

* Try without dependency link

* Try soft fail

* Use image with git

* Try passing to next step

* Fix duplicate

* Try other approach

* Try other approach

* Fix typo

* Try other syntax

* Set env

* Adjust setup

* Try step 1

* Add link

* Try global env

* Fix mistake

* Debug

* Try other syntax

* Try other approach

* Change order

* Move output one step down

* Put output up one level

* Try other syntax

* Skip build

* Try output

* Re-enable build

* Try other syntax

* Skip middle step

* Update check

* Try first step of dockerhub push

* Update needs dependency

* Try explicit dir

* Add missing package

* Try other approach

* Try other approach

* Specify region

* Use with

* Try other approach

* Add debug

* Try other approach

* Set region

* Follow AWS example

* Try github approach

* Skip Qemu

* Try stdin

* Missing steps

* Add missing close

* Add echo debug

* Try v2 endpoint

* Use v1 endpoint

* Try without quotes

* Revert

* Try crane

* Add debug

* Split steps

* Fix duplicate

* Add shell step

* Conform to options

* Add verbose flag

* Try single step

* Try workaround

* First request fails hunch

* Try bullseye image

* Try other approach

* Adjust verbose level

* Try previous step

* Add more debug

* Remove debug step

* Remove rogue indent

* Try with larger image

* Add build tag step

* Update workflow for testing

* Add tag step for test

* Remove unused

* Update dependency chain

* Add ownership fix

* Use matrix for promote

* Force update

* Force build

* Remove unused

* Add new image

* Add missing argument

* Update dockerfile copy

* Update Dockerfile

* Update clone

* Update dockerfile

* Go to correct folder

* Use correct format

* Update dockerfile

* Remove cd

* Debug find where we are

* Add debug on first step

* Changedir to postgres

* Set workdir

* Use v1 approach

* Use other dependency

* Try other approach

* Try other approach

* Update dockerfile

* Update approach

* Update dockerfile

* Update approach

* Update dockerfile

* Update dockerfile

* Add workspace hack

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Change last step

* Cleanup pull in prep for review

* Force build images

* Add condition for latest tagging

* Use pinned version

* Try without name value

* Remove more names

* Shorten names

* Add kaniko comments

* Pin kaniko

* Pin crane and ecr helper

* Up one level

* Switch to pinned tag for rust image

* Force update for test

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@b04468bf-cdf4-41eb-9c94-aff4ca55e4bf.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@4795e9ee-4f32-401f-85f3-f316263b62b8.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@2f8bc4e5-4ec2-4ea2-adb1-65d863c4a558.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@27565b2b-72d5-4742-9898-a26c9033e6f9.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@ecc96c26-c6c4-4664-be6e-34f7c3f89a3c.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@7caff3a5-bf03-4202-bd0e-f1a93c86bdae.fritz.box>

* Add missing step output, revert one deploy step (#2285)

* Add missing step output, revert one deploy step

* Conform to syntax

* Update approach

* Add missing value

* Add missing needs

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>

* Error for fatal not git repo (#2286)

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>

* Use main, not branch for ref check (#2288)

* Use main, not branch for ref check

* Add more debug

* Count main, not head

* Try new approach

* Conform to syntax

* Update approach

* Get full history

* Skip checkout

* Cleanup debug

* Remove more debug

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>

* Fix docker zombie process issue (#2289)

* Fix docker zombie process issue

* Init everywhere

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>

* Fix 1.63 clippy lints (#2282)

* split out timeline metrics, track layer map loading and size calculation

* reset rust cache for clippy run to avoid an ICE

additionally remove trailing whitespaces

* Rename pg_control_ffi.h to bindgen_deps.h, for clarity.

The pg_control_ffi.h name implies that it only includes stuff related to
pg_control.h. That's mostly true currently, but really the point of the
file is to include everything that we need to generate Rust definitions
from.

* Make local mypy behave like CI mypy (#2291)

* Fix flaky pageserver restarts in tests (#2261)

* Remove extra type aliases (#2280)

* Update cachepot endpoint (#2290)

* Update cachepot endpoint

* Update dockerfile & remove env

* Update image building process

* Cannot use metadata endpoint for this

* Update workflow

* Conform to kaniko syntax

* Update syntax

* Update approach

* Update dockerfiles

* Force update

* Update dockerfiles

* Update dockerfile

* Cleanup dockerfiles

* Update s3 test location

* Revert s3 experiment

* Add more debug

* Specify aws region

* Remove debug, add prefix

* Remove one more debug

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>

* workflows/benchmarking: increase timeout (#2294)

* Rework `init` in pageserver CLI  (#2272)

* Do not create initial tenant and timeline (adjust Python tests for that)
* Rework config handling during init, add --update-config to manage local config updates

* Fix: Always build images (#2296)

* Always build images

* Remove unused

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>

* Move auto-generated 'bindings' to a separate inner module.

Re-export only things that are used by other modules.

In the future, I'm imagining that we run bindgen twice, for Postgres
v14 and v15. The two sets of bindings would go into separate
'bindings_v14' and 'bindings_v15' modules.

Rearrange postgres_ffi modules.

Move function, to avoid Postgres version dependency in timelines.rs
Move function to generate a logical-message WAL record to postgres_ffi.

* fix cargo test

* Fix walreceiver and safekeeper bugs (#2295)

- There was an issue with zero commit_lsn `reason: LaggingWal { current_commit_lsn: 0/0, new_commit_lsn: 1/6FD90D38, threshold: 10485760 } }`. The problem was in `send_wal.rs`, where we initialized `end_pos = Lsn(0)` and in some cases sent it to the pageserver.
- IDENTIFY_SYSTEM previously returned `flush_lsn` as a physical end of WAL. Now it returns `flush_lsn` (as it was) to walproposer and `commit_lsn` to everyone else including pageserver.
- There was an issue with backoff where connection was cancelled right after initialization: `connected!` -> `safekeeper_handle_db: Connection cancelled` -> `Backoff: waiting 3 seconds`. The problem was in sleeping before establishing the connection. This is fixed by reworking retry logic.
- There was an issue with getting `NoKeepAlives` reason in a loop. The issue is probably the same as the previous.
- There was an issue with filtering safekeepers based on retry attempts, which could filter some safekeepers indefinetely. This is fixed by using retry cooldown duration instead of retry attempts.
- Some `send_wal.rs` connections failed with errors without context. This is fixed by adding a timeline to safekeepers errors.

New retry logic works like this:
- Every candidate has a `next_retry_at` timestamp and is not considered for connection until that moment
- When walreceiver connection is closed, we update `next_retry_at` using exponential backoff, increasing the cooldown on every disconnect.
- When `last_record_lsn` was advanced using the WAL from the safekeeper, we reset the retry cooldown and exponential backoff, allowing walreceiver to reconnect to the same safekeeper instantly.

* on safekeeper registration pass availability zone param (#2292)

Co-authored-by: Kirill Bulatov <kirill@neon.tech>
Co-authored-by: Rory de Zoete <33318916+zoete@users.noreply.github.com>
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@b04468bf-cdf4-41eb-9c94-aff4ca55e4bf.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@4795e9ee-4f32-401f-85f3-f316263b62b8.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@2f8bc4e5-4ec2-4ea2-adb1-65d863c4a558.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@27565b2b-72d5-4742-9898-a26c9033e6f9.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@ecc96c26-c6c4-4664-be6e-34f7c3f89a3c.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@7caff3a5-bf03-4202-bd0e-f1a93c86bdae.fritz.box>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Anton Galitsyn <agalitsyn@users.noreply.github.com>
2022-08-18 15:32:33 +03:00
Arthur Petukhovsky
873347f977 Merge pull request #2275 from neondatabase/main
* github/workflows: Fix git dubious ownership (#2223)

* Move relation size cache from WalIngest to DatadirTimeline (#2094)

* Move relation sie cache to layered timeline

* Fix obtaining current LSN for relation size cache

* Resolve merge conflicts

* Resolve merge conflicts

* Reestore 'lsn' field in DatadirModification

* adjust DatadirModification lsn in ingest_record

* Fix formatting

* Pass lsn to get_relsize

* Fix merge conflict

* Update pageserver/src/pgdatadir_mapping.rs

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

* Update pageserver/src/pgdatadir_mapping.rs

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

* refactor: replace lazy-static with once-cell (#2195)

- Replacing all the occurrences of lazy-static with `once-cell::sync::Lazy`
- fixes #1147

Signed-off-by: Ankur Srivastava <best.ankur@gmail.com>

* Add more buckets to pageserver latency metrics (#2225)

* ignore record property warning to fix benchmarks

* increase statement timeout

* use event so it fires only if workload thread successfully finished

* remove debug log

* increase timeout to pass test with real s3

* avoid duplicate parameter, increase timeout

* Major migration script (#2073)

This script can be used to migrate a tenant across breaking storage versions, or (in the future) upgrading postgres versions. See the comment at the top for an overview.

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>

* Fix etcd typos

* Fix links to safekeeper protocol docs. (#2188)

safekeeper/README_PROTO.md was moved to docs/safekeeper-protocol.md in
commit 0b14fdb078, as part of reorganizing the docs into 'mdbook' format.

Fixes issue #1475. Thanks to @banks for spotting the outdated references.

In addition to fixing the above issue, this patch also fixes other broken links as a result of 0b14fdb078. See https://github.com/neondatabase/neon/pull/2188#pullrequestreview-1055918480.

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Thang Pham <thang@neon.tech>

* Update CONTRIBUTING.md

* Update CONTRIBUTING.md

* support node id and remote storage params in docker_entrypoint.sh

* Safe truncate (#2218)

* Move relation sie cache to layered timeline

* Fix obtaining current LSN for relation size cache

* Resolve merge conflicts

* Resolve merge conflicts

* Reestore 'lsn' field in DatadirModification

* adjust DatadirModification lsn in ingest_record

* Fix formatting

* Pass lsn to get_relsize

* Fix merge conflict

* Update pageserver/src/pgdatadir_mapping.rs

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

* Update pageserver/src/pgdatadir_mapping.rs

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

* Check if relation exists before trying to truncat it

refer #1932

* Add test reporducing FSM truncate problem

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

* Fix exponential backoff values

* Update back `vendor/postgres` back; it was changed accidentally. (#2251)

Commit 4227cfc96e accidentally reverted vendor/postgres to an older
version. Update it back.

* Add pageserver checkpoint_timeout option.

To flush inmemory layer eventually when no new data arrives, which helps
safekeepers to suspend activity (stop pushing to the broker). Default 10m should
be ok.

* Share exponential backoff code and fix logic for delete task failure (#2252)

* Fix bug when import large (>1GB) relations (#2172)

Resolves #2097 

- use timeline modification's `lsn` and timeline's `last_record_lsn` to determine the corresponding LSN to query data in `DatadirModification::get`
- update `test_import_from_pageserver`. Split the test into 2 variants: `small` and `multisegment`. 
  + `small` is the old test
  + `multisegment` is to simulate #2097 by using a larger number of inserted rows to create multiple segment files of a relation. `multisegment` is configured to only run with a `release` build

* Fix timeline physical size flaky tests (#2244)

Resolves #2212.

- use `wait_for_last_flush_lsn` in `test_timeline_physical_size_*` tests

## Context
Need to wait for the pageserver to catch up with the compute's last flush LSN because during the timeline physical size API call, it's possible that there are running `LayerFlushThread` threads. These threads flush new layers into disk and hence update the physical size. This results in a mismatch between the physical size reported by the API and the actual physical size on disk.

### Note
The `LayerFlushThread` threads are processed **concurrently**, so it's possible that the above error still persists even with this patch. However, making the tests wait to finish processing all the WALs (not flushing) before calculating the physical size should help reduce the "flakiness" significantly

* postgres_ffi/waldecoder: validate more header fields

* postgres_ffi/waldecoder: remove unused startlsn

* postgres_ffi/waldecoder: introduce explicit `enum State`

Previously it was emulated with a combination of nullable fields.
This change should make the logic more readable.

* disable `test_import_from_pageserver_multisegment` (#2258)

This test failed consistently on `main` now. It's better to temporarily disable it to avoid blocking others' PRs while investigating the root cause for the test failure.

See: #2255, #2256

* get_binaries uses DOCKER_TAG taken from docker image build step (#2260)

* [proxy] Rework wire format of the password hack and some errors (#2236)

The new format has a few benefits: it's shorter, simpler and
human-readable as well. We don't use base64 anymore, since
url encoding got us covered.

We also show a better error in case we couldn't parse the
payload; the users should know it's all about passing the
correct project name.

* test_runner/pg_clients: collect docker logs (#2259)

* get_binaries script fix (#2263)

* get_binaries uses DOCKER_TAG taken from docker image build step

* remove docker tag discovery at all and fix get_binaries for version variable

* Better storage sync logs (#2268)

* Find end of WAL on safekeepers using WalStreamDecoder.

We could make it inside wal_storage.rs, but taking into account that
 - wal_storage.rs reading is async
 - we don't need s3 here
 - error handling is different; error during decoding is normal
I decided to put it separately.

Test
cargo test test_find_end_of_wal_last_crossing_segment
prepared earlier by @yeputons passes now.

Fixes https://github.com/neondatabase/neon/issues/544
      https://github.com/neondatabase/cloud/issues/2004
Supersedes https://github.com/neondatabase/neon/pull/2066

* Improve walreceiver logic (#2253)

This patch makes walreceiver logic more complicated, but it should work better in most cases. Added `test_wal_lagging` to test scenarios where alive safekeepers can lag behind other alive safekeepers.

- There was a bug which looks like `etcd_info.timeline.commit_lsn > Some(self.local_timeline.get_last_record_lsn())` filtered all safekeepers in some strange cases. I removed this filter, it should probably help with #2237
- Now walreceiver_connection reports status, including commit_lsn. This allows keeping safekeeper connection even when etcd is down.
- Safekeeper connection now fails if pageserver doesn't receive safekeeper messages for some time. Usually safekeeper sends messages at least once per second.
- `LaggingWal` check now uses `commit_lsn` directly from safekeeper. This fixes the issue with often reconnects, when compute generates WAL really fast.
- `NoWalTimeout` is rewritten to trigger only when we know about the new WAL and the connected safekeeper doesn't stream any WAL. This allows setting a small `lagging_wal_timeout` because it will trigger only when we observe that the connected safekeeper has stuck.

* increase timeout in wait_for_upload to avoid spurious failures when testing with real s3

* Bump vendor/postgres to include XLP_FIRST_IS_CONTRECORD fix. (#2274)

* Set up a workflow to run pgbench against captest (#2077)

Signed-off-by: Ankur Srivastava <best.ankur@gmail.com>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru>
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
Co-authored-by: Ankur Srivastava <ansrivas@users.noreply.github.com>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Kirill Bulatov <kirill@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Thang Pham <thang@neon.tech>
Co-authored-by: Stas Kelvich <stas.kelvich@gmail.com>
Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
Co-authored-by: Egor Suvorov <egor@neon.tech>
Co-authored-by: Andrey Taranik <andrey@cicd.team>
Co-authored-by: Dmitry Ivanov <ivadmi5@gmail.com>
2022-08-15 21:30:45 +03:00
Arthur Petukhovsky
e814ac16f9 Merge pull request #2219 from neondatabase/main
Release 2022-08-04
2022-08-04 20:06:34 +03:00
Heikki Linnakangas
ad3055d386 Merge pull request #2203 from neondatabase/release-uuid-ossp
Deploy new storage and compute version to production

Release 2022-08-02
2022-08-02 15:08:14 +03:00
Heikki Linnakangas
94e03eb452 Merge remote-tracking branch 'origin/main' into 'release'
Release 2022-08-01
2022-08-02 12:43:49 +03:00
Sergey Melnikov
380f26ef79 Merge pull request #2170 from neondatabase/main (Release 2022-07-28)
Release 2022-07-28
2022-07-28 14:16:52 +03:00
Arthur Petukhovsky
3c5b7f59d7 Merge pull request #2119 from neondatabase/main
Release 2022-07-19
2022-07-19 11:58:48 +03:00
Arthur Petukhovsky
fee89f80b5 Merge pull request #2115 from neondatabase/main-2022-07-18
Release 2022-07-18
2022-07-18 19:21:11 +03:00
Arthur Petukhovsky
41cce8eaf1 Merge remote-tracking branch 'origin/release' into main-2022-07-18 2022-07-18 18:21:20 +03:00
Alexey Kondratov
f88fe0218d Merge pull request #1842 from neondatabase/release-deploy-hotfix
[HOTFIX] Release deploy fix

This PR uses this branch neondatabase/postgres#171 and several required commits from the main to use only locally built compute-tools. This should allow us to rollout safekeepers sync issue fix on prod
2022-06-01 11:04:30 +03:00
Alexey Kondratov
cc856eca85 Install missing openssl packages in the Github Actions workflow 2022-05-31 21:31:31 +02:00
Alexey Kondratov
cf350c6002 Use :local compute-tools tag to build compute-node image 2022-05-31 21:31:16 +02:00
Arseny Sher
0ce6b6a0a3 Merge pull request #1836 from neondatabase/release-hotfix-basebackup-lsn-page-boundary
Bump vendor/postgres to hotfix basebackup LSN comparison.
2022-05-31 16:54:03 +04:00
Arseny Sher
73f247d537 Bump vendor/postgres to hotfix basebackup LSN comparison. 2022-05-31 16:00:50 +04:00
Andrey Taranik
960be82183 Merge pull request #1792 from neondatabase/main
Release 2202-05-25 (second)
2022-05-25 16:37:57 +03:00
Andrey Taranik
806e5a6c19 Merge pull request #1787 from neondatabase/main
Release 2022-05-25
2022-05-25 13:34:11 +03:00
Alexey Kondratov
8d5df07cce Merge pull request #1385 from zenithdb/main
Release main 2022-03-22
2022-03-22 05:04:34 -05:00
Andrey Taranik
df7a9d1407 release fix 2022-03-16 (#1375) 2022-03-17 00:43:28 +03:00
59 changed files with 3086 additions and 1190 deletions

View File

@@ -3,7 +3,7 @@
**NB: this PR must be merged only by 'Create a merge commit'!**
### Checklist when preparing for release
- [ ] Read or refresh [the release flow guide](https://github.com/neondatabase/cloud/wiki/Release:-general-flow)
- [ ] Read or refresh [the release flow guide](https://www.notion.so/neondatabase/Release-general-flow-61f2e39fd45d4d14a70c7749604bd70b)
- [ ] Ask in the [cloud Slack channel](https://neondb.slack.com/archives/C033A2WE6BZ) that you are going to rollout the release. Any blockers?
- [ ] Does this release contain any db migrations? Destructive ones? What is the rollback plan?

View File

@@ -1,5 +1,7 @@
self-hosted-runner:
labels:
- arm64
- dev
- gen3
- large
- small

View File

@@ -172,10 +172,10 @@ jobs:
# https://github.com/EmbarkStudios/cargo-deny
- name: Check rust licenses/bans/advisories/sources
if: ${{ !cancelled() }}
run: cargo deny check
run: cargo deny check --hide-inclusion-graph
build-neon:
needs: [ check-permissions ]
needs: [ check-permissions, tag ]
runs-on: [ self-hosted, gen3, large ]
container:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
@@ -187,6 +187,7 @@ jobs:
env:
BUILD_TYPE: ${{ matrix.build_type }}
GIT_VERSION: ${{ github.event.pull_request.head.sha || github.sha }}
BUILD_TAG: ${{ needs.tag.outputs.build-tag }}
steps:
- name: Fix git ownership
@@ -585,10 +586,13 @@ jobs:
id: upload-coverage-report-new
env:
BUCKET: neon-github-public-dev
# A differential coverage report is available only for PRs.
# (i.e. for pushes into main/release branches we have a regular coverage report)
COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}
BASE_SHA: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
BASELINE="$(git merge-base HEAD origin/main)"
CURRENT="${COMMIT_SHA}"
BASELINE="$(git merge-base $BASE_SHA $CURRENT)"
cp /tmp/coverage/report/lcov.info ./${CURRENT}.info
@@ -848,7 +852,7 @@ jobs:
run:
shell: sh -eu {0}
env:
VM_BUILDER_VERSION: v0.18.5
VM_BUILDER_VERSION: v0.19.0
steps:
- name: Checkout
@@ -870,8 +874,7 @@ jobs:
- name: Build vm image
run: |
./vm-builder \
-enable-file-cache \
-cgroup-uid=postgres \
-spec=vm-image-spec.yaml \
-src=369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-node-${{ matrix.version }}:${{needs.tag.outputs.build-tag}} \
-dst=369495373322.dkr.ecr.eu-central-1.amazonaws.com/vm-compute-node-${{ matrix.version }}:${{needs.tag.outputs.build-tag}}

View File

@@ -21,7 +21,10 @@ env:
jobs:
check-macos-build:
if: github.ref_name == 'main' || contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos')
if: |
contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
timeout-minutes: 90
runs-on: macos-latest
@@ -112,8 +115,182 @@ jobs:
- name: Check that no warnings are produced
run: ./run_clippy.sh
check-linux-arm-build:
timeout-minutes: 90
runs-on: [ self-hosted, dev, arm64 ]
env:
# Use release build only, to have less debug info around
# Hence keeping target/ (and general cache size) smaller
BUILD_TYPE: release
CARGO_FEATURES: --features testing
CARGO_FLAGS: --locked --release
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
container:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
options: --init
steps:
- name: Fix git ownership
run: |
# Workaround for `fatal: detected dubious ownership in repository at ...`
#
# Use both ${{ github.workspace }} and ${GITHUB_WORKSPACE} because they're different on host and in containers
# Ref https://github.com/actions/checkout/issues/785
#
git config --global --add safe.directory ${{ github.workspace }}
git config --global --add safe.directory ${GITHUB_WORKSPACE}
- name: Checkout
uses: actions/checkout@v4
with:
submodules: true
fetch-depth: 1
- name: Set pg 14 revision for caching
id: pg_v14_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v14) >> $GITHUB_OUTPUT
- name: Set pg 15 revision for caching
id: pg_v15_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v15) >> $GITHUB_OUTPUT
- name: Set pg 16 revision for caching
id: pg_v16_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v16) >> $GITHUB_OUTPUT
- name: Set env variables
run: |
echo "CARGO_HOME=${GITHUB_WORKSPACE}/.cargo" >> $GITHUB_ENV
- name: Cache postgres v14 build
id: cache_pg_14
uses: actions/cache@v3
with:
path: pg_install/v14
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache postgres v15 build
id: cache_pg_15
uses: actions/cache@v3
with:
path: pg_install/v15
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache postgres v16 build
id: cache_pg_16
uses: actions/cache@v3
with:
path: pg_install/v16
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v16_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Build postgres v14
if: steps.cache_pg_14.outputs.cache-hit != 'true'
run: mold -run make postgres-v14 -j$(nproc)
- name: Build postgres v15
if: steps.cache_pg_15.outputs.cache-hit != 'true'
run: mold -run make postgres-v15 -j$(nproc)
- name: Build postgres v16
if: steps.cache_pg_16.outputs.cache-hit != 'true'
run: mold -run make postgres-v16 -j$(nproc)
- name: Build neon extensions
run: mold -run make neon-pg-ext -j$(nproc)
- name: Build walproposer-lib
run: mold -run make walproposer-lib -j$(nproc)
- name: Run cargo build
run: |
mold -run cargo build $CARGO_FLAGS $CARGO_FEATURES --bins --tests
- name: Run cargo test
run: |
cargo test $CARGO_FLAGS $CARGO_FEATURES
# Run separate tests for real S3
export ENABLE_REAL_S3_REMOTE_STORAGE=nonempty
export REMOTE_STORAGE_S3_BUCKET=neon-github-public-dev
export REMOTE_STORAGE_S3_REGION=eu-central-1
# Avoid `$CARGO_FEATURES` since there's no `testing` feature in the e2e tests now
cargo test $CARGO_FLAGS --package remote_storage --test test_real_s3
# Run separate tests for real Azure Blob Storage
# XXX: replace region with `eu-central-1`-like region
export ENABLE_REAL_AZURE_REMOTE_STORAGE=y
export AZURE_STORAGE_ACCOUNT="${{ secrets.AZURE_STORAGE_ACCOUNT_DEV }}"
export AZURE_STORAGE_ACCESS_KEY="${{ secrets.AZURE_STORAGE_ACCESS_KEY_DEV }}"
export REMOTE_STORAGE_AZURE_CONTAINER="${{ vars.REMOTE_STORAGE_AZURE_CONTAINER }}"
export REMOTE_STORAGE_AZURE_REGION="${{ vars.REMOTE_STORAGE_AZURE_REGION }}"
# Avoid `$CARGO_FEATURES` since there's no `testing` feature in the e2e tests now
cargo test $CARGO_FLAGS --package remote_storage --test test_real_azure
check-codestyle-rust-arm:
timeout-minutes: 90
runs-on: [ self-hosted, dev, arm64 ]
container:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
options: --init
steps:
- name: Checkout
uses: actions/checkout@v4
with:
submodules: true
fetch-depth: 1
# Some of our rust modules use FFI and need those to be checked
- name: Get postgres headers
run: make postgres-headers -j$(nproc)
# cargo hack runs the given cargo subcommand (clippy in this case) for all feature combinations.
# This will catch compiler & clippy warnings in all feature combinations.
# TODO: use cargo hack for build and test as well, but, that's quite expensive.
# NB: keep clippy args in sync with ./run_clippy.sh
- run: |
CLIPPY_COMMON_ARGS="$( source .neon_clippy_args; echo "$CLIPPY_COMMON_ARGS")"
if [ "$CLIPPY_COMMON_ARGS" = "" ]; then
echo "No clippy args found in .neon_clippy_args"
exit 1
fi
echo "CLIPPY_COMMON_ARGS=${CLIPPY_COMMON_ARGS}" >> $GITHUB_ENV
- name: Run cargo clippy (debug)
run: cargo hack --feature-powerset clippy $CLIPPY_COMMON_ARGS
- name: Run cargo clippy (release)
run: cargo hack --feature-powerset clippy --release $CLIPPY_COMMON_ARGS
- name: Check documentation generation
run: cargo doc --workspace --no-deps --document-private-items
env:
RUSTDOCFLAGS: "-Dwarnings -Arustdoc::private_intra_doc_links"
# Use `${{ !cancelled() }}` to run quck tests after the longer clippy run
- name: Check formatting
if: ${{ !cancelled() }}
run: cargo fmt --all -- --check
# https://github.com/facebookincubator/cargo-guppy/tree/bec4e0eb29dcd1faac70b1b5360267fc02bf830e/tools/cargo-hakari#2-keep-the-workspace-hack-up-to-date-in-ci
- name: Check rust dependencies
if: ${{ !cancelled() }}
run: |
cargo hakari generate --diff # workspace-hack Cargo.toml is up-to-date
cargo hakari manage-deps --dry-run # all workspace crates depend on workspace-hack
# https://github.com/EmbarkStudios/cargo-deny
- name: Check rust licenses/bans/advisories/sources
if: ${{ !cancelled() }}
run: cargo deny check
gather-rust-build-stats:
if: github.ref_name == 'main' || contains(github.event.pull_request.labels.*.name, 'run-extra-build-stats')
if: |
contains(github.event.pull_request.labels.*.name, 'run-extra-build-stats') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
runs-on: [ self-hosted, gen3, large ]
container:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned

635
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -83,7 +83,7 @@ hex = "0.4"
hex-literal = "0.4"
hmac = "0.12.1"
hostname = "0.3.1"
http-types = "2"
http-types = { version = "2", default-features = false }
humantime = "2.1"
humantime-serde = "1.1.1"
hyper = "0.14"
@@ -136,6 +136,7 @@ strum_macros = "0.24"
svg_fmt = "0.4.1"
sync_wrapper = "0.1.2"
tar = "0.4"
task-local-extensions = "0.1.4"
test-context = "0.1"
thiserror = "1.0"
tls-listener = { version = "0.7", features = ["rustls", "hyper-h1"] }
@@ -164,11 +165,11 @@ env_logger = "0.10"
log = "0.4"
## Libraries from neondatabase/ git forks, ideally with changes to be upstreamed
postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="ce7260db5998fe27167da42503905a12e7ad9048" }
postgres-native-tls = { git = "https://github.com/neondatabase/rust-postgres.git", rev="ce7260db5998fe27167da42503905a12e7ad9048" }
postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git", rev="ce7260db5998fe27167da42503905a12e7ad9048" }
postgres-types = { git = "https://github.com/neondatabase/rust-postgres.git", rev="ce7260db5998fe27167da42503905a12e7ad9048" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="ce7260db5998fe27167da42503905a12e7ad9048" }
postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="6ce32f791526e27533cab0232a6bb243b2c32584" }
postgres-native-tls = { git = "https://github.com/neondatabase/rust-postgres.git", rev="6ce32f791526e27533cab0232a6bb243b2c32584" }
postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git", rev="6ce32f791526e27533cab0232a6bb243b2c32584" }
postgres-types = { git = "https://github.com/neondatabase/rust-postgres.git", rev="6ce32f791526e27533cab0232a6bb243b2c32584" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="6ce32f791526e27533cab0232a6bb243b2c32584" }
## Other git libraries
heapless = { default-features=false, features=[], git = "https://github.com/japaric/heapless.git", rev = "644653bf3b831c6bb4963be2de24804acf5e5001" } # upstream release pending
@@ -205,7 +206,7 @@ tonic-build = "0.9"
# This is only needed for proxy's tests.
# TODO: we should probably fork `tokio-postgres-rustls` instead.
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="ce7260db5998fe27167da42503905a12e7ad9048" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="6ce32f791526e27533cab0232a6bb243b2c32584" }
################# Binary contents sections

View File

@@ -479,13 +479,6 @@ fn cli() -> clap::Command {
)
.value_name("FILECACHE_CONNSTR"),
)
.arg(
// DEPRECATED, NO LONGER DOES ANYTHING.
// See https://github.com/neondatabase/cloud/issues/7516
Arg::new("file-cache-on-disk")
.long("file-cache-on-disk")
.action(clap::ArgAction::SetTrue),
)
}
#[test]

View File

@@ -133,45 +133,6 @@ fn parse_pg_version(human_version: &str) -> &str {
panic!("Unsuported postgres version {human_version}");
}
#[cfg(test)]
mod tests {
use super::parse_pg_version;
#[test]
fn test_parse_pg_version() {
assert_eq!(parse_pg_version("PostgreSQL 15.4"), "v15");
assert_eq!(parse_pg_version("PostgreSQL 15.14"), "v15");
assert_eq!(
parse_pg_version("PostgreSQL 15.4 (Ubuntu 15.4-0ubuntu0.23.04.1)"),
"v15"
);
assert_eq!(parse_pg_version("PostgreSQL 14.15"), "v14");
assert_eq!(parse_pg_version("PostgreSQL 14.0"), "v14");
assert_eq!(
parse_pg_version("PostgreSQL 14.9 (Debian 14.9-1.pgdg120+1"),
"v14"
);
assert_eq!(parse_pg_version("PostgreSQL 16devel"), "v16");
assert_eq!(parse_pg_version("PostgreSQL 16beta1"), "v16");
assert_eq!(parse_pg_version("PostgreSQL 16rc2"), "v16");
assert_eq!(parse_pg_version("PostgreSQL 16extra"), "v16");
}
#[test]
#[should_panic]
fn test_parse_pg_unsupported_version() {
parse_pg_version("PostgreSQL 13.14");
}
#[test]
#[should_panic]
fn test_parse_pg_incorrect_version_format() {
parse_pg_version("PostgreSQL 14");
}
}
// download the archive for a given extension,
// unzip it, and place files in the appropriate locations (share/lib)
pub async fn download_extension(
@@ -285,3 +246,42 @@ pub fn init_remote_storage(remote_ext_config: &str) -> anyhow::Result<GenericRem
};
GenericRemoteStorage::from_config(&config)
}
#[cfg(test)]
mod tests {
use super::parse_pg_version;
#[test]
fn test_parse_pg_version() {
assert_eq!(parse_pg_version("PostgreSQL 15.4"), "v15");
assert_eq!(parse_pg_version("PostgreSQL 15.14"), "v15");
assert_eq!(
parse_pg_version("PostgreSQL 15.4 (Ubuntu 15.4-0ubuntu0.23.04.1)"),
"v15"
);
assert_eq!(parse_pg_version("PostgreSQL 14.15"), "v14");
assert_eq!(parse_pg_version("PostgreSQL 14.0"), "v14");
assert_eq!(
parse_pg_version("PostgreSQL 14.9 (Debian 14.9-1.pgdg120+1"),
"v14"
);
assert_eq!(parse_pg_version("PostgreSQL 16devel"), "v16");
assert_eq!(parse_pg_version("PostgreSQL 16beta1"), "v16");
assert_eq!(parse_pg_version("PostgreSQL 16rc2"), "v16");
assert_eq!(parse_pg_version("PostgreSQL 16extra"), "v16");
}
#[test]
#[should_panic]
fn test_parse_pg_unsupported_version() {
parse_pg_version("PostgreSQL 13.14");
}
#[test]
#[should_panic]
fn test_parse_pg_incorrect_version_format() {
parse_pg_version("PostgreSQL 14");
}
}

View File

@@ -9,6 +9,7 @@ pub struct AttachmentService {
env: LocalEnv,
listen: String,
path: PathBuf,
client: reqwest::blocking::Client,
}
const COMMAND: &str = "attachment_service";
@@ -24,6 +25,16 @@ pub struct AttachHookResponse {
pub gen: Option<u32>,
}
#[derive(Serialize, Deserialize)]
pub struct InspectRequest {
pub tenant_id: TenantId,
}
#[derive(Serialize, Deserialize)]
pub struct InspectResponse {
pub attachment: Option<(u32, NodeId)>,
}
impl AttachmentService {
pub fn from_env(env: &LocalEnv) -> Self {
let path = env.base_data_dir.join("attachments.json");
@@ -42,6 +53,9 @@ impl AttachmentService {
env: env.clone(),
path,
listen,
client: reqwest::blocking::ClientBuilder::new()
.build()
.expect("Failed to construct http client"),
}
}
@@ -84,16 +98,13 @@ impl AttachmentService {
.unwrap()
.join("attach-hook")
.unwrap();
let client = reqwest::blocking::ClientBuilder::new()
.build()
.expect("Failed to construct http client");
let request = AttachHookRequest {
tenant_id,
node_id: Some(pageserver_id),
};
let response = client.post(url).json(&request).send()?;
let response = self.client.post(url).json(&request).send()?;
if response.status() != StatusCode::OK {
return Err(anyhow!("Unexpected status {}", response.status()));
}
@@ -101,4 +112,26 @@ impl AttachmentService {
let response = response.json::<AttachHookResponse>()?;
Ok(response.gen)
}
pub fn inspect(&self, tenant_id: TenantId) -> anyhow::Result<Option<(u32, NodeId)>> {
use hyper::StatusCode;
let url = self
.env
.control_plane_api
.clone()
.unwrap()
.join("inspect")
.unwrap();
let request = InspectRequest { tenant_id };
let response = self.client.post(url).json(&request).send()?;
if response.status() != StatusCode::OK {
return Err(anyhow!("Unexpected status {}", response.status()));
}
let response = response.json::<InspectResponse>()?;
Ok(response.attachment)
}
}

View File

@@ -32,7 +32,9 @@ use pageserver_api::control_api::{
ValidateResponseTenant,
};
use control_plane::attachment_service::{AttachHookRequest, AttachHookResponse};
use control_plane::attachment_service::{
AttachHookRequest, AttachHookResponse, InspectRequest, InspectResponse,
};
#[derive(Parser)]
#[command(author, version, about, long_about = None)]
@@ -255,12 +257,28 @@ async fn handle_attach_hook(mut req: Request<Body>) -> Result<Response<Body>, Ap
)
}
async fn handle_inspect(mut req: Request<Body>) -> Result<Response<Body>, ApiError> {
let inspect_req = json_request::<InspectRequest>(&mut req).await?;
let state = get_state(&req).inner.clone();
let locked = state.write().await;
let tenant_state = locked.tenants.get(&inspect_req.tenant_id);
json_response(
StatusCode::OK,
InspectResponse {
attachment: tenant_state.and_then(|s| s.pageserver.map(|ps| (s.generation, ps))),
},
)
}
fn make_router(persistent_state: PersistentState) -> RouterBuilder<hyper::Body, ApiError> {
endpoint::make_router()
.data(Arc::new(State::new(persistent_state)))
.post("/re-attach", |r| request_span(r, handle_re_attach))
.post("/validate", |r| request_span(r, handle_validate))
.post("/attach-hook", |r| request_span(r, handle_attach_hook))
.post("/inspect", |r| request_span(r, handle_inspect))
}
#[tokio::main]

View File

@@ -11,13 +11,14 @@ use compute_api::spec::ComputeMode;
use control_plane::attachment_service::AttachmentService;
use control_plane::endpoint::ComputeControlPlane;
use control_plane::local_env::LocalEnv;
use control_plane::pageserver::PageServerNode;
use control_plane::pageserver::{PageServerNode, PAGESERVER_REMOTE_STORAGE_DIR};
use control_plane::safekeeper::SafekeeperNode;
use control_plane::tenant_migration::migrate_tenant;
use control_plane::{broker, local_env};
use pageserver_api::models::TimelineInfo;
use pageserver_api::{
DEFAULT_HTTP_LISTEN_ADDR as DEFAULT_PAGESERVER_HTTP_ADDR,
DEFAULT_PG_LISTEN_ADDR as DEFAULT_PAGESERVER_PG_ADDR,
DEFAULT_HTTP_LISTEN_PORT as DEFAULT_PAGESERVER_HTTP_PORT,
DEFAULT_PG_LISTEN_PORT as DEFAULT_PAGESERVER_PG_PORT,
};
use postgres_backend::AuthType;
use safekeeper_api::{
@@ -46,8 +47,8 @@ const DEFAULT_PG_VERSION: &str = "15";
const DEFAULT_PAGESERVER_CONTROL_PLANE_API: &str = "http://127.0.0.1:1234/";
fn default_conf() -> String {
format!(
fn default_conf(num_pageservers: u16) -> String {
let mut template = format!(
r#"
# Default built-in configuration, defined in main.rs
control_plane_api = '{DEFAULT_PAGESERVER_CONTROL_PLANE_API}'
@@ -55,21 +56,33 @@ control_plane_api = '{DEFAULT_PAGESERVER_CONTROL_PLANE_API}'
[broker]
listen_addr = '{DEFAULT_BROKER_ADDR}'
[[pageservers]]
id = {DEFAULT_PAGESERVER_ID}
listen_pg_addr = '{DEFAULT_PAGESERVER_PG_ADDR}'
listen_http_addr = '{DEFAULT_PAGESERVER_HTTP_ADDR}'
pg_auth_type = '{trust_auth}'
http_auth_type = '{trust_auth}'
[[safekeepers]]
id = {DEFAULT_SAFEKEEPER_ID}
pg_port = {DEFAULT_SAFEKEEPER_PG_PORT}
http_port = {DEFAULT_SAFEKEEPER_HTTP_PORT}
"#,
trust_auth = AuthType::Trust,
)
);
for i in 0..num_pageservers {
let pageserver_id = NodeId(DEFAULT_PAGESERVER_ID.0 + i as u64);
let pg_port = DEFAULT_PAGESERVER_PG_PORT + i;
let http_port = DEFAULT_PAGESERVER_HTTP_PORT + i;
template += &format!(
r#"
[[pageservers]]
id = {pageserver_id}
listen_pg_addr = '127.0.0.1:{pg_port}'
listen_http_addr = '127.0.0.1:{http_port}'
pg_auth_type = '{trust_auth}'
http_auth_type = '{trust_auth}'
"#,
trust_auth = AuthType::Trust,
)
}
template
}
///
@@ -295,6 +308,9 @@ fn parse_timeline_id(sub_match: &ArgMatches) -> anyhow::Result<Option<TimelineId
}
fn handle_init(init_match: &ArgMatches) -> anyhow::Result<LocalEnv> {
let num_pageservers = init_match
.get_one::<u16>("num-pageservers")
.expect("num-pageservers arg has a default");
// Create config file
let toml_file: String = if let Some(config_path) = init_match.get_one::<PathBuf>("config") {
// load and parse the file
@@ -306,7 +322,7 @@ fn handle_init(init_match: &ArgMatches) -> anyhow::Result<LocalEnv> {
})?
} else {
// Built-in default config
default_conf()
default_conf(*num_pageservers)
};
let pg_version = init_match
@@ -320,6 +336,9 @@ fn handle_init(init_match: &ArgMatches) -> anyhow::Result<LocalEnv> {
env.init(pg_version, force)
.context("Failed to initialize neon repository")?;
// Create remote storage location for default LocalFs remote storage
std::fs::create_dir_all(env.base_data_dir.join(PAGESERVER_REMOTE_STORAGE_DIR))?;
// Initialize pageserver, create initial tenant and timeline.
for ps_conf in &env.pageservers {
PageServerNode::from_env(&env, ps_conf)
@@ -433,6 +452,15 @@ fn handle_tenant(tenant_match: &ArgMatches, env: &mut local_env::LocalEnv) -> an
.with_context(|| format!("Tenant config failed for tenant with id {tenant_id}"))?;
println!("tenant {tenant_id} successfully configured on the pageserver");
}
Some(("migrate", matches)) => {
let tenant_id = get_tenant_id(matches, env)?;
let new_pageserver = get_pageserver(env, matches)?;
let new_pageserver_id = new_pageserver.conf.id;
migrate_tenant(env, tenant_id, new_pageserver)?;
println!("tenant {tenant_id} migrated to {}", new_pageserver_id);
}
Some((sub_name, _)) => bail!("Unexpected tenant subcommand '{}'", sub_name),
None => bail!("no tenant subcommand provided"),
}
@@ -867,20 +895,20 @@ fn handle_mappings(sub_match: &ArgMatches, env: &mut local_env::LocalEnv) -> Res
}
}
fn get_pageserver(env: &local_env::LocalEnv, args: &ArgMatches) -> Result<PageServerNode> {
let node_id = if let Some(id_str) = args.get_one::<String>("pageserver-id") {
NodeId(id_str.parse().context("while parsing pageserver id")?)
} else {
DEFAULT_PAGESERVER_ID
};
Ok(PageServerNode::from_env(
env,
env.get_pageserver_conf(node_id)?,
))
}
fn handle_pageserver(sub_match: &ArgMatches, env: &local_env::LocalEnv) -> Result<()> {
fn get_pageserver(env: &local_env::LocalEnv, args: &ArgMatches) -> Result<PageServerNode> {
let node_id = if let Some(id_str) = args.get_one::<String>("pageserver-id") {
NodeId(id_str.parse().context("while parsing pageserver id")?)
} else {
DEFAULT_PAGESERVER_ID
};
Ok(PageServerNode::from_env(
env,
env.get_pageserver_conf(node_id)?,
))
}
match sub_match.subcommand() {
Some(("start", subcommand_args)) => {
if let Err(e) = get_pageserver(env, subcommand_args)?
@@ -917,6 +945,20 @@ fn handle_pageserver(sub_match: &ArgMatches, env: &local_env::LocalEnv) -> Resul
}
}
Some(("migrate", subcommand_args)) => {
let pageserver = get_pageserver(env, subcommand_args)?;
//TODO what shutdown strategy should we use here?
if let Err(e) = pageserver.stop(false) {
eprintln!("pageserver stop failed: {}", e);
exit(1);
}
if let Err(e) = pageserver.start(&pageserver_config_overrides(subcommand_args)) {
eprintln!("pageserver start failed: {e}");
exit(1);
}
}
Some(("status", subcommand_args)) => {
match get_pageserver(env, subcommand_args)?.check_status() {
Ok(_) => println!("Page server is up and running"),
@@ -1224,6 +1266,13 @@ fn cli() -> Command {
.help("Force initialization even if the repository is not empty")
.required(false);
let num_pageservers_arg = Arg::new("num-pageservers")
.value_parser(value_parser!(u16))
.long("num-pageservers")
.help("How many pageservers to create (default 1)")
.required(false)
.default_value("1");
Command::new("Neon CLI")
.arg_required_else_help(true)
.version(GIT_VERSION)
@@ -1231,6 +1280,7 @@ fn cli() -> Command {
Command::new("init")
.about("Initialize a new Neon repository, preparing configs for services to start with")
.arg(pageserver_config_args.clone())
.arg(num_pageservers_arg.clone())
.arg(
Arg::new("config")
.long("config")
@@ -1301,6 +1351,10 @@ fn cli() -> Command {
.subcommand(Command::new("config")
.arg(tenant_id_arg.clone())
.arg(Arg::new("config").short('c').num_args(1).action(ArgAction::Append).required(false)))
.subcommand(Command::new("migrate")
.about("Migrate a tenant from one pageserver to another")
.arg(tenant_id_arg.clone())
.arg(pageserver_id_arg.clone()))
)
.subcommand(
Command::new("pageserver")

View File

@@ -14,3 +14,4 @@ pub mod local_env;
pub mod pageserver;
pub mod postgresql_conf;
pub mod safekeeper;
pub mod tenant_migration;

View File

@@ -15,7 +15,10 @@ use std::{io, result};
use anyhow::{bail, Context};
use camino::Utf8PathBuf;
use pageserver_api::models::{self, TenantInfo, TimelineInfo};
use pageserver_api::models::{
self, LocationConfig, TenantInfo, TenantLocationConfigRequest, TimelineInfo,
};
use pageserver_api::shard::TenantShardId;
use postgres_backend::AuthType;
use postgres_connection::{parse_host_port, PgConnectionConfig};
use reqwest::blocking::{Client, RequestBuilder, Response};
@@ -31,6 +34,9 @@ use utils::{
use crate::local_env::PageServerConf;
use crate::{background_process, local_env::LocalEnv};
/// Directory within .neon which will be used by default for LocalFs remote storage.
pub const PAGESERVER_REMOTE_STORAGE_DIR: &str = "local_fs_remote_storage/pageserver";
#[derive(Error, Debug)]
pub enum PageserverHttpError {
#[error("Reqwest error: {0}")]
@@ -98,8 +104,10 @@ impl PageServerNode {
}
}
// pageserver conf overrides defined by neon_local configuration.
fn neon_local_overrides(&self) -> Vec<String> {
/// Merge overrides provided by the user on the command line with our default overides derived from neon_local configuration.
///
/// These all end up on the command line of the `pageserver` binary.
fn neon_local_overrides(&self, cli_overrides: &[&str]) -> Vec<String> {
let id = format!("id={}", self.conf.id);
// FIXME: the paths should be shell-escaped to handle paths with spaces, quotas etc.
let pg_distrib_dir_param = format!(
@@ -132,12 +140,25 @@ impl PageServerNode {
));
}
if !cli_overrides
.iter()
.any(|c| c.starts_with("remote_storage"))
{
overrides.push(format!(
"remote_storage={{local_path='../{PAGESERVER_REMOTE_STORAGE_DIR}'}}"
));
}
if self.conf.http_auth_type != AuthType::Trust || self.conf.pg_auth_type != AuthType::Trust
{
// Keys are generated in the toplevel repo dir, pageservers' workdirs
// are one level below that, so refer to keys with ../
overrides.push("auth_validation_public_key_path='../auth_public_key.pem'".to_owned());
}
// Apply the user-provided overrides
overrides.extend(cli_overrides.iter().map(|&c| c.to_owned()));
overrides
}
@@ -203,9 +224,6 @@ impl PageServerNode {
}
fn start_node(&self, config_overrides: &[&str], update_config: bool) -> anyhow::Result<Child> {
let mut overrides = self.neon_local_overrides();
overrides.extend(config_overrides.iter().map(|&c| c.to_owned()));
let datadir = self.repo_path();
print!(
"Starting pageserver node {} at '{}' in {:?}",
@@ -248,8 +266,7 @@ impl PageServerNode {
) -> Vec<Cow<'a, str>> {
let mut args = vec![Cow::Borrowed("-D"), Cow::Borrowed(datadir_path_str)];
let mut overrides = self.neon_local_overrides();
overrides.extend(config_overrides.iter().map(|&c| c.to_owned()));
let overrides = self.neon_local_overrides(config_overrides);
for config_override in overrides {
args.push(Cow::Borrowed("-c"));
args.push(Cow::Owned(config_override));
@@ -392,7 +409,7 @@ impl PageServerNode {
};
let request = models::TenantCreateRequest {
new_tenant_id,
new_tenant_id: TenantShardId::unsharded(new_tenant_id),
generation,
config,
};
@@ -501,6 +518,27 @@ impl PageServerNode {
Ok(())
}
pub fn location_config(
&self,
tenant_id: TenantId,
config: LocationConfig,
) -> anyhow::Result<()> {
let req_body = TenantLocationConfigRequest { tenant_id, config };
self.http_request(
Method::PUT,
format!(
"{}/tenant/{}/location_config",
self.http_base_url, tenant_id
),
)?
.json(&req_body)
.send()?
.error_from_body()?;
Ok(())
}
pub fn timeline_list(&self, tenant_id: &TenantId) -> anyhow::Result<Vec<TimelineInfo>> {
let timeline_infos: Vec<TimelineInfo> = self
.http_request(

View File

@@ -0,0 +1,202 @@
//!
//! Functionality for migrating tenants across pageservers: unlike most of neon_local, this code
//! isn't scoped to a particular physical service, as it needs to update compute endpoints to
//! point to the new pageserver.
//!
use crate::local_env::LocalEnv;
use crate::{
attachment_service::AttachmentService, endpoint::ComputeControlPlane,
pageserver::PageServerNode,
};
use pageserver_api::models::{
LocationConfig, LocationConfigMode, LocationConfigSecondary, TenantConfig,
};
use std::collections::HashMap;
use std::time::Duration;
use utils::{
generation::Generation,
id::{TenantId, TimelineId},
lsn::Lsn,
};
/// Given an attached pageserver, retrieve the LSN for all timelines
fn get_lsns(
tenant_id: TenantId,
pageserver: &PageServerNode,
) -> anyhow::Result<HashMap<TimelineId, Lsn>> {
let timelines = pageserver.timeline_list(&tenant_id)?;
Ok(timelines
.into_iter()
.map(|t| (t.timeline_id, t.last_record_lsn))
.collect())
}
/// Wait for the timeline LSNs on `pageserver` to catch up with or overtake
/// `baseline`.
fn await_lsn(
tenant_id: TenantId,
pageserver: &PageServerNode,
baseline: HashMap<TimelineId, Lsn>,
) -> anyhow::Result<()> {
loop {
let latest = match get_lsns(tenant_id, pageserver) {
Ok(l) => l,
Err(e) => {
println!(
"🕑 Can't get LSNs on pageserver {} yet, waiting ({e})",
pageserver.conf.id
);
std::thread::sleep(Duration::from_millis(500));
continue;
}
};
let mut any_behind: bool = false;
for (timeline_id, baseline_lsn) in &baseline {
match latest.get(timeline_id) {
Some(latest_lsn) => {
println!("🕑 LSN origin {baseline_lsn} vs destination {latest_lsn}");
if latest_lsn < baseline_lsn {
any_behind = true;
}
}
None => {
// Expected timeline isn't yet visible on migration destination.
// (IRL we would have to account for timeline deletion, but this
// is just test helper)
any_behind = true;
}
}
}
if !any_behind {
println!("✅ LSN caught up. Proceeding...");
break;
} else {
std::thread::sleep(Duration::from_millis(500));
}
}
Ok(())
}
/// This function spans multiple services, to demonstrate live migration of a tenant
/// between pageservers:
/// - Coordinate attach/secondary/detach on pageservers
/// - call into attachment_service for generations
/// - reconfigure compute endpoints to point to new attached pageserver
pub fn migrate_tenant(
env: &LocalEnv,
tenant_id: TenantId,
dest_ps: PageServerNode,
) -> anyhow::Result<()> {
// Get a new generation
let attachment_service = AttachmentService::from_env(env);
let previous = attachment_service.inspect(tenant_id)?;
let mut baseline_lsns = None;
if let Some((generation, origin_ps_id)) = &previous {
let origin_ps = PageServerNode::from_env(env, env.get_pageserver_conf(*origin_ps_id)?);
if origin_ps_id == &dest_ps.conf.id {
println!("🔁 Already attached to {origin_ps_id}, freshening...");
let gen = attachment_service.attach_hook(tenant_id, dest_ps.conf.id)?;
let dest_conf = LocationConfig {
mode: LocationConfigMode::AttachedSingle,
generation: gen.map(Generation::new),
secondary_conf: None,
tenant_conf: TenantConfig::default(),
};
dest_ps.location_config(tenant_id, dest_conf)?;
println!("✅ Migration complete");
return Ok(());
}
println!("🔁 Switching origin pageserver {origin_ps_id} to stale mode");
let stale_conf = LocationConfig {
mode: LocationConfigMode::AttachedStale,
generation: Some(Generation::new(*generation)),
secondary_conf: None,
tenant_conf: TenantConfig::default(),
};
origin_ps.location_config(tenant_id, stale_conf)?;
baseline_lsns = Some(get_lsns(tenant_id, &origin_ps)?);
}
let gen = attachment_service.attach_hook(tenant_id, dest_ps.conf.id)?;
let dest_conf = LocationConfig {
mode: LocationConfigMode::AttachedMulti,
generation: gen.map(Generation::new),
secondary_conf: None,
tenant_conf: TenantConfig::default(),
};
println!("🔁 Attaching to pageserver {}", dest_ps.conf.id);
dest_ps.location_config(tenant_id, dest_conf)?;
if let Some(baseline) = baseline_lsns {
println!("🕑 Waiting for LSN to catch up...");
await_lsn(tenant_id, &dest_ps, baseline)?;
}
let cplane = ComputeControlPlane::load(env.clone())?;
for (endpoint_name, endpoint) in &cplane.endpoints {
if endpoint.tenant_id == tenant_id {
println!(
"🔁 Reconfiguring endpoint {} to use pageserver {}",
endpoint_name, dest_ps.conf.id
);
endpoint.reconfigure(Some(dest_ps.conf.id))?;
}
}
for other_ps_conf in &env.pageservers {
if other_ps_conf.id == dest_ps.conf.id {
continue;
}
let other_ps = PageServerNode::from_env(env, other_ps_conf);
let other_ps_tenants = other_ps.tenant_list()?;
// Check if this tenant is attached
let found = other_ps_tenants
.into_iter()
.map(|t| t.id)
.any(|i| i == tenant_id);
if !found {
continue;
}
// Downgrade to a secondary location
let secondary_conf = LocationConfig {
mode: LocationConfigMode::Secondary,
generation: None,
secondary_conf: Some(LocationConfigSecondary { warm: true }),
tenant_conf: TenantConfig::default(),
};
println!(
"💤 Switching to secondary mode on pageserver {}",
other_ps.conf.id
);
other_ps.location_config(tenant_id, secondary_conf)?;
}
println!(
"🔁 Switching to AttachedSingle mode on pageserver {}",
dest_ps.conf.id
);
let dest_conf = LocationConfig {
mode: LocationConfigMode::AttachedSingle,
generation: gen.map(Generation::new),
secondary_conf: None,
tenant_conf: TenantConfig::default(),
};
dest_ps.location_config(tenant_id, dest_conf)?;
println!("✅ Migration complete");
Ok(())
}

View File

@@ -74,10 +74,30 @@ highlight = "all"
workspace-default-features = "allow"
external-default-features = "allow"
allow = []
deny = []
skip = []
skip-tree = []
[[bans.deny]]
# we use tokio, the same rationale applies for async-{io,waker,global-executor,executor,channel,lock}, smol
# if you find yourself here while adding a dependency, try "default-features = false", ask around on #rust
name = "async-std"
[[bans.deny]]
name = "async-io"
[[bans.deny]]
name = "async-waker"
[[bans.deny]]
name = "async-global-executor"
[[bans.deny]]
name = "async-executor"
[[bans.deny]]
name = "smol"
# This section is considered when running `cargo deny check sources`.
# More documentation about the 'sources' section can be found here:
# https://embarkstudios.github.io/cargo-deny/checks/sources/cfg.html

View File

@@ -177,7 +177,7 @@ I e during migration create_branch can be called on old pageserver and newly cre
The difference of simplistic approach from one described above is that it calls ignore on source tenant first and then calls attach on target pageserver. Approach above does it in opposite order thus opening a possibility for race conditions we strive to avoid.
The approach largely follows this guide: <https://github.com/neondatabase/cloud/wiki/Cloud:-Ad-hoc-tenant-relocation>
The approach largely follows this guide: <https://www.notion.so/neondatabase/Cloud-Ad-hoc-tenant-relocation-f687474f7bfc42269e6214e3acba25c7>
The happy path sequence:

View File

@@ -17,5 +17,9 @@ postgres_ffi.workspace = true
enum-map.workspace = true
strum.workspace = true
strum_macros.workspace = true
hex.workspace = true
workspace_hack.workspace = true
[dev-dependencies]
bincode.workspace = true

View File

@@ -0,0 +1,142 @@
use anyhow::{bail, Result};
use byteorder::{ByteOrder, BE};
use serde::{Deserialize, Serialize};
use std::fmt;
/// Key used in the Repository kv-store.
///
/// The Repository treats this as an opaque struct, but see the code in pgdatadir_mapping.rs
/// for what we actually store in these fields.
#[derive(Debug, Clone, Copy, Hash, PartialEq, Eq, Ord, PartialOrd, Serialize, Deserialize)]
pub struct Key {
pub field1: u8,
pub field2: u32,
pub field3: u32,
pub field4: u32,
pub field5: u8,
pub field6: u32,
}
pub const KEY_SIZE: usize = 18;
impl Key {
/// 'field2' is used to store tablespaceid for relations and small enum numbers for other relish.
/// As long as Neon does not support tablespace (because of lack of access to local file system),
/// we can assume that only some predefined namespace OIDs are used which can fit in u16
pub fn to_i128(&self) -> i128 {
assert!(self.field2 < 0xFFFF || self.field2 == 0xFFFFFFFF || self.field2 == 0x22222222);
(((self.field1 & 0xf) as i128) << 120)
| (((self.field2 & 0xFFFF) as i128) << 104)
| ((self.field3 as i128) << 72)
| ((self.field4 as i128) << 40)
| ((self.field5 as i128) << 32)
| self.field6 as i128
}
pub const fn from_i128(x: i128) -> Self {
Key {
field1: ((x >> 120) & 0xf) as u8,
field2: ((x >> 104) & 0xFFFF) as u32,
field3: (x >> 72) as u32,
field4: (x >> 40) as u32,
field5: (x >> 32) as u8,
field6: x as u32,
}
}
pub fn next(&self) -> Key {
self.add(1)
}
pub fn add(&self, x: u32) -> Key {
let mut key = *self;
let r = key.field6.overflowing_add(x);
key.field6 = r.0;
if r.1 {
let r = key.field5.overflowing_add(1);
key.field5 = r.0;
if r.1 {
let r = key.field4.overflowing_add(1);
key.field4 = r.0;
if r.1 {
let r = key.field3.overflowing_add(1);
key.field3 = r.0;
if r.1 {
let r = key.field2.overflowing_add(1);
key.field2 = r.0;
if r.1 {
let r = key.field1.overflowing_add(1);
key.field1 = r.0;
assert!(!r.1);
}
}
}
}
}
key
}
pub fn from_slice(b: &[u8]) -> Self {
Key {
field1: b[0],
field2: u32::from_be_bytes(b[1..5].try_into().unwrap()),
field3: u32::from_be_bytes(b[5..9].try_into().unwrap()),
field4: u32::from_be_bytes(b[9..13].try_into().unwrap()),
field5: b[13],
field6: u32::from_be_bytes(b[14..18].try_into().unwrap()),
}
}
pub fn write_to_byte_slice(&self, buf: &mut [u8]) {
buf[0] = self.field1;
BE::write_u32(&mut buf[1..5], self.field2);
BE::write_u32(&mut buf[5..9], self.field3);
BE::write_u32(&mut buf[9..13], self.field4);
buf[13] = self.field5;
BE::write_u32(&mut buf[14..18], self.field6);
}
}
impl fmt::Display for Key {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(
f,
"{:02X}{:08X}{:08X}{:08X}{:02X}{:08X}",
self.field1, self.field2, self.field3, self.field4, self.field5, self.field6
)
}
}
impl Key {
pub const MIN: Key = Key {
field1: u8::MIN,
field2: u32::MIN,
field3: u32::MIN,
field4: u32::MIN,
field5: u8::MIN,
field6: u32::MIN,
};
pub const MAX: Key = Key {
field1: u8::MAX,
field2: u32::MAX,
field3: u32::MAX,
field4: u32::MAX,
field5: u8::MAX,
field6: u32::MAX,
};
pub fn from_hex(s: &str) -> Result<Self> {
if s.len() != 36 {
bail!("parse error");
}
Ok(Key {
field1: u8::from_str_radix(&s[0..2], 16)?,
field2: u32::from_str_radix(&s[2..10], 16)?,
field3: u32::from_str_radix(&s[10..18], 16)?,
field4: u32::from_str_radix(&s[18..26], 16)?,
field5: u8::from_str_radix(&s[26..28], 16)?,
field6: u32::from_str_radix(&s[28..36], 16)?,
})
}
}

View File

@@ -4,8 +4,10 @@ use const_format::formatcp;
/// Public API types
pub mod control_api;
pub mod key;
pub mod models;
pub mod reltag;
pub mod shard;
pub const DEFAULT_PG_LISTEN_PORT: u16 = 64000;
pub const DEFAULT_PG_LISTEN_ADDR: &str = formatcp!("127.0.0.1:{DEFAULT_PG_LISTEN_PORT}");

View File

@@ -16,7 +16,7 @@ use utils::{
lsn::Lsn,
};
use crate::reltag::RelTag;
use crate::{reltag::RelTag, shard::TenantShardId};
use anyhow::bail;
use bytes::{BufMut, Bytes, BytesMut};
@@ -187,7 +187,7 @@ pub struct TimelineCreateRequest {
#[derive(Serialize, Deserialize, Debug)]
#[serde(deny_unknown_fields)]
pub struct TenantCreateRequest {
pub new_tenant_id: TenantId,
pub new_tenant_id: TenantShardId,
#[serde(default)]
#[serde(skip_serializing_if = "Option::is_none")]
pub generation: Option<u32>,

View File

@@ -0,0 +1,321 @@
use std::{ops::RangeInclusive, str::FromStr};
use hex::FromHex;
use serde::{Deserialize, Serialize};
use utils::id::TenantId;
#[derive(Ord, PartialOrd, Eq, PartialEq, Clone, Copy, Serialize, Deserialize, Debug)]
pub struct ShardNumber(pub u8);
#[derive(Ord, PartialOrd, Eq, PartialEq, Clone, Copy, Serialize, Deserialize, Debug)]
pub struct ShardCount(pub u8);
impl ShardCount {
pub const MAX: Self = Self(u8::MAX);
}
impl ShardNumber {
pub const MAX: Self = Self(u8::MAX);
}
/// TenantShardId identify the units of work for the Pageserver.
///
/// These are written as `<tenant_id>-<shard number><shard-count>`, for example:
///
/// # The second shard in a two-shard tenant
/// 072f1291a5310026820b2fe4b2968934-0102
///
/// Historically, tenants could not have multiple shards, and were identified
/// by TenantId. To support this, TenantShardId has a special legacy
/// mode where `shard_count` is equal to zero: this represents a single-sharded
/// tenant which should be written as a TenantId with no suffix.
///
/// The human-readable encoding of TenantShardId, such as used in API URLs,
/// is both forward and backward compatible: a legacy TenantId can be
/// decoded as a TenantShardId, and when re-encoded it will be parseable
/// as a TenantId.
///
/// Note that the binary encoding is _not_ backward compatible, because
/// at the time sharding is introduced, there are no existing binary structures
/// containing TenantId that we need to handle.
#[derive(Eq, PartialEq, PartialOrd, Ord, Clone, Copy)]
pub struct TenantShardId {
pub tenant_id: TenantId,
pub shard_number: ShardNumber,
pub shard_count: ShardCount,
}
impl TenantShardId {
pub fn unsharded(tenant_id: TenantId) -> Self {
Self {
tenant_id,
shard_number: ShardNumber(0),
shard_count: ShardCount(0),
}
}
/// The range of all TenantShardId that belong to a particular TenantId. This is useful when
/// you have a BTreeMap of TenantShardId, and are querying by TenantId.
pub fn tenant_range(tenant_id: TenantId) -> RangeInclusive<Self> {
RangeInclusive::new(
Self {
tenant_id,
shard_number: ShardNumber(0),
shard_count: ShardCount(0),
},
Self {
tenant_id,
shard_number: ShardNumber::MAX,
shard_count: ShardCount::MAX,
},
)
}
pub fn shard_slug(&self) -> String {
format!("{:02x}{:02x}", self.shard_number.0, self.shard_count.0)
}
}
impl std::fmt::Display for TenantShardId {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
if self.shard_count != ShardCount(0) {
write!(
f,
"{}-{:02x}{:02x}",
self.tenant_id, self.shard_number.0, self.shard_count.0
)
} else {
// Legacy case (shard_count == 0) -- format as just the tenant id. Note that this
// is distinct from the normal single shard case (shard count == 1).
self.tenant_id.fmt(f)
}
}
}
impl std::fmt::Debug for TenantShardId {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
// Debug is the same as Display: the compact hex representation
write!(f, "{}", self)
}
}
impl std::str::FromStr for TenantShardId {
type Err = hex::FromHexError;
fn from_str(s: &str) -> Result<Self, Self::Err> {
// Expect format: 16 byte TenantId, '-', 1 byte shard number, 1 byte shard count
if s.len() == 32 {
// Legacy case: no shard specified
Ok(Self {
tenant_id: TenantId::from_str(s)?,
shard_number: ShardNumber(0),
shard_count: ShardCount(0),
})
} else if s.len() == 37 {
let bytes = s.as_bytes();
let tenant_id = TenantId::from_hex(&bytes[0..32])?;
let mut shard_parts: [u8; 2] = [0u8; 2];
hex::decode_to_slice(&bytes[33..37], &mut shard_parts)?;
Ok(Self {
tenant_id,
shard_number: ShardNumber(shard_parts[0]),
shard_count: ShardCount(shard_parts[1]),
})
} else {
Err(hex::FromHexError::InvalidStringLength)
}
}
}
impl From<[u8; 18]> for TenantShardId {
fn from(b: [u8; 18]) -> Self {
let tenant_id_bytes: [u8; 16] = b[0..16].try_into().unwrap();
Self {
tenant_id: TenantId::from(tenant_id_bytes),
shard_number: ShardNumber(b[16]),
shard_count: ShardCount(b[17]),
}
}
}
impl Serialize for TenantShardId {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
if serializer.is_human_readable() {
serializer.collect_str(self)
} else {
let mut packed: [u8; 18] = [0; 18];
packed[0..16].clone_from_slice(&self.tenant_id.as_arr());
packed[16] = self.shard_number.0;
packed[17] = self.shard_count.0;
packed.serialize(serializer)
}
}
}
impl<'de> Deserialize<'de> for TenantShardId {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
struct IdVisitor {
is_human_readable_deserializer: bool,
}
impl<'de> serde::de::Visitor<'de> for IdVisitor {
type Value = TenantShardId;
fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
if self.is_human_readable_deserializer {
formatter.write_str("value in form of hex string")
} else {
formatter.write_str("value in form of integer array([u8; 18])")
}
}
fn visit_seq<A>(self, seq: A) -> Result<Self::Value, A::Error>
where
A: serde::de::SeqAccess<'de>,
{
let s = serde::de::value::SeqAccessDeserializer::new(seq);
let id: [u8; 18] = Deserialize::deserialize(s)?;
Ok(TenantShardId::from(id))
}
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
where
E: serde::de::Error,
{
TenantShardId::from_str(v).map_err(E::custom)
}
}
if deserializer.is_human_readable() {
deserializer.deserialize_str(IdVisitor {
is_human_readable_deserializer: true,
})
} else {
deserializer.deserialize_tuple(
18,
IdVisitor {
is_human_readable_deserializer: false,
},
)
}
}
}
#[cfg(test)]
mod tests {
use std::str::FromStr;
use bincode;
use utils::{id::TenantId, Hex};
use super::*;
const EXAMPLE_TENANT_ID: &str = "1f359dd625e519a1a4e8d7509690f6fc";
#[test]
fn tenant_shard_id_string() -> Result<(), hex::FromHexError> {
let example = TenantShardId {
tenant_id: TenantId::from_str(EXAMPLE_TENANT_ID).unwrap(),
shard_count: ShardCount(10),
shard_number: ShardNumber(7),
};
let encoded = format!("{example}");
let expected = format!("{EXAMPLE_TENANT_ID}-070a");
assert_eq!(&encoded, &expected);
let decoded = TenantShardId::from_str(&encoded)?;
assert_eq!(example, decoded);
Ok(())
}
#[test]
fn tenant_shard_id_binary() -> Result<(), hex::FromHexError> {
let example = TenantShardId {
tenant_id: TenantId::from_str(EXAMPLE_TENANT_ID).unwrap(),
shard_count: ShardCount(10),
shard_number: ShardNumber(7),
};
let encoded = bincode::serialize(&example).unwrap();
let expected: [u8; 18] = [
0x1f, 0x35, 0x9d, 0xd6, 0x25, 0xe5, 0x19, 0xa1, 0xa4, 0xe8, 0xd7, 0x50, 0x96, 0x90,
0xf6, 0xfc, 0x07, 0x0a,
];
assert_eq!(Hex(&encoded), Hex(&expected));
let decoded = bincode::deserialize(&encoded).unwrap();
assert_eq!(example, decoded);
Ok(())
}
#[test]
fn tenant_shard_id_backward_compat() -> Result<(), hex::FromHexError> {
// Test that TenantShardId can decode a TenantId in human
// readable form
let example = TenantId::from_str(EXAMPLE_TENANT_ID).unwrap();
let encoded = format!("{example}");
assert_eq!(&encoded, EXAMPLE_TENANT_ID);
let decoded = TenantShardId::from_str(&encoded)?;
assert_eq!(example, decoded.tenant_id);
assert_eq!(decoded.shard_count, ShardCount(0));
assert_eq!(decoded.shard_number, ShardNumber(0));
Ok(())
}
#[test]
fn tenant_shard_id_forward_compat() -> Result<(), hex::FromHexError> {
// Test that a legacy TenantShardId encodes into a form that
// can be decoded as TenantId
let example_tenant_id = TenantId::from_str(EXAMPLE_TENANT_ID).unwrap();
let example = TenantShardId::unsharded(example_tenant_id);
let encoded = format!("{example}");
assert_eq!(&encoded, EXAMPLE_TENANT_ID);
let decoded = TenantId::from_str(&encoded)?;
assert_eq!(example_tenant_id, decoded);
Ok(())
}
#[test]
fn tenant_shard_id_legacy_binary() -> Result<(), hex::FromHexError> {
// Unlike in human readable encoding, binary encoding does not
// do any special handling of legacy unsharded TenantIds: this test
// is equivalent to the main test for binary encoding, just verifying
// that the same behavior applies when we have used `unsharded()` to
// construct a TenantShardId.
let example = TenantShardId::unsharded(TenantId::from_str(EXAMPLE_TENANT_ID).unwrap());
let encoded = bincode::serialize(&example).unwrap();
let expected: [u8; 18] = [
0x1f, 0x35, 0x9d, 0xd6, 0x25, 0xe5, 0x19, 0xa1, 0xa4, 0xe8, 0xd7, 0x50, 0x96, 0x90,
0xf6, 0xfc, 0x00, 0x00,
];
assert_eq!(Hex(&encoded), Hex(&expected));
let decoded = bincode::deserialize::<TenantShardId>(&encoded).unwrap();
assert_eq!(example, decoded);
Ok(())
}
}

View File

@@ -16,6 +16,7 @@ use pageserver_api::models::{
DownloadRemoteLayersTaskSpawnRequest, LocationConfigMode, TenantAttachRequest,
TenantLoadRequest, TenantLocationConfigRequest,
};
use pageserver_api::shard::TenantShardId;
use remote_storage::GenericRemoteStorage;
use tenant_size_model::{SizeResult, StorageModel};
use tokio_util::sync::CancellationToken;
@@ -419,9 +420,9 @@ async fn timeline_create_handler(
mut request: Request<Body>,
_cancel: CancellationToken,
) -> Result<Response<Body>, ApiError> {
let tenant_id: TenantId = parse_request_param(&request, "tenant_id")?;
let tenant_shard_id: TenantShardId = parse_request_param(&request, "tenant_shard_id")?;
let request_data: TimelineCreateRequest = json_request(&mut request).await?;
check_permission(&request, Some(tenant_id))?;
check_permission(&request, Some(tenant_shard_id.tenant_id))?;
let new_timeline_id = request_data.new_timeline_id;
@@ -430,7 +431,7 @@ async fn timeline_create_handler(
let state = get_state(&request);
async {
let tenant = mgr::get_tenant(tenant_id, true)?;
let tenant = state.tenant_manager.get_attached_tenant_shard(tenant_shard_id, true)?;
match tenant.create_timeline(
new_timeline_id,
request_data.ancestor_timeline_id.map(TimelineId::from),
@@ -464,7 +465,10 @@ async fn timeline_create_handler(
Err(tenant::CreateTimelineError::Other(err)) => Err(ApiError::InternalServerError(err)),
}
}
.instrument(info_span!("timeline_create", %tenant_id, timeline_id = %new_timeline_id, lsn=?request_data.ancestor_start_lsn, pg_version=?request_data.pg_version))
.instrument(info_span!("timeline_create",
tenant_id = %tenant_shard_id.tenant_id,
shard = %tenant_shard_id.shard_slug(),
timeline_id = %new_timeline_id, lsn=?request_data.ancestor_start_lsn, pg_version=?request_data.pg_version))
.await
}
@@ -660,14 +664,15 @@ async fn timeline_delete_handler(
request: Request<Body>,
_cancel: CancellationToken,
) -> Result<Response<Body>, ApiError> {
let tenant_id: TenantId = parse_request_param(&request, "tenant_id")?;
let tenant_shard_id: TenantShardId = parse_request_param(&request, "tenant_shard_id")?;
let timeline_id: TimelineId = parse_request_param(&request, "timeline_id")?;
check_permission(&request, Some(tenant_id))?;
check_permission(&request, Some(tenant_shard_id.tenant_id))?;
let ctx = RequestContext::new(TaskKind::MgmtRequest, DownloadBehavior::Warn);
let state = get_state(&request);
mgr::delete_timeline(tenant_id, timeline_id, &ctx)
.instrument(info_span!("timeline_delete", %tenant_id, %timeline_id))
state.tenant_manager.delete_timeline(tenant_shard_id, timeline_id, &ctx)
.instrument(info_span!("timeline_delete", tenant_id=%tenant_shard_id.tenant_id, shard=%tenant_shard_id.shard_slug(), %timeline_id))
.await?;
json_response(StatusCode::ACCEPTED, ())
@@ -681,11 +686,14 @@ async fn tenant_detach_handler(
check_permission(&request, Some(tenant_id))?;
let detach_ignored: Option<bool> = parse_query_param(&request, "detach_ignored")?;
// This is a legacy API (`/location_conf` is the replacement). It only supports unsharded tenants
let tenant_shard_id = TenantShardId::unsharded(tenant_id);
let state = get_state(&request);
let conf = state.conf;
mgr::detach_tenant(
conf,
tenant_id,
tenant_shard_id,
detach_ignored.unwrap_or(false),
&state.deletion_queue_client,
)
@@ -802,13 +810,16 @@ async fn tenant_delete_handler(
_cancel: CancellationToken,
) -> Result<Response<Body>, ApiError> {
// TODO openapi spec
let tenant_id: TenantId = parse_request_param(&request, "tenant_id")?;
check_permission(&request, Some(tenant_id))?;
let tenant_shard_id: TenantShardId = parse_request_param(&request, "tenant_shard_id")?;
check_permission(&request, Some(tenant_shard_id.tenant_id))?;
let state = get_state(&request);
mgr::delete_tenant(state.conf, state.remote_storage.clone(), tenant_id)
.instrument(info_span!("tenant_delete_handler", %tenant_id))
mgr::delete_tenant(state.conf, state.remote_storage.clone(), tenant_shard_id)
.instrument(info_span!("tenant_delete_handler",
tenant_id = %tenant_shard_id.tenant_id,
shard = tenant_shard_id.shard_slug()
))
.await?;
json_response(StatusCode::ACCEPTED, ())
@@ -1138,9 +1149,10 @@ async fn put_tenant_location_config_handler(
mut request: Request<Body>,
_cancel: CancellationToken,
) -> Result<Response<Body>, ApiError> {
let tenant_shard_id: TenantShardId = parse_request_param(&request, "tenant_shard_id")?;
let request_data: TenantLocationConfigRequest = json_request(&mut request).await?;
let tenant_id = request_data.tenant_id;
check_permission(&request, Some(tenant_id))?;
check_permission(&request, Some(tenant_shard_id.tenant_id))?;
let ctx = RequestContext::new(TaskKind::MgmtRequest, DownloadBehavior::Warn);
let state = get_state(&request);
@@ -1149,9 +1161,13 @@ async fn put_tenant_location_config_handler(
// The `Detached` state is special, it doesn't upsert a tenant, it removes
// its local disk content and drops it from memory.
if let LocationConfigMode::Detached = request_data.config.mode {
if let Err(e) = mgr::detach_tenant(conf, tenant_id, true, &state.deletion_queue_client)
.instrument(info_span!("tenant_detach", %tenant_id))
.await
if let Err(e) =
mgr::detach_tenant(conf, tenant_shard_id, true, &state.deletion_queue_client)
.instrument(info_span!("tenant_detach",
tenant_id = %tenant_shard_id.tenant_id,
shard = tenant_shard_id.shard_slug()
))
.await
{
match e {
TenantStateError::SlotError(TenantSlotError::NotFound(_)) => {
@@ -1168,7 +1184,7 @@ async fn put_tenant_location_config_handler(
state
.tenant_manager
.upsert_location(tenant_id, location_conf, &ctx)
.upsert_location(tenant_shard_id, location_conf, &ctx)
.await
// TODO: badrequest assumes the caller was asking for something unreasonable, but in
// principle we might have hit something like concurrent API calls to the same tenant,
@@ -1478,7 +1494,7 @@ async fn timeline_collect_keyspace(
let keys = timeline
.collect_keyspace(at_lsn, &ctx)
.await
.map_err(ApiError::InternalServerError)?;
.map_err(|e| ApiError::InternalServerError(e.into()))?;
json_response(StatusCode::OK, Partitioning { keys, at_lsn })
}
@@ -1752,7 +1768,7 @@ pub fn make_router(
.get("/v1/tenant", |r| api_handler(r, tenant_list_handler))
.post("/v1/tenant", |r| api_handler(r, tenant_create_handler))
.get("/v1/tenant/:tenant_id", |r| api_handler(r, tenant_status))
.delete("/v1/tenant/:tenant_id", |r| {
.delete("/v1/tenant/:tenant_shard_id", |r| {
api_handler(r, tenant_delete_handler)
})
.get("/v1/tenant/:tenant_id/synthetic_size", |r| {
@@ -1764,13 +1780,13 @@ pub fn make_router(
.get("/v1/tenant/:tenant_id/config", |r| {
api_handler(r, get_tenant_config_handler)
})
.put("/v1/tenant/:tenant_id/location_config", |r| {
.put("/v1/tenant/:tenant_shard_id/location_config", |r| {
api_handler(r, put_tenant_location_config_handler)
})
.get("/v1/tenant/:tenant_id/timeline", |r| {
api_handler(r, timeline_list_handler)
})
.post("/v1/tenant/:tenant_id/timeline", |r| {
.post("/v1/tenant/:tenant_shard_id/timeline", |r| {
api_handler(r, timeline_create_handler)
})
.post("/v1/tenant/:tenant_id/attach", |r| {
@@ -1814,7 +1830,7 @@ pub fn make_router(
"/v1/tenant/:tenant_id/timeline/:timeline_id/download_remote_layers",
|r| api_handler(r, timeline_download_remote_layers_handler_get),
)
.delete("/v1/tenant/:tenant_id/timeline/:timeline_id", |r| {
.delete("/v1/tenant/:tenant_shard_id/timeline/:timeline_id", |r| {
api_handler(r, timeline_delete_handler)
})
.get("/v1/tenant/:tenant_id/timeline/:timeline_id/layer", |r| {

View File

@@ -638,7 +638,7 @@ const STORAGE_IO_TIME_BUCKETS: &[f64] = &[
///
/// Operations:
/// - open ([`std::fs::OpenOptions::open`])
/// - close (dropping [`std::fs::File`])
/// - close (dropping [`crate::virtual_file::VirtualFile`])
/// - close-by-replace (close by replacement algorithm)
/// - read (`read_at`)
/// - write (`write_at`)
@@ -1252,6 +1252,46 @@ pub(crate) static WAL_REDO_RECORD_COUNTER: Lazy<IntCounter> = Lazy::new(|| {
.unwrap()
});
pub(crate) struct WalRedoProcessCounters {
pub(crate) started: IntCounter,
pub(crate) killed_by_cause: enum_map::EnumMap<WalRedoKillCause, IntCounter>,
}
#[derive(Debug, enum_map::Enum, strum_macros::IntoStaticStr)]
pub(crate) enum WalRedoKillCause {
WalRedoProcessDrop,
NoLeakChildDrop,
Startup,
}
impl Default for WalRedoProcessCounters {
fn default() -> Self {
let started = register_int_counter!(
"pageserver_wal_redo_process_started_total",
"Number of WAL redo processes started",
)
.unwrap();
let killed = register_int_counter_vec!(
"pageserver_wal_redo_process_stopped_total",
"Number of WAL redo processes stopped",
&["cause"],
)
.unwrap();
Self {
started,
killed_by_cause: EnumMap::from_array(std::array::from_fn(|i| {
let cause = <WalRedoKillCause as enum_map::Enum>::from_usize(i);
let cause_str: &'static str = cause.into();
killed.with_label_values(&[cause_str])
})),
}
}
}
pub(crate) static WAL_REDO_PROCESS_COUNTERS: Lazy<WalRedoProcessCounters> =
Lazy::new(WalRedoProcessCounters::default);
/// Similar to `prometheus::HistogramTimer` but does not record on drop.
pub struct StorageTimeMetricsTimer {
metrics: StorageTimeMetrics,

View File

@@ -22,6 +22,7 @@ use std::collections::{hash_map, HashMap, HashSet};
use std::ops::ControlFlow;
use std::ops::Range;
use tracing::{debug, trace, warn};
use utils::bin_ser::DeserializeError;
use utils::{bin_ser::BeSer, lsn::Lsn};
/// Block number within a relation or SLRU. This matches PostgreSQL's BlockNumber type.
@@ -29,9 +30,33 @@ pub type BlockNumber = u32;
#[derive(Debug)]
pub enum LsnForTimestamp {
/// Found commits both before and after the given timestamp
Present(Lsn),
/// Found no commits after the given timestamp, this means
/// that the newest data in the branch is older than the given
/// timestamp.
///
/// All commits <= LSN happened before the given timestamp
Future(Lsn),
/// The queried timestamp is past our horizon we look back at (PITR)
///
/// All commits > LSN happened after the given timestamp,
/// but any commits < LSN might have happened before or after
/// the given timestamp. We don't know because no data before
/// the given lsn is available.
Past(Lsn),
/// We have found no commit with a timestamp,
/// so we can't return anything meaningful.
///
/// The associated LSN is the lower bound value we can safely
/// create branches on, but no statement is made if it is
/// older or newer than the timestamp.
///
/// This variant can e.g. be returned right after a
/// cluster import.
NoData(Lsn),
}
@@ -43,6 +68,25 @@ pub enum CalculateLogicalSizeError {
Other(#[from] anyhow::Error),
}
#[derive(Debug, thiserror::Error)]
pub(crate) enum CollectKeySpaceError {
#[error(transparent)]
Decode(#[from] DeserializeError),
#[error(transparent)]
PageRead(PageReconstructError),
#[error("cancelled")]
Cancelled,
}
impl From<PageReconstructError> for CollectKeySpaceError {
fn from(err: PageReconstructError) -> Self {
match err {
PageReconstructError::Cancelled => Self::Cancelled,
err => Self::PageRead(err),
}
}
}
impl From<PageReconstructError> for CalculateLogicalSizeError {
fn from(pre: PageReconstructError) -> Self {
match pre {
@@ -324,7 +368,11 @@ impl Timeline {
ctx: &RequestContext,
) -> Result<LsnForTimestamp, PageReconstructError> {
let gc_cutoff_lsn_guard = self.get_latest_gc_cutoff_lsn();
let min_lsn = *gc_cutoff_lsn_guard;
// We use this method to figure out the branching LSN for the new branch, but the
// GC cutoff could be before the branching point and we cannot create a new branch
// with LSN < `ancestor_lsn`. Thus, pick the maximum of these two to be
// on the safe side.
let min_lsn = std::cmp::max(*gc_cutoff_lsn_guard, self.get_ancestor_lsn());
let max_lsn = self.get_last_record_lsn();
// LSNs are always 8-byte aligned. low/mid/high represent the
@@ -354,30 +402,33 @@ impl Timeline {
low = mid + 1;
}
}
// If `found_smaller == true`, `low = t + 1` where `t` is the target LSN,
// so the LSN of the last commit record before or at `search_timestamp`.
// Remove one from `low` to get `t`.
//
// FIXME: it would be better to get the LSN of the previous commit.
// Otherwise, if you restore to the returned LSN, the database will
// include physical changes from later commits that will be marked
// as aborted, and will need to be vacuumed away.
let commit_lsn = Lsn((low - 1) * 8);
match (found_smaller, found_larger) {
(false, false) => {
// This can happen if no commit records have been processed yet, e.g.
// just after importing a cluster.
Ok(LsnForTimestamp::NoData(max_lsn))
}
(true, false) => {
// Didn't find any commit timestamps larger than the request
Ok(LsnForTimestamp::Future(max_lsn))
Ok(LsnForTimestamp::NoData(min_lsn))
}
(false, true) => {
// Didn't find any commit timestamps smaller than the request
Ok(LsnForTimestamp::Past(max_lsn))
Ok(LsnForTimestamp::Past(min_lsn))
}
(true, true) => {
// low is the LSN of the first commit record *after* the search_timestamp,
// Back off by one to get to the point just before the commit.
//
// FIXME: it would be better to get the LSN of the previous commit.
// Otherwise, if you restore to the returned LSN, the database will
// include physical changes from later commits that will be marked
// as aborted, and will need to be vacuumed away.
Ok(LsnForTimestamp::Present(Lsn((low - 1) * 8)))
(true, false) => {
// Only found commits with timestamps smaller than the request.
// It's still a valid case for branch creation, return it.
// And `update_gc_info()` ignores LSN for a `LsnForTimestamp::Future`
// case, anyway.
Ok(LsnForTimestamp::Future(commit_lsn))
}
(true, true) => Ok(LsnForTimestamp::Present(commit_lsn)),
}
}
@@ -605,11 +656,11 @@ impl Timeline {
/// Get a KeySpace that covers all the Keys that are in use at the given LSN.
/// Anything that's not listed maybe removed from the underlying storage (from
/// that LSN forwards).
pub async fn collect_keyspace(
pub(crate) async fn collect_keyspace(
&self,
lsn: Lsn,
ctx: &RequestContext,
) -> anyhow::Result<KeySpace> {
) -> Result<KeySpace, CollectKeySpaceError> {
// Iterate through key ranges, greedily packing them into partitions
let mut result = KeySpaceAccum::new();
@@ -618,7 +669,7 @@ impl Timeline {
// Fetch list of database dirs and iterate them
let buf = self.get(DBDIR_KEY, lsn, ctx).await?;
let dbdir = DbDirectory::des(&buf).context("deserialization failure")?;
let dbdir = DbDirectory::des(&buf)?;
let mut dbs: Vec<(Oid, Oid)> = dbdir.dbdirs.keys().cloned().collect();
dbs.sort_unstable();
@@ -651,7 +702,7 @@ impl Timeline {
let slrudir_key = slru_dir_to_key(kind);
result.add_key(slrudir_key);
let buf = self.get(slrudir_key, lsn, ctx).await?;
let dir = SlruSegmentDirectory::des(&buf).context("deserialization failure")?;
let dir = SlruSegmentDirectory::des(&buf)?;
let mut segments: Vec<u32> = dir.segments.iter().cloned().collect();
segments.sort_unstable();
for segno in segments {
@@ -669,7 +720,7 @@ impl Timeline {
// Then pg_twophase
result.add_key(TWOPHASEDIR_KEY);
let buf = self.get(TWOPHASEDIR_KEY, lsn, ctx).await?;
let twophase_dir = TwoPhaseDirectory::des(&buf).context("deserialization failure")?;
let twophase_dir = TwoPhaseDirectory::des(&buf)?;
let mut xids: Vec<TransactionId> = twophase_dir.xids.iter().cloned().collect();
xids.sort_unstable();
for xid in xids {

View File

@@ -1,106 +1,11 @@
use crate::walrecord::NeonWalRecord;
use anyhow::{bail, Result};
use byteorder::{ByteOrder, BE};
use anyhow::Result;
use bytes::Bytes;
use serde::{Deserialize, Serialize};
use std::fmt;
use std::ops::{AddAssign, Range};
use std::time::Duration;
/// Key used in the Repository kv-store.
///
/// The Repository treats this as an opaque struct, but see the code in pgdatadir_mapping.rs
/// for what we actually store in these fields.
#[derive(Debug, Clone, Copy, Hash, PartialEq, Eq, Ord, PartialOrd, Serialize, Deserialize)]
pub struct Key {
pub field1: u8,
pub field2: u32,
pub field3: u32,
pub field4: u32,
pub field5: u8,
pub field6: u32,
}
pub const KEY_SIZE: usize = 18;
impl Key {
/// 'field2' is used to store tablespaceid for relations and small enum numbers for other relish.
/// As long as Neon does not support tablespace (because of lack of access to local file system),
/// we can assume that only some predefined namespace OIDs are used which can fit in u16
pub fn to_i128(&self) -> i128 {
assert!(self.field2 < 0xFFFF || self.field2 == 0xFFFFFFFF || self.field2 == 0x22222222);
(((self.field1 & 0xf) as i128) << 120)
| (((self.field2 & 0xFFFF) as i128) << 104)
| ((self.field3 as i128) << 72)
| ((self.field4 as i128) << 40)
| ((self.field5 as i128) << 32)
| self.field6 as i128
}
pub const fn from_i128(x: i128) -> Self {
Key {
field1: ((x >> 120) & 0xf) as u8,
field2: ((x >> 104) & 0xFFFF) as u32,
field3: (x >> 72) as u32,
field4: (x >> 40) as u32,
field5: (x >> 32) as u8,
field6: x as u32,
}
}
pub fn next(&self) -> Key {
self.add(1)
}
pub fn add(&self, x: u32) -> Key {
let mut key = *self;
let r = key.field6.overflowing_add(x);
key.field6 = r.0;
if r.1 {
let r = key.field5.overflowing_add(1);
key.field5 = r.0;
if r.1 {
let r = key.field4.overflowing_add(1);
key.field4 = r.0;
if r.1 {
let r = key.field3.overflowing_add(1);
key.field3 = r.0;
if r.1 {
let r = key.field2.overflowing_add(1);
key.field2 = r.0;
if r.1 {
let r = key.field1.overflowing_add(1);
key.field1 = r.0;
assert!(!r.1);
}
}
}
}
}
key
}
pub fn from_slice(b: &[u8]) -> Self {
Key {
field1: b[0],
field2: u32::from_be_bytes(b[1..5].try_into().unwrap()),
field3: u32::from_be_bytes(b[5..9].try_into().unwrap()),
field4: u32::from_be_bytes(b[9..13].try_into().unwrap()),
field5: b[13],
field6: u32::from_be_bytes(b[14..18].try_into().unwrap()),
}
}
pub fn write_to_byte_slice(&self, buf: &mut [u8]) {
buf[0] = self.field1;
BE::write_u32(&mut buf[1..5], self.field2);
BE::write_u32(&mut buf[5..9], self.field3);
BE::write_u32(&mut buf[9..13], self.field4);
buf[13] = self.field5;
BE::write_u32(&mut buf[14..18], self.field6);
}
}
pub use pageserver_api::key::{Key, KEY_SIZE};
pub fn key_range_size(key_range: &Range<Key>) -> u32 {
let start = key_range.start;
@@ -129,51 +34,9 @@ pub fn singleton_range(key: Key) -> Range<Key> {
key..key.next()
}
impl fmt::Display for Key {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(
f,
"{:02X}{:08X}{:08X}{:08X}{:02X}{:08X}",
self.field1, self.field2, self.field3, self.field4, self.field5, self.field6
)
}
}
impl Key {
pub const MIN: Key = Key {
field1: u8::MIN,
field2: u32::MIN,
field3: u32::MIN,
field4: u32::MIN,
field5: u8::MIN,
field6: u32::MIN,
};
pub const MAX: Key = Key {
field1: u8::MAX,
field2: u32::MAX,
field3: u32::MAX,
field4: u32::MAX,
field5: u8::MAX,
field6: u32::MAX,
};
pub fn from_hex(s: &str) -> Result<Self> {
if s.len() != 36 {
bail!("parse error");
}
Ok(Key {
field1: u8::from_str_radix(&s[0..2], 16)?,
field2: u32::from_str_radix(&s[2..10], 16)?,
field3: u32::from_str_radix(&s[10..18], 16)?,
field4: u32::from_str_radix(&s[18..26], 16)?,
field5: u8::from_str_radix(&s[26..28], 16)?,
field6: u32::from_str_radix(&s[28..36], 16)?,
})
}
}
/// A 'value' stored for a one Key.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[cfg_attr(test, derive(PartialEq))]
pub enum Value {
/// An Image value contains a full copy of the value
Image(Bytes),
@@ -197,6 +60,70 @@ impl Value {
}
}
#[cfg(test)]
mod test {
use super::*;
use bytes::Bytes;
use utils::bin_ser::BeSer;
macro_rules! roundtrip {
($orig:expr, $expected:expr) => {{
let orig: Value = $orig;
let actual = Value::ser(&orig).unwrap();
let expected: &[u8] = &$expected;
assert_eq!(utils::Hex(&actual), utils::Hex(expected));
let deser = Value::des(&actual).unwrap();
assert_eq!(orig, deser);
}};
}
#[test]
fn image_roundtrip() {
let image = Bytes::from_static(b"foobar");
let image = Value::Image(image);
#[rustfmt::skip]
let expected = [
// top level discriminator of 4 bytes
0x00, 0x00, 0x00, 0x00,
// 8 byte length
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x06,
// foobar
0x66, 0x6f, 0x6f, 0x62, 0x61, 0x72
];
roundtrip!(image, expected);
}
#[test]
fn walrecord_postgres_roundtrip() {
let rec = NeonWalRecord::Postgres {
will_init: true,
rec: Bytes::from_static(b"foobar"),
};
let rec = Value::WalRecord(rec);
#[rustfmt::skip]
let expected = [
// flattened discriminator of total 8 bytes
0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00,
// will_init
0x01,
// 8 byte length
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x06,
// foobar
0x66, 0x6f, 0x6f, 0x62, 0x61, 0x72
];
roundtrip!(rec, expected);
}
}
///
/// Result of performing GC
///

View File

@@ -291,6 +291,16 @@ impl From<harness::TestRedoManager> for WalRedoManager {
}
impl WalRedoManager {
pub(crate) fn maybe_quiesce(&self, idle_timeout: Duration) {
match self {
Self::Prod(mgr) => mgr.maybe_quiesce(idle_timeout),
#[cfg(test)]
Self::Test(_) => {
// Not applicable to test redo manager
}
}
}
pub async fn request_redo(
&self,
key: crate::repository::Key,
@@ -1649,22 +1659,16 @@ impl Tenant {
/// This function is periodically called by compactor task.
/// Also it can be explicitly requested per timeline through page server
/// api's 'compact' command.
pub async fn compaction_iteration(
async fn compaction_iteration(
&self,
cancel: &CancellationToken,
ctx: &RequestContext,
) -> anyhow::Result<()> {
// Don't start doing work during shutdown
if let TenantState::Stopping { .. } = self.current_state() {
) -> anyhow::Result<(), timeline::CompactionError> {
// Don't start doing work during shutdown, or when broken, we do not need those in the logs
if !self.is_active() {
return Ok(());
}
// We should only be called once the tenant has activated.
anyhow::ensure!(
self.is_active(),
"Cannot run compaction iteration on inactive tenant"
);
{
let conf = self.tenant_conf.read().unwrap();
if !conf.location.may_delete_layers_hint() || !conf.location.may_upload_layers_hint() {

View File

@@ -2,9 +2,10 @@
//! page server.
use camino::{Utf8DirEntry, Utf8Path, Utf8PathBuf};
use pageserver_api::shard::TenantShardId;
use rand::{distributions::Alphanumeric, Rng};
use std::borrow::Cow;
use std::collections::HashMap;
use std::collections::{BTreeMap, HashMap};
use std::ops::Deref;
use std::sync::Arc;
use std::time::{Duration, Instant};
@@ -30,6 +31,7 @@ use crate::metrics::TENANT_MANAGER as METRICS;
use crate::task_mgr::{self, TaskKind};
use crate::tenant::config::{AttachmentMode, LocationConf, LocationMode, TenantConfOpt};
use crate::tenant::delete::DeleteTenantFlow;
use crate::tenant::span::debug_assert_current_span_has_tenant_id;
use crate::tenant::{create_tenant_files, AttachedTenantConf, SpawnMode, Tenant, TenantState};
use crate::{InitializationOrder, IGNORED_TENANT_FILE_NAME, TEMP_FILE_SUFFIX};
@@ -87,10 +89,37 @@ pub(crate) enum TenantsMap {
Initializing,
/// [`init_tenant_mgr`] is done, all on-disk tenants have been loaded.
/// New tenants can be added using [`tenant_map_acquire_slot`].
Open(HashMap<TenantId, TenantSlot>),
Open(BTreeMap<TenantShardId, TenantSlot>),
/// The pageserver has entered shutdown mode via [`shutdown_all_tenants`].
/// Existing tenants are still accessible, but no new tenants can be created.
ShuttingDown(HashMap<TenantId, TenantSlot>),
ShuttingDown(BTreeMap<TenantShardId, TenantSlot>),
}
/// Helper for mapping shard-unaware functions to a sharding-aware map
/// TODO(sharding): all users of this must be made shard-aware.
fn exactly_one_or_none<'a>(
map: &'a BTreeMap<TenantShardId, TenantSlot>,
tenant_id: &TenantId,
) -> Option<(&'a TenantShardId, &'a TenantSlot)> {
let mut slots = map.range(TenantShardId::tenant_range(*tenant_id));
// Retrieve the first two slots in the range: if both are populated, we must panic because the caller
// needs a shard-naive view of the world in which only one slot can exist for a TenantId at a time.
let slot_a = slots.next();
let slot_b = slots.next();
match (slot_a, slot_b) {
(None, None) => None,
(Some(slot), None) => {
// Exactly one matching slot
Some(slot)
}
(Some(_slot_a), Some(_slot_b)) => {
// Multiple shards for this tenant: cannot handle this yet.
// TODO(sharding): callers of get() should be shard-aware.
todo!("Attaching multiple shards in teh same tenant to the same pageserver")
}
(None, Some(_)) => unreachable!(),
}
}
impl TenantsMap {
@@ -101,7 +130,8 @@ impl TenantsMap {
match self {
TenantsMap::Initializing => None,
TenantsMap::Open(m) | TenantsMap::ShuttingDown(m) => {
m.get(tenant_id).and_then(TenantSlot::get_attached)
// TODO(sharding): callers of get() should be shard-aware.
exactly_one_or_none(m, tenant_id).and_then(|(_, slot)| slot.get_attached())
}
}
}
@@ -109,7 +139,10 @@ impl TenantsMap {
pub(crate) fn remove(&mut self, tenant_id: &TenantId) -> Option<TenantSlot> {
match self {
TenantsMap::Initializing => None,
TenantsMap::Open(m) | TenantsMap::ShuttingDown(m) => m.remove(tenant_id),
TenantsMap::Open(m) | TenantsMap::ShuttingDown(m) => {
let key = exactly_one_or_none(m, tenant_id).map(|(k, _)| *k);
key.and_then(|key| m.remove(&key))
}
}
}
@@ -383,7 +416,7 @@ pub async fn init_tenant_mgr(
init_order: InitializationOrder,
cancel: CancellationToken,
) -> anyhow::Result<TenantManager> {
let mut tenants = HashMap::new();
let mut tenants = BTreeMap::new();
let ctx = RequestContext::todo_child(TaskKind::Startup, DownloadBehavior::Warn);
@@ -404,7 +437,7 @@ pub async fn init_tenant_mgr(
warn!(%tenant_id, "Marking tenant broken, failed to {e:#}");
tenants.insert(
tenant_id,
TenantShardId::unsharded(tenant_id),
TenantSlot::Attached(Tenant::create_broken_tenant(
conf,
tenant_id,
@@ -427,7 +460,7 @@ pub async fn init_tenant_mgr(
// tenants, because they do no remote writes and hence require no
// generation number
info!(%tenant_id, "Loaded tenant in secondary mode");
tenants.insert(tenant_id, TenantSlot::Secondary);
tenants.insert(TenantShardId::unsharded(tenant_id), TenantSlot::Secondary);
}
LocationMode::Attached(_) => {
// TODO: augment re-attach API to enable the control plane to
@@ -470,7 +503,10 @@ pub async fn init_tenant_mgr(
&ctx,
) {
Ok(tenant) => {
tenants.insert(tenant.tenant_id(), TenantSlot::Attached(tenant));
tenants.insert(
TenantShardId::unsharded(tenant.tenant_id()),
TenantSlot::Attached(tenant),
);
}
Err(e) => {
error!(%tenant_id, "Failed to start tenant: {e:#}");
@@ -566,89 +602,80 @@ pub(crate) async fn shutdown_all_tenants() {
async fn shutdown_all_tenants0(tenants: &std::sync::RwLock<TenantsMap>) {
use utils::completion;
// Atomically, 1. extract the list of tenants to shut down and 2. prevent creation of new tenants.
let (in_progress_ops, tenants_to_shut_down) = {
let mut join_set = JoinSet::new();
// Atomically, 1. create the shutdown tasks and 2. prevent creation of new tenants.
let (total_in_progress, total_attached) = {
let mut m = tenants.write().unwrap();
match &mut *m {
TenantsMap::Initializing => {
*m = TenantsMap::ShuttingDown(HashMap::default());
*m = TenantsMap::ShuttingDown(BTreeMap::default());
info!("tenants map is empty");
return;
}
TenantsMap::Open(tenants) => {
let mut shutdown_state = HashMap::new();
let mut in_progress_ops = Vec::new();
let mut tenants_to_shut_down = Vec::new();
let mut shutdown_state = BTreeMap::new();
let mut total_in_progress = 0;
let mut total_attached = 0;
for (k, v) in tenants.drain() {
for (tenant_shard_id, v) in std::mem::take(tenants).into_iter() {
match v {
TenantSlot::Attached(t) => {
tenants_to_shut_down.push(t.clone());
shutdown_state.insert(k, TenantSlot::Attached(t));
shutdown_state.insert(tenant_shard_id, TenantSlot::Attached(t.clone()));
join_set.spawn(
async move {
let freeze_and_flush = true;
let res = {
let (_guard, shutdown_progress) = completion::channel();
t.shutdown(shutdown_progress, freeze_and_flush).await
};
if let Err(other_progress) = res {
// join the another shutdown in progress
other_progress.wait().await;
}
// we cannot afford per tenant logging here, because if s3 is degraded, we are
// going to log too many lines
debug!("tenant successfully stopped");
}
.instrument(info_span!("shutdown", tenant_id=%tenant_shard_id.tenant_id, shard=%tenant_shard_id.shard_slug())),
);
total_attached += 1;
}
TenantSlot::Secondary => {
shutdown_state.insert(k, TenantSlot::Secondary);
shutdown_state.insert(tenant_shard_id, TenantSlot::Secondary);
}
TenantSlot::InProgress(notify) => {
// InProgress tenants are not visible in TenantsMap::ShuttingDown: we will
// wait for their notifications to fire in this function.
in_progress_ops.push(notify);
join_set.spawn(async move {
notify.wait().await;
});
total_in_progress += 1;
}
}
}
*m = TenantsMap::ShuttingDown(shutdown_state);
(in_progress_ops, tenants_to_shut_down)
(total_in_progress, total_attached)
}
TenantsMap::ShuttingDown(_) => {
// TODO: it is possible that detach and shutdown happen at the same time. as a
// result, during shutdown we do not wait for detach.
error!("already shutting down, this function isn't supposed to be called more than once");
return;
}
}
};
let started_at = std::time::Instant::now();
info!(
"Waiting for {} InProgress tenants and {} Attached tenants to shut down",
in_progress_ops.len(),
tenants_to_shut_down.len()
total_in_progress, total_attached
);
for barrier in in_progress_ops {
barrier.wait().await;
}
info!(
"InProgress tenants shut down, waiting for {} Attached tenants to shut down",
tenants_to_shut_down.len()
);
let started_at = std::time::Instant::now();
let mut join_set = JoinSet::new();
for tenant in tenants_to_shut_down {
let tenant_id = tenant.get_tenant_id();
join_set.spawn(
async move {
let freeze_and_flush = true;
let res = {
let (_guard, shutdown_progress) = completion::channel();
tenant.shutdown(shutdown_progress, freeze_and_flush).await
};
if let Err(other_progress) = res {
// join the another shutdown in progress
other_progress.wait().await;
}
// we cannot afford per tenant logging here, because if s3 is degraded, we are
// going to log too many lines
debug!("tenant successfully stopped");
}
.instrument(info_span!("shutdown", %tenant_id)),
);
}
let total = join_set.len();
let mut panicked = 0;
let mut buffering = true;
@@ -661,7 +688,7 @@ async fn shutdown_all_tenants0(tenants: &std::sync::RwLock<TenantsMap>) {
match joined {
Ok(()) => {}
Err(join_error) if join_error.is_cancelled() => {
unreachable!("we are not cancelling any of the futures");
unreachable!("we are not cancelling any of the tasks");
}
Err(join_error) if join_error.is_panic() => {
// cannot really do anything, as this panic is likely a bug
@@ -699,19 +726,22 @@ async fn shutdown_all_tenants0(tenants: &std::sync::RwLock<TenantsMap>) {
pub(crate) async fn create_tenant(
conf: &'static PageServerConf,
tenant_conf: TenantConfOpt,
tenant_id: TenantId,
tenant_shard_id: TenantShardId,
generation: Generation,
resources: TenantSharedResources,
ctx: &RequestContext,
) -> Result<Arc<Tenant>, TenantMapInsertError> {
let location_conf = LocationConf::attached_single(tenant_conf, generation);
let slot_guard = tenant_map_acquire_slot(&tenant_id, TenantSlotAcquireMode::MustNotExist)?;
let tenant_path = super::create_tenant_files(conf, &location_conf, &tenant_id).await?;
let slot_guard =
tenant_map_acquire_slot(&tenant_shard_id, TenantSlotAcquireMode::MustNotExist)?;
// TODO(sharding): make local paths shard-aware
let tenant_path =
super::create_tenant_files(conf, &location_conf, &tenant_shard_id.tenant_id).await?;
let created_tenant = tenant_spawn(
conf,
tenant_id,
tenant_shard_id.tenant_id,
&tenant_path,
resources,
AttachedTenantConf::try_from(location_conf)?,
@@ -724,11 +754,7 @@ pub(crate) async fn create_tenant(
// See https://github.com/neondatabase/neon/issues/4233
let created_tenant_id = created_tenant.tenant_id();
if tenant_id != created_tenant_id {
return Err(TenantMapInsertError::Other(anyhow::anyhow!(
"loaded created tenant has unexpected tenant id (expect {tenant_id} != actual {created_tenant_id})",
)));
}
debug_assert_eq!(created_tenant_id, tenant_shard_id.tenant_id);
slot_guard.upsert(TenantSlot::Attached(created_tenant.clone()))?;
@@ -764,21 +790,70 @@ pub(crate) async fn set_new_tenant_config(
}
impl TenantManager {
#[instrument(skip_all, fields(%tenant_id))]
/// Gets the attached tenant from the in-memory data, erroring if it's absent, in secondary mode, or is not fitting to the query.
/// `active_only = true` allows to query only tenants that are ready for operations, erroring on other kinds of tenants.
///
/// This method is cancel-safe.
pub(crate) fn get_attached_tenant_shard(
&self,
tenant_shard_id: TenantShardId,
active_only: bool,
) -> Result<Arc<Tenant>, GetTenantError> {
let locked = self.tenants.read().unwrap();
let peek_slot = tenant_map_peek_slot(&locked, &tenant_shard_id, TenantSlotPeekMode::Read)?;
match peek_slot {
Some(TenantSlot::Attached(tenant)) => match tenant.current_state() {
TenantState::Broken {
reason,
backtrace: _,
} if active_only => Err(GetTenantError::Broken(reason)),
TenantState::Active => Ok(Arc::clone(tenant)),
_ => {
if active_only {
Err(GetTenantError::NotActive(tenant_shard_id.tenant_id))
} else {
Ok(Arc::clone(tenant))
}
}
},
Some(TenantSlot::InProgress(_)) => {
Err(GetTenantError::NotActive(tenant_shard_id.tenant_id))
}
None | Some(TenantSlot::Secondary) => {
Err(GetTenantError::NotFound(tenant_shard_id.tenant_id))
}
}
}
pub(crate) async fn delete_timeline(
&self,
tenant_shard_id: TenantShardId,
timeline_id: TimelineId,
_ctx: &RequestContext,
) -> Result<(), DeleteTimelineError> {
let tenant = self.get_attached_tenant_shard(tenant_shard_id, true)?;
DeleteTimelineFlow::run(&tenant, timeline_id, false).await?;
Ok(())
}
pub(crate) async fn upsert_location(
&self,
tenant_id: TenantId,
tenant_shard_id: TenantShardId,
new_location_config: LocationConf,
ctx: &RequestContext,
) -> Result<(), anyhow::Error> {
info!("configuring tenant location {tenant_id} to state {new_location_config:?}");
debug_assert_current_span_has_tenant_id();
info!("configuring tenant location to state {new_location_config:?}");
// Special case fast-path for updates to Tenant: if our upsert is only updating configuration,
// then we do not need to set the slot to InProgress, we can just call into the
// existng tenant.
{
let locked = self.tenants.read().unwrap();
let peek_slot = tenant_map_peek_slot(&locked, &tenant_id, TenantSlotPeekMode::Write)?;
let peek_slot =
tenant_map_peek_slot(&locked, &tenant_shard_id, TenantSlotPeekMode::Write)?;
match (&new_location_config.mode, peek_slot) {
(LocationMode::Attached(attach_conf), Some(TenantSlot::Attached(tenant))) => {
if attach_conf.generation == tenant.generation {
@@ -809,7 +884,7 @@ impl TenantManager {
// the tenant is inaccessible to the outside world while we are doing this, but that is sensible:
// the state is ill-defined while we're in transition. Transitions are async, but fast: we do
// not do significant I/O, and shutdowns should be prompt via cancellation tokens.
let mut slot_guard = tenant_map_acquire_slot(&tenant_id, TenantSlotAcquireMode::Any)?;
let mut slot_guard = tenant_map_acquire_slot(&tenant_shard_id, TenantSlotAcquireMode::Any)?;
if let Some(TenantSlot::Attached(tenant)) = slot_guard.get_old_value() {
// The case where we keep a Tenant alive was covered above in the special case
@@ -840,25 +915,31 @@ impl TenantManager {
slot_guard.drop_old_value().expect("We just shut it down");
}
let tenant_path = self.conf.tenant_path(&tenant_id);
// TODO(sharding): make local paths sharding-aware
let tenant_path = self.conf.tenant_path(&tenant_shard_id.tenant_id);
let new_slot = match &new_location_config.mode {
LocationMode::Secondary(_) => {
let tenant_path = self.conf.tenant_path(&tenant_id);
// Directory doesn't need to be fsync'd because if we crash it can
// safely be recreated next time this tenant location is configured.
unsafe_create_dir_all(&tenant_path)
.await
.with_context(|| format!("Creating {tenant_path}"))?;
Tenant::persist_tenant_config(self.conf, &tenant_id, &new_location_config)
.await
.map_err(SetNewTenantConfigError::Persist)?;
// TODO(sharding): make local paths sharding-aware
Tenant::persist_tenant_config(
self.conf,
&tenant_shard_id.tenant_id,
&new_location_config,
)
.await
.map_err(SetNewTenantConfigError::Persist)?;
TenantSlot::Secondary
}
LocationMode::Attached(_attach_config) => {
let timelines_path = self.conf.timelines_path(&tenant_id);
// TODO(sharding): make local paths sharding-aware
let timelines_path = self.conf.timelines_path(&tenant_shard_id.tenant_id);
// Directory doesn't need to be fsync'd because we do not depend on
// it to exist after crashes: it may be recreated when tenant is
@@ -867,13 +948,19 @@ impl TenantManager {
.await
.with_context(|| format!("Creating {timelines_path}"))?;
Tenant::persist_tenant_config(self.conf, &tenant_id, &new_location_config)
.await
.map_err(SetNewTenantConfigError::Persist)?;
// TODO(sharding): make local paths sharding-aware
Tenant::persist_tenant_config(
self.conf,
&tenant_shard_id.tenant_id,
&new_location_config,
)
.await
.map_err(SetNewTenantConfigError::Persist)?;
// TODO(sharding): make spawn sharding-aware
let tenant = tenant_spawn(
self.conf,
tenant_id,
tenant_shard_id.tenant_id,
&tenant_path,
self.resources.clone(),
AttachedTenantConf::try_from(new_location_config)?,
@@ -919,7 +1006,11 @@ pub(crate) fn get_tenant(
active_only: bool,
) -> Result<Arc<Tenant>, GetTenantError> {
let locked = TENANTS.read().unwrap();
let peek_slot = tenant_map_peek_slot(&locked, &tenant_id, TenantSlotPeekMode::Read)?;
// TODO(sharding): make all callers of get_tenant shard-aware
let tenant_shard_id = TenantShardId::unsharded(tenant_id);
let peek_slot = tenant_map_peek_slot(&locked, &tenant_shard_id, TenantSlotPeekMode::Read)?;
match peek_slot {
Some(TenantSlot::Attached(tenant)) => match tenant.current_state() {
@@ -979,12 +1070,16 @@ pub(crate) async fn get_active_tenant_with_timeout(
Tenant(Arc<Tenant>),
}
// TODO(sharding): make page service interface sharding-aware (page service should apply ShardIdentity to the key
// to decide which shard services the request)
let tenant_shard_id = TenantShardId::unsharded(tenant_id);
let wait_start = Instant::now();
let deadline = wait_start + timeout;
let wait_for = {
let locked = TENANTS.read().unwrap();
let peek_slot = tenant_map_peek_slot(&locked, &tenant_id, TenantSlotPeekMode::Read)
let peek_slot = tenant_map_peek_slot(&locked, &tenant_shard_id, TenantSlotPeekMode::Read)
.map_err(GetTenantError::MapState)?;
match peek_slot {
Some(TenantSlot::Attached(tenant)) => {
@@ -1028,8 +1123,9 @@ pub(crate) async fn get_active_tenant_with_timeout(
})?;
{
let locked = TENANTS.read().unwrap();
let peek_slot = tenant_map_peek_slot(&locked, &tenant_id, TenantSlotPeekMode::Read)
.map_err(GetTenantError::MapState)?;
let peek_slot =
tenant_map_peek_slot(&locked, &tenant_shard_id, TenantSlotPeekMode::Read)
.map_err(GetTenantError::MapState)?;
match peek_slot {
Some(TenantSlot::Attached(tenant)) => tenant.clone(),
_ => {
@@ -1071,7 +1167,7 @@ pub(crate) async fn get_active_tenant_with_timeout(
pub(crate) async fn delete_tenant(
conf: &'static PageServerConf,
remote_storage: Option<GenericRemoteStorage>,
tenant_id: TenantId,
tenant_shard_id: TenantShardId,
) -> Result<(), DeleteTenantError> {
// We acquire a SlotGuard during this function to protect against concurrent
// changes while the ::prepare phase of DeleteTenantFlow executes, but then
@@ -1084,7 +1180,9 @@ pub(crate) async fn delete_tenant(
//
// See https://github.com/neondatabase/neon/issues/5080
let mut slot_guard = tenant_map_acquire_slot(&tenant_id, TenantSlotAcquireMode::MustExist)?;
// TODO(sharding): make delete API sharding-aware
let mut slot_guard =
tenant_map_acquire_slot(&tenant_shard_id, TenantSlotAcquireMode::MustExist)?;
// unwrap is safe because we used MustExist mode when acquiring
let tenant = match slot_guard.get_old_value().as_ref().unwrap() {
@@ -1111,16 +1209,6 @@ pub(crate) enum DeleteTimelineError {
Timeline(#[from] crate::tenant::DeleteTimelineError),
}
pub(crate) async fn delete_timeline(
tenant_id: TenantId,
timeline_id: TimelineId,
_ctx: &RequestContext,
) -> Result<(), DeleteTimelineError> {
let tenant = get_tenant(tenant_id, true)?;
DeleteTimelineFlow::run(&tenant, timeline_id, false).await?;
Ok(())
}
#[derive(Debug, thiserror::Error)]
pub(crate) enum TenantStateError {
#[error("Tenant {0} is stopping")]
@@ -1135,14 +1223,14 @@ pub(crate) enum TenantStateError {
pub(crate) async fn detach_tenant(
conf: &'static PageServerConf,
tenant_id: TenantId,
tenant_shard_id: TenantShardId,
detach_ignored: bool,
deletion_queue_client: &DeletionQueueClient,
) -> Result<(), TenantStateError> {
let tmp_path = detach_tenant0(
conf,
&TENANTS,
tenant_id,
tenant_shard_id,
detach_ignored,
deletion_queue_client,
)
@@ -1169,19 +1257,24 @@ pub(crate) async fn detach_tenant(
async fn detach_tenant0(
conf: &'static PageServerConf,
tenants: &std::sync::RwLock<TenantsMap>,
tenant_id: TenantId,
tenant_shard_id: TenantShardId,
detach_ignored: bool,
deletion_queue_client: &DeletionQueueClient,
) -> Result<Utf8PathBuf, TenantStateError> {
let tenant_dir_rename_operation = |tenant_id_to_clean| async move {
let local_tenant_directory = conf.tenant_path(&tenant_id_to_clean);
let tenant_dir_rename_operation = |tenant_id_to_clean: TenantShardId| async move {
// TODO(sharding): make local path helpers shard-aware
let local_tenant_directory = conf.tenant_path(&tenant_id_to_clean.tenant_id);
safe_rename_tenant_dir(&local_tenant_directory)
.await
.with_context(|| format!("local tenant directory {local_tenant_directory:?} rename"))
};
let removal_result =
remove_tenant_from_memory(tenants, tenant_id, tenant_dir_rename_operation(tenant_id)).await;
let removal_result = remove_tenant_from_memory(
tenants,
tenant_shard_id,
tenant_dir_rename_operation(tenant_shard_id),
)
.await;
// Flush pending deletions, so that they have a good chance of passing validation
// before this tenant is potentially re-attached elsewhere.
@@ -1195,12 +1288,15 @@ async fn detach_tenant0(
Err(TenantStateError::SlotError(TenantSlotError::NotFound(_)))
)
{
let tenant_ignore_mark = conf.tenant_ignore_mark_file_path(&tenant_id);
// TODO(sharding): make local paths sharding-aware
let tenant_ignore_mark = conf.tenant_ignore_mark_file_path(&tenant_shard_id.tenant_id);
if tenant_ignore_mark.exists() {
info!("Detaching an ignored tenant");
let tmp_path = tenant_dir_rename_operation(tenant_id)
let tmp_path = tenant_dir_rename_operation(tenant_shard_id)
.await
.with_context(|| format!("Ignored tenant {tenant_id} local directory rename"))?;
.with_context(|| {
format!("Ignored tenant {tenant_shard_id} local directory rename")
})?;
return Ok(tmp_path);
}
}
@@ -1217,7 +1313,11 @@ pub(crate) async fn load_tenant(
deletion_queue_client: DeletionQueueClient,
ctx: &RequestContext,
) -> Result<(), TenantMapInsertError> {
let slot_guard = tenant_map_acquire_slot(&tenant_id, TenantSlotAcquireMode::MustNotExist)?;
// This is a legacy API (replaced by `/location_conf`). It does not support sharding
let tenant_shard_id = TenantShardId::unsharded(tenant_id);
let slot_guard =
tenant_map_acquire_slot(&tenant_shard_id, TenantSlotAcquireMode::MustNotExist)?;
let tenant_path = conf.tenant_path(&tenant_id);
let tenant_ignore_mark = conf.tenant_ignore_mark_file_path(&tenant_id);
@@ -1270,7 +1370,10 @@ async fn ignore_tenant0(
tenants: &std::sync::RwLock<TenantsMap>,
tenant_id: TenantId,
) -> Result<(), TenantStateError> {
remove_tenant_from_memory(tenants, tenant_id, async {
// This is a legacy API (replaced by `/location_conf`). It does not support sharding
let tenant_shard_id = TenantShardId::unsharded(tenant_id);
remove_tenant_from_memory(tenants, tenant_shard_id, async {
let ignore_mark_file = conf.tenant_ignore_mark_file_path(&tenant_id);
fs::File::create(&ignore_mark_file)
.await
@@ -1279,7 +1382,7 @@ async fn ignore_tenant0(
crashsafe::fsync_file_and_parent(&ignore_mark_file)
.context("Failed to fsync ignore mark file")
})
.with_context(|| format!("Failed to crate ignore mark for tenant {tenant_id}"))?;
.with_context(|| format!("Failed to crate ignore mark for tenant {tenant_shard_id}"))?;
Ok(())
})
.await
@@ -1302,10 +1405,12 @@ pub(crate) async fn list_tenants() -> Result<Vec<(TenantId, TenantState)>, Tenan
};
Ok(m.iter()
.filter_map(|(id, tenant)| match tenant {
TenantSlot::Attached(tenant) => Some((*id, tenant.current_state())),
TenantSlot::Attached(tenant) => Some((id, tenant.current_state())),
TenantSlot::Secondary => None,
TenantSlot::InProgress(_) => None,
})
// TODO(sharding): make callers of this function shard-aware
.map(|(k, v)| (k.tenant_id, v))
.collect())
}
@@ -1321,7 +1426,11 @@ pub(crate) async fn attach_tenant(
resources: TenantSharedResources,
ctx: &RequestContext,
) -> Result<(), TenantMapInsertError> {
let slot_guard = tenant_map_acquire_slot(&tenant_id, TenantSlotAcquireMode::MustNotExist)?;
// This is a legacy API (replaced by `/location_conf`). It does not support sharding
let tenant_shard_id = TenantShardId::unsharded(tenant_id);
let slot_guard =
tenant_map_acquire_slot(&tenant_shard_id, TenantSlotAcquireMode::MustNotExist)?;
let location_conf = LocationConf::attached_single(tenant_conf, generation);
let tenant_dir = create_tenant_files(conf, &location_conf, &tenant_id).await?;
// TODO: tenant directory remains on disk if we bail out from here on.
@@ -1368,14 +1477,14 @@ pub(crate) enum TenantMapInsertError {
pub enum TenantSlotError {
/// When acquiring a slot with the expectation that the tenant already exists.
#[error("Tenant {0} not found")]
NotFound(TenantId),
NotFound(TenantShardId),
/// When acquiring a slot with the expectation that the tenant does not already exist.
#[error("tenant {0} already exists, state: {1:?}")]
AlreadyExists(TenantId, TenantState),
AlreadyExists(TenantShardId, TenantState),
#[error("tenant {0} already exists in but is not attached")]
Conflict(TenantId),
Conflict(TenantShardId),
// Tried to read a slot that is currently being mutated by another administrative
// operation.
@@ -1437,7 +1546,7 @@ pub enum TenantMapError {
/// `drop_old_value`. It is an error to call this without shutting down
/// the conents of `old_value`.
pub struct SlotGuard {
tenant_id: TenantId,
tenant_shard_id: TenantShardId,
old_value: Option<TenantSlot>,
upserted: bool,
@@ -1448,12 +1557,12 @@ pub struct SlotGuard {
impl SlotGuard {
fn new(
tenant_id: TenantId,
tenant_shard_id: TenantShardId,
old_value: Option<TenantSlot>,
completion: utils::completion::Completion,
) -> Self {
Self {
tenant_id,
tenant_shard_id,
old_value,
upserted: false,
_completion: completion,
@@ -1496,7 +1605,7 @@ impl SlotGuard {
TenantsMap::Open(m) => m,
};
let replaced = m.insert(self.tenant_id, new_value);
let replaced = m.insert(self.tenant_shard_id, new_value);
self.upserted = true;
METRICS.tenant_slots.set(m.len() as u64);
@@ -1515,7 +1624,7 @@ impl SlotGuard {
None => {
METRICS.unexpected_errors.inc();
error!(
tenant_id = %self.tenant_id,
tenant_shard_id = %self.tenant_shard_id,
"Missing InProgress marker during tenant upsert, this is a bug."
);
Err(TenantSlotUpsertError::InternalError(
@@ -1524,7 +1633,7 @@ impl SlotGuard {
}
Some(slot) => {
METRICS.unexpected_errors.inc();
error!(tenant_id=%self.tenant_id, "Unexpected contents of TenantSlot during upsert, this is a bug. Contents: {:?}", slot);
error!(tenant_shard_id=%self.tenant_shard_id, "Unexpected contents of TenantSlot during upsert, this is a bug. Contents: {:?}", slot);
Err(TenantSlotUpsertError::InternalError(
"Unexpected contents of TenantSlot".into(),
))
@@ -1602,12 +1711,12 @@ impl Drop for SlotGuard {
TenantsMap::Open(m) => m,
};
use std::collections::hash_map::Entry;
match m.entry(self.tenant_id) {
use std::collections::btree_map::Entry;
match m.entry(self.tenant_shard_id) {
Entry::Occupied(mut entry) => {
if !matches!(entry.get(), TenantSlot::InProgress(_)) {
METRICS.unexpected_errors.inc();
error!(tenant_id=%self.tenant_id, "Unexpected contents of TenantSlot during drop, this is a bug. Contents: {:?}", entry.get());
error!(tenant_shard_id=%self.tenant_shard_id, "Unexpected contents of TenantSlot during drop, this is a bug. Contents: {:?}", entry.get());
}
if self.old_value_is_shutdown() {
@@ -1619,7 +1728,7 @@ impl Drop for SlotGuard {
Entry::Vacant(_) => {
METRICS.unexpected_errors.inc();
error!(
tenant_id = %self.tenant_id,
tenant_shard_id = %self.tenant_shard_id,
"Missing InProgress marker during SlotGuard drop, this is a bug."
);
}
@@ -1638,7 +1747,7 @@ enum TenantSlotPeekMode {
fn tenant_map_peek_slot<'a>(
tenants: &'a std::sync::RwLockReadGuard<'a, TenantsMap>,
tenant_id: &TenantId,
tenant_shard_id: &TenantShardId,
mode: TenantSlotPeekMode,
) -> Result<Option<&'a TenantSlot>, TenantMapError> {
let m = match tenants.deref() {
@@ -1652,7 +1761,7 @@ fn tenant_map_peek_slot<'a>(
TenantsMap::Open(m) => m,
};
Ok(m.get(tenant_id))
Ok(m.get(tenant_shard_id))
}
enum TenantSlotAcquireMode {
@@ -1665,14 +1774,14 @@ enum TenantSlotAcquireMode {
}
fn tenant_map_acquire_slot(
tenant_id: &TenantId,
tenant_shard_id: &TenantShardId,
mode: TenantSlotAcquireMode,
) -> Result<SlotGuard, TenantSlotError> {
tenant_map_acquire_slot_impl(tenant_id, &TENANTS, mode)
tenant_map_acquire_slot_impl(tenant_shard_id, &TENANTS, mode)
}
fn tenant_map_acquire_slot_impl(
tenant_id: &TenantId,
tenant_shard_id: &TenantShardId,
tenants: &std::sync::RwLock<TenantsMap>,
mode: TenantSlotAcquireMode,
) -> Result<SlotGuard, TenantSlotError> {
@@ -1680,7 +1789,7 @@ fn tenant_map_acquire_slot_impl(
METRICS.tenant_slot_writes.inc();
let mut locked = tenants.write().unwrap();
let span = tracing::info_span!("acquire_slot", %tenant_id);
let span = tracing::info_span!("acquire_slot", tenant_id=%tenant_shard_id.tenant_id, shard=tenant_shard_id.shard_slug());
let _guard = span.enter();
let m = match &mut *locked {
@@ -1689,19 +1798,21 @@ fn tenant_map_acquire_slot_impl(
TenantsMap::Open(m) => m,
};
use std::collections::hash_map::Entry;
let entry = m.entry(*tenant_id);
use std::collections::btree_map::Entry;
let entry = m.entry(*tenant_shard_id);
match entry {
Entry::Vacant(v) => match mode {
MustExist => {
tracing::debug!("Vacant && MustExist: return NotFound");
Err(TenantSlotError::NotFound(*tenant_id))
Err(TenantSlotError::NotFound(*tenant_shard_id))
}
_ => {
let (completion, barrier) = utils::completion::channel();
v.insert(TenantSlot::InProgress(barrier));
tracing::debug!("Vacant, inserted InProgress");
Ok(SlotGuard::new(*tenant_id, None, completion))
Ok(SlotGuard::new(*tenant_shard_id, None, completion))
}
},
Entry::Occupied(mut o) => {
@@ -1715,7 +1826,7 @@ fn tenant_map_acquire_slot_impl(
TenantSlot::Attached(tenant) => {
tracing::debug!("Attached && MustNotExist, return AlreadyExists");
Err(TenantSlotError::AlreadyExists(
*tenant_id,
*tenant_shard_id,
tenant.current_state(),
))
}
@@ -1724,7 +1835,7 @@ fn tenant_map_acquire_slot_impl(
// to get the state from
tracing::debug!("Occupied & MustNotExist, return AlreadyExists");
Err(TenantSlotError::AlreadyExists(
*tenant_id,
*tenant_shard_id,
TenantState::Broken {
reason: "Present but not attached".to_string(),
backtrace: "".to_string(),
@@ -1737,7 +1848,11 @@ fn tenant_map_acquire_slot_impl(
let (completion, barrier) = utils::completion::channel();
let old_value = o.insert(TenantSlot::InProgress(barrier));
tracing::debug!("Occupied, replaced with InProgress");
Ok(SlotGuard::new(*tenant_id, Some(old_value), completion))
Ok(SlotGuard::new(
*tenant_shard_id,
Some(old_value),
completion,
))
}
}
}
@@ -1750,7 +1865,7 @@ fn tenant_map_acquire_slot_impl(
/// operation would be needed to remove it.
async fn remove_tenant_from_memory<V, F>(
tenants: &std::sync::RwLock<TenantsMap>,
tenant_id: TenantId,
tenant_shard_id: TenantShardId,
tenant_cleanup: F,
) -> Result<V, TenantStateError>
where
@@ -1759,7 +1874,7 @@ where
use utils::completion;
let mut slot_guard =
tenant_map_acquire_slot_impl(&tenant_id, tenants, TenantSlotAcquireMode::MustExist)?;
tenant_map_acquire_slot_impl(&tenant_shard_id, tenants, TenantSlotAcquireMode::MustExist)?;
// The SlotGuard allows us to manipulate the Tenant object without fear of some
// concurrent API request doing something else for the same tenant ID.
@@ -1786,7 +1901,7 @@ where
// if pageserver shutdown or other detach/ignore is already ongoing, we don't want to
// wait for it but return an error right away because these are distinct requests.
slot_guard.revert();
return Err(TenantStateError::IsStopping(tenant_id));
return Err(TenantStateError::IsStopping(tenant_shard_id.tenant_id));
}
}
}
@@ -1797,7 +1912,7 @@ where
match tenant_cleanup
.await
.with_context(|| format!("Failed to run cleanup for tenant {tenant_id}"))
.with_context(|| format!("Failed to run cleanup for tenant {tenant_shard_id}"))
{
Ok(hook_value) => {
// Success: drop the old TenantSlot::Attached.
@@ -1876,7 +1991,8 @@ pub(crate) async fn immediate_gc(
#[cfg(test)]
mod tests {
use std::collections::HashMap;
use pageserver_api::shard::TenantShardId;
use std::collections::BTreeMap;
use std::sync::Arc;
use tracing::{info_span, Instrument};
@@ -1884,7 +2000,7 @@ mod tests {
use super::{super::harness::TenantHarness, TenantsMap};
#[tokio::test]
#[tokio::test(start_paused = true)]
async fn shutdown_awaits_in_progress_tenant() {
// Test that if an InProgress tenant is in the map during shutdown, the shutdown will gracefully
// wait for it to complete before proceeding.
@@ -1896,12 +2012,12 @@ mod tests {
// harness loads it to active, which is forced and nothing is running on the tenant
let id = t.tenant_id();
let id = TenantShardId::unsharded(t.tenant_id());
// tenant harness configures the logging and we cannot escape it
let _e = info_span!("testing", tenant_id = %id).entered();
let tenants = HashMap::from([(id, TenantSlot::Attached(t.clone()))]);
let tenants = BTreeMap::from([(id, TenantSlot::Attached(t.clone()))]);
let tenants = Arc::new(std::sync::RwLock::new(TenantsMap::Open(tenants)));
// Invoke remove_tenant_from_memory with a cleanup hook that blocks until we manually

View File

@@ -289,7 +289,9 @@ impl DeltaLayer {
async fn load_inner(&self, ctx: &RequestContext) -> Result<Arc<DeltaLayerInner>> {
let path = self.path();
let loaded = DeltaLayerInner::load(&path, None, ctx).await?;
let loaded = DeltaLayerInner::load(&path, None, ctx)
.await
.and_then(|res| res)?;
// not production code
let actual_filename = path.file_name().unwrap().to_owned();
@@ -610,18 +612,28 @@ impl Drop for DeltaLayerWriter {
}
impl DeltaLayerInner {
/// Returns nested result following Result<Result<_, OpErr>, Critical>:
/// - inner has the success or transient failure
/// - outer has the permanent failure
pub(super) async fn load(
path: &Utf8Path,
summary: Option<Summary>,
ctx: &RequestContext,
) -> anyhow::Result<Self> {
let file = VirtualFile::open(path)
.await
.with_context(|| format!("Failed to open file '{path}'"))?;
) -> Result<Result<Self, anyhow::Error>, anyhow::Error> {
let file = match VirtualFile::open(path).await {
Ok(file) => file,
Err(e) => return Ok(Err(anyhow::Error::new(e).context("open layer file"))),
};
let file = FileBlockReader::new(file);
let summary_blk = file.read_blk(0, ctx).await?;
let actual_summary = Summary::des_prefix(summary_blk.as_ref())?;
let summary_blk = match file.read_blk(0, ctx).await {
Ok(blk) => blk,
Err(e) => return Ok(Err(anyhow::Error::new(e).context("read first block"))),
};
// TODO: this should be an assertion instead; see ImageLayerInner::load
let actual_summary =
Summary::des_prefix(summary_blk.as_ref()).context("deserialize first block")?;
if let Some(mut expected_summary) = summary {
// production code path
@@ -636,11 +648,11 @@ impl DeltaLayerInner {
}
}
Ok(DeltaLayerInner {
Ok(Ok(DeltaLayerInner {
file,
index_start_blk: actual_summary.index_start_blk,
index_root_blk: actual_summary.index_root_blk,
})
}))
}
pub(super) async fn get_value_reconstruct_data(

View File

@@ -249,7 +249,9 @@ impl ImageLayer {
async fn load_inner(&self, ctx: &RequestContext) -> Result<ImageLayerInner> {
let path = self.path();
let loaded = ImageLayerInner::load(&path, self.desc.image_layer_lsn(), None, ctx).await?;
let loaded = ImageLayerInner::load(&path, self.desc.image_layer_lsn(), None, ctx)
.await
.and_then(|res| res)?;
// not production code
let actual_filename = path.file_name().unwrap().to_owned();
@@ -295,18 +297,31 @@ impl ImageLayer {
}
impl ImageLayerInner {
/// Returns nested result following Result<Result<_, OpErr>, Critical>:
/// - inner has the success or transient failure
/// - outer has the permanent failure
pub(super) async fn load(
path: &Utf8Path,
lsn: Lsn,
summary: Option<Summary>,
ctx: &RequestContext,
) -> anyhow::Result<Self> {
let file = VirtualFile::open(path)
.await
.with_context(|| format!("Failed to open file '{}'", path))?;
) -> Result<Result<Self, anyhow::Error>, anyhow::Error> {
let file = match VirtualFile::open(path).await {
Ok(file) => file,
Err(e) => return Ok(Err(anyhow::Error::new(e).context("open layer file"))),
};
let file = FileBlockReader::new(file);
let summary_blk = file.read_blk(0, ctx).await?;
let actual_summary = Summary::des_prefix(summary_blk.as_ref())?;
let summary_blk = match file.read_blk(0, ctx).await {
Ok(blk) => blk,
Err(e) => return Ok(Err(anyhow::Error::new(e).context("read first block"))),
};
// length is the only way how this could fail, so it's not actually likely at all unless
// read_blk returns wrong sized block.
//
// TODO: confirm and make this into assertion
let actual_summary =
Summary::des_prefix(summary_blk.as_ref()).context("deserialize first block")?;
if let Some(mut expected_summary) = summary {
// production code path
@@ -322,12 +337,12 @@ impl ImageLayerInner {
}
}
Ok(ImageLayerInner {
Ok(Ok(ImageLayerInner {
index_start_blk: actual_summary.index_start_blk,
index_root_blk: actual_summary.index_root_blk,
lsn,
file,
})
}))
}
pub(super) async fn get_value_reconstruct_data(

View File

@@ -251,6 +251,7 @@ impl Layer {
layer
.get_value_reconstruct_data(key, lsn_range, reconstruct_data, &self.0, ctx)
.instrument(tracing::info_span!("get_value_reconstruct_data", layer=%self))
.await
}
@@ -867,6 +868,9 @@ impl LayerInner {
}
Ok((Err(e), _permit)) => {
// FIXME: this should be with the spawned task and be cancellation sensitive
//
// while we should not need this, this backoff has turned out to be useful with
// a bug of unexpectedly deleted remote layer file (#5787).
let consecutive_failures =
self.consecutive_failures.fetch_add(1, Ordering::Relaxed);
tracing::error!(consecutive_failures, "layer file download failed: {e:#}");
@@ -1195,7 +1199,7 @@ impl DownloadedLayer {
));
delta_layer::DeltaLayerInner::load(&owner.path, summary, ctx)
.await
.map(LayerKind::Delta)
.map(|res| res.map(LayerKind::Delta))
} else {
let lsn = owner.desc.image_layer_lsn();
let summary = Some(image_layer::Summary::expected(
@@ -1206,21 +1210,32 @@ impl DownloadedLayer {
));
image_layer::ImageLayerInner::load(&owner.path, lsn, summary, ctx)
.await
.map(LayerKind::Image)
}
// this will be a permanent failure
.context("load layer");
.map(|res| res.map(LayerKind::Image))
};
if res.is_err() {
LAYER_IMPL_METRICS.inc_permanent_loading_failures();
match res {
Ok(Ok(layer)) => Ok(Ok(layer)),
Ok(Err(transient)) => Err(transient),
Err(permanent) => {
LAYER_IMPL_METRICS.inc_permanent_loading_failures();
// TODO(#5815): we are not logging all errors, so temporarily log them **once**
// here as well
let permanent = permanent.context("load layer");
tracing::error!("layer loading failed permanently: {permanent:#}");
Ok(Err(permanent))
}
}
res
};
self.kind.get_or_init(init).await.as_ref().map_err(|e| {
// errors are not clonabled, cannot but stringify
// test_broken_timeline matches this string
anyhow::anyhow!("layer loading failed: {e:#}")
})
self.kind
.get_or_try_init(init)
// return transient errors using `?`
.await?
.as_ref()
.map_err(|e| {
// errors are not clonabled, cannot but stringify
// test_broken_timeline matches this string
anyhow::anyhow!("layer loading failed: {e:#}")
})
}
async fn get_value_reconstruct_data(
@@ -1291,6 +1306,7 @@ impl ResidentLayer {
}
/// Loads all keys stored in the layer. Returns key, lsn and value size.
#[tracing::instrument(skip_all, fields(layer=%self))]
pub(crate) async fn load_keys<'a>(
&'a self,
ctx: &RequestContext,

View File

@@ -180,7 +180,7 @@ async fn compaction_loop(tenant: Arc<Tenant>, cancel: CancellationToken) {
// Run compaction
if let Err(e) = tenant.compaction_iteration(&cancel, &ctx).await {
let wait_duration = backoff::exponential_backoff_duration_seconds(
error_run_count,
error_run_count + 1,
1.0,
MAX_BACKOFF_SECS,
);
@@ -198,6 +198,10 @@ async fn compaction_loop(tenant: Arc<Tenant>, cancel: CancellationToken) {
warn_when_period_overrun(started_at.elapsed(), period, BackgroundLoopKind::Compaction);
// Perhaps we did no work and the walredo process has been idle for some time:
// give it a chance to shut down to avoid leaving walredo process running indefinitely.
tenant.walredo_mgr.maybe_quiesce(period * 10);
// Sleep
if tokio::time::timeout(sleep_duration, cancel.cancelled())
.await
@@ -261,7 +265,7 @@ async fn gc_loop(tenant: Arc<Tenant>, cancel: CancellationToken) {
.await;
if let Err(e) = res {
let wait_duration = backoff::exponential_backoff_duration_seconds(
error_run_count,
error_run_count + 1,
1.0,
MAX_BACKOFF_SECS,
);

View File

@@ -26,6 +26,7 @@ use tracing::{debug, error, info, info_span, instrument, warn, Instrument};
use crate::{
context::{DownloadBehavior, RequestContext},
pgdatadir_mapping::CollectKeySpaceError,
task_mgr::{self, TaskKind, BACKGROUND_RUNTIME},
tenant::{
config::{EvictionPolicy, EvictionPolicyLayerAccessThreshold},
@@ -397,9 +398,16 @@ impl Timeline {
if size.is_err() {
// ignore, see above comment
} else {
warn!(
"failed to collect keyspace but succeeded in calculating logical size: {e:#}"
);
match e {
CollectKeySpaceError::Cancelled => {
// Shutting down, ignore
}
err => {
warn!(
"failed to collect keyspace but succeeded in calculating logical size: {err:#}"
);
}
}
}
}
}

View File

@@ -43,7 +43,8 @@ use std::sync::atomic::{AtomicUsize, Ordering};
use crate::config::PageServerConf;
use crate::metrics::{
WAL_REDO_BYTES_HISTOGRAM, WAL_REDO_RECORDS_HISTOGRAM, WAL_REDO_RECORD_COUNTER, WAL_REDO_TIME,
WalRedoKillCause, WAL_REDO_BYTES_HISTOGRAM, WAL_REDO_PROCESS_COUNTERS,
WAL_REDO_RECORDS_HISTOGRAM, WAL_REDO_RECORD_COUNTER, WAL_REDO_TIME,
};
use crate::pgdatadir_mapping::{key_to_rel_block, key_to_slru_block};
use crate::repository::Key;
@@ -90,6 +91,7 @@ struct ProcessOutput {
pub struct PostgresRedoManager {
tenant_id: TenantId,
conf: &'static PageServerConf,
last_redo_at: std::sync::Mutex<Option<Instant>>,
redo_process: RwLock<Option<Arc<WalRedoProcess>>>,
}
@@ -186,10 +188,26 @@ impl PostgresRedoManager {
PostgresRedoManager {
tenant_id,
conf,
last_redo_at: std::sync::Mutex::default(),
redo_process: RwLock::new(None),
}
}
/// This type doesn't have its own background task to check for idleness: we
/// rely on our owner calling this function periodically in its own housekeeping
/// loops.
pub(crate) fn maybe_quiesce(&self, idle_timeout: Duration) {
if let Ok(g) = self.last_redo_at.try_lock() {
if let Some(last_redo_at) = *g {
if last_redo_at.elapsed() >= idle_timeout {
drop(g);
let mut guard = self.redo_process.write().unwrap();
*guard = None;
}
}
}
}
///
/// Process one request for WAL redo using wal-redo postgres
///
@@ -204,6 +222,8 @@ impl PostgresRedoManager {
wal_redo_timeout: Duration,
pg_version: u32,
) -> anyhow::Result<Bytes> {
*(self.last_redo_at.lock().unwrap()) = Some(Instant::now());
let (rel, blknum) = key_to_rel_block(key).context("invalid record")?;
const MAX_RETRY_ATTEMPTS: u32 = 1;
let mut n_attempts = 0u32;
@@ -347,12 +367,13 @@ impl PostgresRedoManager {
self.apply_record_neon(key, &mut page, *record_lsn, record)?;
}
// Success!
let end_time = Instant::now();
let duration = end_time.duration_since(start_time);
let duration = start_time.elapsed();
// FIXME: using the same metric here creates a bimodal distribution by default, and because
// there could be multiple batch sizes this would be N+1 modal.
WAL_REDO_TIME.observe(duration.as_secs_f64());
debug!(
"neon applied {} WAL records in {} ms to reconstruct page image at LSN {}",
"neon applied {} WAL records in {} us to reconstruct page image at LSN {}",
records.len(),
duration.as_micros(),
lsn
@@ -662,10 +683,10 @@ impl WalRedoProcess {
.close_fds()
.spawn_no_leak_child(tenant_id)
.context("spawn process")?;
WAL_REDO_PROCESS_COUNTERS.started.inc();
let mut child = scopeguard::guard(child, |child| {
error!("killing wal-redo-postgres process due to a problem during launch");
child.kill_and_wait();
child.kill_and_wait(WalRedoKillCause::Startup);
});
let stdin = child.stdin.take().unwrap();
@@ -996,7 +1017,7 @@ impl Drop for WalRedoProcess {
self.child
.take()
.expect("we only do this once")
.kill_and_wait();
.kill_and_wait(WalRedoKillCause::WalRedoProcessDrop);
self.stderr_logger_cancel.cancel();
// no way to wait for stderr_logger_task from Drop because that is async only
}
@@ -1032,16 +1053,19 @@ impl NoLeakChild {
})
}
fn kill_and_wait(mut self) {
fn kill_and_wait(mut self, cause: WalRedoKillCause) {
let child = match self.child.take() {
Some(child) => child,
None => return,
};
Self::kill_and_wait_impl(child);
Self::kill_and_wait_impl(child, cause);
}
#[instrument(skip_all, fields(pid=child.id()))]
fn kill_and_wait_impl(mut child: Child) {
#[instrument(skip_all, fields(pid=child.id(), ?cause))]
fn kill_and_wait_impl(mut child: Child, cause: WalRedoKillCause) {
scopeguard::defer! {
WAL_REDO_PROCESS_COUNTERS.killed_by_cause[cause].inc();
}
let res = child.kill();
if let Err(e) = res {
// This branch is very unlikely because:
@@ -1086,7 +1110,7 @@ impl Drop for NoLeakChild {
// This thread here is going to outlive of our dropper.
let span = tracing::info_span!("walredo", %tenant_id);
let _entered = span.enter();
Self::kill_and_wait_impl(child);
Self::kill_and_wait_impl(child, WalRedoKillCause::NoLeakChildDrop);
})
.await
});

View File

@@ -1687,9 +1687,9 @@ neon_extend(SMgrRelation reln, ForkNumber forkNum, BlockNumber blkno,
if (current_size >= ((uint64) max_cluster_size) * 1024 * 1024)
ereport(ERROR,
(errcode(ERRCODE_DISK_FULL),
errmsg("could not extend file because cluster size limit (%d MB) has been exceeded",
errmsg("could not extend file because project size limit (%d MB) has been exceeded",
max_cluster_size),
errhint("This limit is defined by neon.max_cluster_size GUC")));
errhint("This limit is defined externally by the project size limit, and internally by neon.max_cluster_size GUC")));
}
/*

178
poetry.lock generated
View File

@@ -2,98 +2,98 @@
[[package]]
name = "aiohttp"
version = "3.8.5"
version = "3.8.6"
description = "Async http client/server framework (asyncio)"
optional = false
python-versions = ">=3.6"
files = [
{file = "aiohttp-3.8.5-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:a94159871304770da4dd371f4291b20cac04e8c94f11bdea1c3478e557fbe0d8"},
{file = "aiohttp-3.8.5-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:13bf85afc99ce6f9ee3567b04501f18f9f8dbbb2ea11ed1a2e079670403a7c84"},
{file = "aiohttp-3.8.5-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:2ce2ac5708501afc4847221a521f7e4b245abf5178cf5ddae9d5b3856ddb2f3a"},
{file = "aiohttp-3.8.5-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:96943e5dcc37a6529d18766597c491798b7eb7a61d48878611298afc1fca946c"},
{file = "aiohttp-3.8.5-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:2ad5c3c4590bb3cc28b4382f031f3783f25ec223557124c68754a2231d989e2b"},
{file = "aiohttp-3.8.5-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0c413c633d0512df4dc7fd2373ec06cc6a815b7b6d6c2f208ada7e9e93a5061d"},
{file = "aiohttp-3.8.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:df72ac063b97837a80d80dec8d54c241af059cc9bb42c4de68bd5b61ceb37caa"},
{file = "aiohttp-3.8.5-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:c48c5c0271149cfe467c0ff8eb941279fd6e3f65c9a388c984e0e6cf57538e14"},
{file = "aiohttp-3.8.5-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:368a42363c4d70ab52c2c6420a57f190ed3dfaca6a1b19afda8165ee16416a82"},
{file = "aiohttp-3.8.5-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:7607ec3ce4993464368505888af5beb446845a014bc676d349efec0e05085905"},
{file = "aiohttp-3.8.5-cp310-cp310-musllinux_1_1_ppc64le.whl", hash = "sha256:0d21c684808288a98914e5aaf2a7c6a3179d4df11d249799c32d1808e79503b5"},
{file = "aiohttp-3.8.5-cp310-cp310-musllinux_1_1_s390x.whl", hash = "sha256:312fcfbacc7880a8da0ae8b6abc6cc7d752e9caa0051a53d217a650b25e9a691"},
{file = "aiohttp-3.8.5-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:ad093e823df03bb3fd37e7dec9d4670c34f9e24aeace76808fc20a507cace825"},
{file = "aiohttp-3.8.5-cp310-cp310-win32.whl", hash = "sha256:33279701c04351a2914e1100b62b2a7fdb9a25995c4a104259f9a5ead7ed4802"},
{file = "aiohttp-3.8.5-cp310-cp310-win_amd64.whl", hash = "sha256:6e4a280e4b975a2e7745573e3fc9c9ba0d1194a3738ce1cbaa80626cc9b4f4df"},
{file = "aiohttp-3.8.5-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:ae871a964e1987a943d83d6709d20ec6103ca1eaf52f7e0d36ee1b5bebb8b9b9"},
{file = "aiohttp-3.8.5-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:461908b2578955045efde733719d62f2b649c404189a09a632d245b445c9c975"},
{file = "aiohttp-3.8.5-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:72a860c215e26192379f57cae5ab12b168b75db8271f111019509a1196dfc780"},
{file = "aiohttp-3.8.5-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:cc14be025665dba6202b6a71cfcdb53210cc498e50068bc088076624471f8bb9"},
{file = "aiohttp-3.8.5-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:8af740fc2711ad85f1a5c034a435782fbd5b5f8314c9a3ef071424a8158d7f6b"},
{file = "aiohttp-3.8.5-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:841cd8233cbd2111a0ef0a522ce016357c5e3aff8a8ce92bcfa14cef890d698f"},
{file = "aiohttp-3.8.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:5ed1c46fb119f1b59304b5ec89f834f07124cd23ae5b74288e364477641060ff"},
{file = "aiohttp-3.8.5-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:84f8ae3e09a34f35c18fa57f015cc394bd1389bce02503fb30c394d04ee6b938"},
{file = "aiohttp-3.8.5-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:62360cb771707cb70a6fd114b9871d20d7dd2163a0feafe43fd115cfe4fe845e"},
{file = "aiohttp-3.8.5-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:23fb25a9f0a1ca1f24c0a371523546366bb642397c94ab45ad3aedf2941cec6a"},
{file = "aiohttp-3.8.5-cp311-cp311-musllinux_1_1_ppc64le.whl", hash = "sha256:b0ba0d15164eae3d878260d4c4df859bbdc6466e9e6689c344a13334f988bb53"},
{file = "aiohttp-3.8.5-cp311-cp311-musllinux_1_1_s390x.whl", hash = "sha256:5d20003b635fc6ae3f96d7260281dfaf1894fc3aa24d1888a9b2628e97c241e5"},
{file = "aiohttp-3.8.5-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:0175d745d9e85c40dcc51c8f88c74bfbaef9e7afeeeb9d03c37977270303064c"},
{file = "aiohttp-3.8.5-cp311-cp311-win32.whl", hash = "sha256:2e1b1e51b0774408f091d268648e3d57f7260c1682e7d3a63cb00d22d71bb945"},
{file = "aiohttp-3.8.5-cp311-cp311-win_amd64.whl", hash = "sha256:043d2299f6dfdc92f0ac5e995dfc56668e1587cea7f9aa9d8a78a1b6554e5755"},
{file = "aiohttp-3.8.5-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:cae533195e8122584ec87531d6df000ad07737eaa3c81209e85c928854d2195c"},
{file = "aiohttp-3.8.5-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4f21e83f355643c345177a5d1d8079f9f28b5133bcd154193b799d380331d5d3"},
{file = "aiohttp-3.8.5-cp36-cp36m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:a7a75ef35f2df54ad55dbf4b73fe1da96f370e51b10c91f08b19603c64004acc"},
{file = "aiohttp-3.8.5-cp36-cp36m-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:2e2e9839e14dd5308ee773c97115f1e0a1cb1d75cbeeee9f33824fa5144c7634"},
{file = "aiohttp-3.8.5-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c44e65da1de4403d0576473e2344828ef9c4c6244d65cf4b75549bb46d40b8dd"},
{file = "aiohttp-3.8.5-cp36-cp36m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:78d847e4cde6ecc19125ccbc9bfac4a7ab37c234dd88fbb3c5c524e8e14da543"},
{file = "aiohttp-3.8.5-cp36-cp36m-musllinux_1_1_aarch64.whl", hash = "sha256:c7a815258e5895d8900aec4454f38dca9aed71085f227537208057853f9d13f2"},
{file = "aiohttp-3.8.5-cp36-cp36m-musllinux_1_1_i686.whl", hash = "sha256:8b929b9bd7cd7c3939f8bcfffa92fae7480bd1aa425279d51a89327d600c704d"},
{file = "aiohttp-3.8.5-cp36-cp36m-musllinux_1_1_ppc64le.whl", hash = "sha256:5db3a5b833764280ed7618393832e0853e40f3d3e9aa128ac0ba0f8278d08649"},
{file = "aiohttp-3.8.5-cp36-cp36m-musllinux_1_1_s390x.whl", hash = "sha256:a0215ce6041d501f3155dc219712bc41252d0ab76474615b9700d63d4d9292af"},
{file = "aiohttp-3.8.5-cp36-cp36m-musllinux_1_1_x86_64.whl", hash = "sha256:fd1ed388ea7fbed22c4968dd64bab0198de60750a25fe8c0c9d4bef5abe13824"},
{file = "aiohttp-3.8.5-cp36-cp36m-win32.whl", hash = "sha256:6e6783bcc45f397fdebc118d772103d751b54cddf5b60fbcc958382d7dd64f3e"},
{file = "aiohttp-3.8.5-cp36-cp36m-win_amd64.whl", hash = "sha256:b5411d82cddd212644cf9360879eb5080f0d5f7d809d03262c50dad02f01421a"},
{file = "aiohttp-3.8.5-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:01d4c0c874aa4ddfb8098e85d10b5e875a70adc63db91f1ae65a4b04d3344cda"},
{file = "aiohttp-3.8.5-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e5980a746d547a6ba173fd5ee85ce9077e72d118758db05d229044b469d9029a"},
{file = "aiohttp-3.8.5-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:2a482e6da906d5e6e653be079b29bc173a48e381600161c9932d89dfae5942ef"},
{file = "aiohttp-3.8.5-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:80bd372b8d0715c66c974cf57fe363621a02f359f1ec81cba97366948c7fc873"},
{file = "aiohttp-3.8.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c1161b345c0a444ebcf46bf0a740ba5dcf50612fd3d0528883fdc0eff578006a"},
{file = "aiohttp-3.8.5-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:cd56db019015b6acfaaf92e1ac40eb8434847d9bf88b4be4efe5bfd260aee692"},
{file = "aiohttp-3.8.5-cp37-cp37m-musllinux_1_1_aarch64.whl", hash = "sha256:153c2549f6c004d2754cc60603d4668899c9895b8a89397444a9c4efa282aaf4"},
{file = "aiohttp-3.8.5-cp37-cp37m-musllinux_1_1_i686.whl", hash = "sha256:4a01951fabc4ce26ab791da5f3f24dca6d9a6f24121746eb19756416ff2d881b"},
{file = "aiohttp-3.8.5-cp37-cp37m-musllinux_1_1_ppc64le.whl", hash = "sha256:bfb9162dcf01f615462b995a516ba03e769de0789de1cadc0f916265c257e5d8"},
{file = "aiohttp-3.8.5-cp37-cp37m-musllinux_1_1_s390x.whl", hash = "sha256:7dde0009408969a43b04c16cbbe252c4f5ef4574ac226bc8815cd7342d2028b6"},
{file = "aiohttp-3.8.5-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:4149d34c32f9638f38f544b3977a4c24052042affa895352d3636fa8bffd030a"},
{file = "aiohttp-3.8.5-cp37-cp37m-win32.whl", hash = "sha256:68c5a82c8779bdfc6367c967a4a1b2aa52cd3595388bf5961a62158ee8a59e22"},
{file = "aiohttp-3.8.5-cp37-cp37m-win_amd64.whl", hash = "sha256:2cf57fb50be5f52bda004b8893e63b48530ed9f0d6c96c84620dc92fe3cd9b9d"},
{file = "aiohttp-3.8.5-cp38-cp38-macosx_10_9_universal2.whl", hash = "sha256:eca4bf3734c541dc4f374ad6010a68ff6c6748f00451707f39857f429ca36ced"},
{file = "aiohttp-3.8.5-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:1274477e4c71ce8cfe6c1ec2f806d57c015ebf84d83373676036e256bc55d690"},
{file = "aiohttp-3.8.5-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:28c543e54710d6158fc6f439296c7865b29e0b616629767e685a7185fab4a6b9"},
{file = "aiohttp-3.8.5-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:910bec0c49637d213f5d9877105d26e0c4a4de2f8b1b29405ff37e9fc0ad52b8"},
{file = "aiohttp-3.8.5-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:5443910d662db951b2e58eb70b0fbe6b6e2ae613477129a5805d0b66c54b6cb7"},
{file = "aiohttp-3.8.5-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:2e460be6978fc24e3df83193dc0cc4de46c9909ed92dd47d349a452ef49325b7"},
{file = "aiohttp-3.8.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:fb1558def481d84f03b45888473fc5a1f35747b5f334ef4e7a571bc0dfcb11f8"},
{file = "aiohttp-3.8.5-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:34dd0c107799dcbbf7d48b53be761a013c0adf5571bf50c4ecad5643fe9cfcd0"},
{file = "aiohttp-3.8.5-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:aa1990247f02a54185dc0dff92a6904521172a22664c863a03ff64c42f9b5410"},
{file = "aiohttp-3.8.5-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:0e584a10f204a617d71d359fe383406305a4b595b333721fa50b867b4a0a1548"},
{file = "aiohttp-3.8.5-cp38-cp38-musllinux_1_1_ppc64le.whl", hash = "sha256:a3cf433f127efa43fee6b90ea4c6edf6c4a17109d1d037d1a52abec84d8f2e42"},
{file = "aiohttp-3.8.5-cp38-cp38-musllinux_1_1_s390x.whl", hash = "sha256:c11f5b099adafb18e65c2c997d57108b5bbeaa9eeee64a84302c0978b1ec948b"},
{file = "aiohttp-3.8.5-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:84de26ddf621d7ac4c975dbea4c945860e08cccde492269db4e1538a6a6f3c35"},
{file = "aiohttp-3.8.5-cp38-cp38-win32.whl", hash = "sha256:ab88bafedc57dd0aab55fa728ea10c1911f7e4d8b43e1d838a1739f33712921c"},
{file = "aiohttp-3.8.5-cp38-cp38-win_amd64.whl", hash = "sha256:5798a9aad1879f626589f3df0f8b79b3608a92e9beab10e5fda02c8a2c60db2e"},
{file = "aiohttp-3.8.5-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:a6ce61195c6a19c785df04e71a4537e29eaa2c50fe745b732aa937c0c77169f3"},
{file = "aiohttp-3.8.5-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:773dd01706d4db536335fcfae6ea2440a70ceb03dd3e7378f3e815b03c97ab51"},
{file = "aiohttp-3.8.5-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:f83a552443a526ea38d064588613aca983d0ee0038801bc93c0c916428310c28"},
{file = "aiohttp-3.8.5-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1f7372f7341fcc16f57b2caded43e81ddd18df53320b6f9f042acad41f8e049a"},
{file = "aiohttp-3.8.5-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:ea353162f249c8097ea63c2169dd1aa55de1e8fecbe63412a9bc50816e87b761"},
{file = "aiohttp-3.8.5-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:e5d47ae48db0b2dcf70bc8a3bc72b3de86e2a590fc299fdbbb15af320d2659de"},
{file = "aiohttp-3.8.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d827176898a2b0b09694fbd1088c7a31836d1a505c243811c87ae53a3f6273c1"},
{file = "aiohttp-3.8.5-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:3562b06567c06439d8b447037bb655ef69786c590b1de86c7ab81efe1c9c15d8"},
{file = "aiohttp-3.8.5-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:4e874cbf8caf8959d2adf572a78bba17cb0e9d7e51bb83d86a3697b686a0ab4d"},
{file = "aiohttp-3.8.5-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:6809a00deaf3810e38c628e9a33271892f815b853605a936e2e9e5129762356c"},
{file = "aiohttp-3.8.5-cp39-cp39-musllinux_1_1_ppc64le.whl", hash = "sha256:33776e945d89b29251b33a7e7d006ce86447b2cfd66db5e5ded4e5cd0340585c"},
{file = "aiohttp-3.8.5-cp39-cp39-musllinux_1_1_s390x.whl", hash = "sha256:eaeed7abfb5d64c539e2db173f63631455f1196c37d9d8d873fc316470dfbacd"},
{file = "aiohttp-3.8.5-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:e91d635961bec2d8f19dfeb41a539eb94bd073f075ca6dae6c8dc0ee89ad6f91"},
{file = "aiohttp-3.8.5-cp39-cp39-win32.whl", hash = "sha256:00ad4b6f185ec67f3e6562e8a1d2b69660be43070bd0ef6fcec5211154c7df67"},
{file = "aiohttp-3.8.5-cp39-cp39-win_amd64.whl", hash = "sha256:c0a9034379a37ae42dea7ac1e048352d96286626251862e448933c0f59cbd79c"},
{file = "aiohttp-3.8.5.tar.gz", hash = "sha256:b9552ec52cc147dbf1944ac7ac98af7602e51ea2dcd076ed194ca3c0d1c7d0bc"},
{file = "aiohttp-3.8.6-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:41d55fc043954cddbbd82503d9cc3f4814a40bcef30b3569bc7b5e34130718c1"},
{file = "aiohttp-3.8.6-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:1d84166673694841d8953f0a8d0c90e1087739d24632fe86b1a08819168b4566"},
{file = "aiohttp-3.8.6-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:253bf92b744b3170eb4c4ca2fa58f9c4b87aeb1df42f71d4e78815e6e8b73c9e"},
{file = "aiohttp-3.8.6-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3fd194939b1f764d6bb05490987bfe104287bbf51b8d862261ccf66f48fb4096"},
{file = "aiohttp-3.8.6-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:6c5f938d199a6fdbdc10bbb9447496561c3a9a565b43be564648d81e1102ac22"},
{file = "aiohttp-3.8.6-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:2817b2f66ca82ee699acd90e05c95e79bbf1dc986abb62b61ec8aaf851e81c93"},
{file = "aiohttp-3.8.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0fa375b3d34e71ccccf172cab401cd94a72de7a8cc01847a7b3386204093bb47"},
{file = "aiohttp-3.8.6-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:9de50a199b7710fa2904be5a4a9b51af587ab24c8e540a7243ab737b45844543"},
{file = "aiohttp-3.8.6-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:e1d8cb0b56b3587c5c01de3bf2f600f186da7e7b5f7353d1bf26a8ddca57f965"},
{file = "aiohttp-3.8.6-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:8e31e9db1bee8b4f407b77fd2507337a0a80665ad7b6c749d08df595d88f1cf5"},
{file = "aiohttp-3.8.6-cp310-cp310-musllinux_1_1_ppc64le.whl", hash = "sha256:7bc88fc494b1f0311d67f29fee6fd636606f4697e8cc793a2d912ac5b19aa38d"},
{file = "aiohttp-3.8.6-cp310-cp310-musllinux_1_1_s390x.whl", hash = "sha256:ec00c3305788e04bf6d29d42e504560e159ccaf0be30c09203b468a6c1ccd3b2"},
{file = "aiohttp-3.8.6-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:ad1407db8f2f49329729564f71685557157bfa42b48f4b93e53721a16eb813ed"},
{file = "aiohttp-3.8.6-cp310-cp310-win32.whl", hash = "sha256:ccc360e87341ad47c777f5723f68adbb52b37ab450c8bc3ca9ca1f3e849e5fe2"},
{file = "aiohttp-3.8.6-cp310-cp310-win_amd64.whl", hash = "sha256:93c15c8e48e5e7b89d5cb4613479d144fda8344e2d886cf694fd36db4cc86865"},
{file = "aiohttp-3.8.6-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:6e2f9cc8e5328f829f6e1fb74a0a3a939b14e67e80832975e01929e320386b34"},
{file = "aiohttp-3.8.6-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:e6a00ffcc173e765e200ceefb06399ba09c06db97f401f920513a10c803604ca"},
{file = "aiohttp-3.8.6-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:41bdc2ba359032e36c0e9de5a3bd00d6fb7ea558a6ce6b70acedf0da86458321"},
{file = "aiohttp-3.8.6-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:14cd52ccf40006c7a6cd34a0f8663734e5363fd981807173faf3a017e202fec9"},
{file = "aiohttp-3.8.6-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:2d5b785c792802e7b275c420d84f3397668e9d49ab1cb52bd916b3b3ffcf09ad"},
{file = "aiohttp-3.8.6-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:1bed815f3dc3d915c5c1e556c397c8667826fbc1b935d95b0ad680787896a358"},
{file = "aiohttp-3.8.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:96603a562b546632441926cd1293cfcb5b69f0b4159e6077f7c7dbdfb686af4d"},
{file = "aiohttp-3.8.6-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:d76e8b13161a202d14c9584590c4df4d068c9567c99506497bdd67eaedf36403"},
{file = "aiohttp-3.8.6-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:e3f1e3f1a1751bb62b4a1b7f4e435afcdade6c17a4fd9b9d43607cebd242924a"},
{file = "aiohttp-3.8.6-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:76b36b3124f0223903609944a3c8bf28a599b2cc0ce0be60b45211c8e9be97f8"},
{file = "aiohttp-3.8.6-cp311-cp311-musllinux_1_1_ppc64le.whl", hash = "sha256:a2ece4af1f3c967a4390c284797ab595a9f1bc1130ef8b01828915a05a6ae684"},
{file = "aiohttp-3.8.6-cp311-cp311-musllinux_1_1_s390x.whl", hash = "sha256:16d330b3b9db87c3883e565340d292638a878236418b23cc8b9b11a054aaa887"},
{file = "aiohttp-3.8.6-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:42c89579f82e49db436b69c938ab3e1559e5a4409eb8639eb4143989bc390f2f"},
{file = "aiohttp-3.8.6-cp311-cp311-win32.whl", hash = "sha256:efd2fcf7e7b9d7ab16e6b7d54205beded0a9c8566cb30f09c1abe42b4e22bdcb"},
{file = "aiohttp-3.8.6-cp311-cp311-win_amd64.whl", hash = "sha256:3b2ab182fc28e7a81f6c70bfbd829045d9480063f5ab06f6e601a3eddbbd49a0"},
{file = "aiohttp-3.8.6-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:fdee8405931b0615220e5ddf8cd7edd8592c606a8e4ca2a00704883c396e4479"},
{file = "aiohttp-3.8.6-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d25036d161c4fe2225d1abff2bd52c34ed0b1099f02c208cd34d8c05729882f0"},
{file = "aiohttp-3.8.6-cp36-cp36m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:5d791245a894be071d5ab04bbb4850534261a7d4fd363b094a7b9963e8cdbd31"},
{file = "aiohttp-3.8.6-cp36-cp36m-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0cccd1de239afa866e4ce5c789b3032442f19c261c7d8a01183fd956b1935349"},
{file = "aiohttp-3.8.6-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:1f13f60d78224f0dace220d8ab4ef1dbc37115eeeab8c06804fec11bec2bbd07"},
{file = "aiohttp-3.8.6-cp36-cp36m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:8a9b5a0606faca4f6cc0d338359d6fa137104c337f489cd135bb7fbdbccb1e39"},
{file = "aiohttp-3.8.6-cp36-cp36m-musllinux_1_1_aarch64.whl", hash = "sha256:13da35c9ceb847732bf5c6c5781dcf4780e14392e5d3b3c689f6d22f8e15ae31"},
{file = "aiohttp-3.8.6-cp36-cp36m-musllinux_1_1_i686.whl", hash = "sha256:4d4cbe4ffa9d05f46a28252efc5941e0462792930caa370a6efaf491f412bc66"},
{file = "aiohttp-3.8.6-cp36-cp36m-musllinux_1_1_ppc64le.whl", hash = "sha256:229852e147f44da0241954fc6cb910ba074e597f06789c867cb7fb0621e0ba7a"},
{file = "aiohttp-3.8.6-cp36-cp36m-musllinux_1_1_s390x.whl", hash = "sha256:713103a8bdde61d13490adf47171a1039fd880113981e55401a0f7b42c37d071"},
{file = "aiohttp-3.8.6-cp36-cp36m-musllinux_1_1_x86_64.whl", hash = "sha256:45ad816b2c8e3b60b510f30dbd37fe74fd4a772248a52bb021f6fd65dff809b6"},
{file = "aiohttp-3.8.6-cp36-cp36m-win32.whl", hash = "sha256:2b8d4e166e600dcfbff51919c7a3789ff6ca8b3ecce16e1d9c96d95dd569eb4c"},
{file = "aiohttp-3.8.6-cp36-cp36m-win_amd64.whl", hash = "sha256:0912ed87fee967940aacc5306d3aa8ba3a459fcd12add0b407081fbefc931e53"},
{file = "aiohttp-3.8.6-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:e2a988a0c673c2e12084f5e6ba3392d76c75ddb8ebc6c7e9ead68248101cd446"},
{file = "aiohttp-3.8.6-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ebf3fd9f141700b510d4b190094db0ce37ac6361a6806c153c161dc6c041ccda"},
{file = "aiohttp-3.8.6-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:3161ce82ab85acd267c8f4b14aa226047a6bee1e4e6adb74b798bd42c6ae1f80"},
{file = "aiohttp-3.8.6-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:d95fc1bf33a9a81469aa760617b5971331cdd74370d1214f0b3109272c0e1e3c"},
{file = "aiohttp-3.8.6-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:6c43ecfef7deaf0617cee936836518e7424ee12cb709883f2c9a1adda63cc460"},
{file = "aiohttp-3.8.6-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:ca80e1b90a05a4f476547f904992ae81eda5c2c85c66ee4195bb8f9c5fb47f28"},
{file = "aiohttp-3.8.6-cp37-cp37m-musllinux_1_1_aarch64.whl", hash = "sha256:90c72ebb7cb3a08a7f40061079817133f502a160561d0675b0a6adf231382c92"},
{file = "aiohttp-3.8.6-cp37-cp37m-musllinux_1_1_i686.whl", hash = "sha256:bb54c54510e47a8c7c8e63454a6acc817519337b2b78606c4e840871a3e15349"},
{file = "aiohttp-3.8.6-cp37-cp37m-musllinux_1_1_ppc64le.whl", hash = "sha256:de6a1c9f6803b90e20869e6b99c2c18cef5cc691363954c93cb9adeb26d9f3ae"},
{file = "aiohttp-3.8.6-cp37-cp37m-musllinux_1_1_s390x.whl", hash = "sha256:a3628b6c7b880b181a3ae0a0683698513874df63783fd89de99b7b7539e3e8a8"},
{file = "aiohttp-3.8.6-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:fc37e9aef10a696a5a4474802930079ccfc14d9f9c10b4662169671ff034b7df"},
{file = "aiohttp-3.8.6-cp37-cp37m-win32.whl", hash = "sha256:f8ef51e459eb2ad8e7a66c1d6440c808485840ad55ecc3cafefadea47d1b1ba2"},
{file = "aiohttp-3.8.6-cp37-cp37m-win_amd64.whl", hash = "sha256:b2fe42e523be344124c6c8ef32a011444e869dc5f883c591ed87f84339de5976"},
{file = "aiohttp-3.8.6-cp38-cp38-macosx_10_9_universal2.whl", hash = "sha256:9e2ee0ac5a1f5c7dd3197de309adfb99ac4617ff02b0603fd1e65b07dc772e4b"},
{file = "aiohttp-3.8.6-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:01770d8c04bd8db568abb636c1fdd4f7140b284b8b3e0b4584f070180c1e5c62"},
{file = "aiohttp-3.8.6-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:3c68330a59506254b556b99a91857428cab98b2f84061260a67865f7f52899f5"},
{file = "aiohttp-3.8.6-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:89341b2c19fb5eac30c341133ae2cc3544d40d9b1892749cdd25892bbc6ac951"},
{file = "aiohttp-3.8.6-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:71783b0b6455ac8f34b5ec99d83e686892c50498d5d00b8e56d47f41b38fbe04"},
{file = "aiohttp-3.8.6-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:f628dbf3c91e12f4d6c8b3f092069567d8eb17814aebba3d7d60c149391aee3a"},
{file = "aiohttp-3.8.6-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b04691bc6601ef47c88f0255043df6f570ada1a9ebef99c34bd0b72866c217ae"},
{file = "aiohttp-3.8.6-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:7ee912f7e78287516df155f69da575a0ba33b02dd7c1d6614dbc9463f43066e3"},
{file = "aiohttp-3.8.6-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:9c19b26acdd08dd239e0d3669a3dddafd600902e37881f13fbd8a53943079dbc"},
{file = "aiohttp-3.8.6-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:99c5ac4ad492b4a19fc132306cd57075c28446ec2ed970973bbf036bcda1bcc6"},
{file = "aiohttp-3.8.6-cp38-cp38-musllinux_1_1_ppc64le.whl", hash = "sha256:f0f03211fd14a6a0aed2997d4b1c013d49fb7b50eeb9ffdf5e51f23cfe2c77fa"},
{file = "aiohttp-3.8.6-cp38-cp38-musllinux_1_1_s390x.whl", hash = "sha256:8d399dade330c53b4106160f75f55407e9ae7505263ea86f2ccca6bfcbdb4921"},
{file = "aiohttp-3.8.6-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:ec4fd86658c6a8964d75426517dc01cbf840bbf32d055ce64a9e63a40fd7b771"},
{file = "aiohttp-3.8.6-cp38-cp38-win32.whl", hash = "sha256:33164093be11fcef3ce2571a0dccd9041c9a93fa3bde86569d7b03120d276c6f"},
{file = "aiohttp-3.8.6-cp38-cp38-win_amd64.whl", hash = "sha256:bdf70bfe5a1414ba9afb9d49f0c912dc524cf60141102f3a11143ba3d291870f"},
{file = "aiohttp-3.8.6-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:d52d5dc7c6682b720280f9d9db41d36ebe4791622c842e258c9206232251ab2b"},
{file = "aiohttp-3.8.6-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:4ac39027011414dbd3d87f7edb31680e1f430834c8cef029f11c66dad0670aa5"},
{file = "aiohttp-3.8.6-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:3f5c7ce535a1d2429a634310e308fb7d718905487257060e5d4598e29dc17f0b"},
{file = "aiohttp-3.8.6-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b30e963f9e0d52c28f284d554a9469af073030030cef8693106d918b2ca92f54"},
{file = "aiohttp-3.8.6-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:918810ef188f84152af6b938254911055a72e0f935b5fbc4c1a4ed0b0584aed1"},
{file = "aiohttp-3.8.6-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:002f23e6ea8d3dd8d149e569fd580c999232b5fbc601c48d55398fbc2e582e8c"},
{file = "aiohttp-3.8.6-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4fcf3eabd3fd1a5e6092d1242295fa37d0354b2eb2077e6eb670accad78e40e1"},
{file = "aiohttp-3.8.6-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:255ba9d6d5ff1a382bb9a578cd563605aa69bec845680e21c44afc2670607a95"},
{file = "aiohttp-3.8.6-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:d67f8baed00870aa390ea2590798766256f31dc5ed3ecc737debb6e97e2ede78"},
{file = "aiohttp-3.8.6-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:86f20cee0f0a317c76573b627b954c412ea766d6ada1a9fcf1b805763ae7feeb"},
{file = "aiohttp-3.8.6-cp39-cp39-musllinux_1_1_ppc64le.whl", hash = "sha256:39a312d0e991690ccc1a61f1e9e42daa519dcc34ad03eb6f826d94c1190190dd"},
{file = "aiohttp-3.8.6-cp39-cp39-musllinux_1_1_s390x.whl", hash = "sha256:e827d48cf802de06d9c935088c2924e3c7e7533377d66b6f31ed175c1620e05e"},
{file = "aiohttp-3.8.6-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:bd111d7fc5591ddf377a408ed9067045259ff2770f37e2d94e6478d0f3fc0c17"},
{file = "aiohttp-3.8.6-cp39-cp39-win32.whl", hash = "sha256:caf486ac1e689dda3502567eb89ffe02876546599bbf915ec94b1fa424eeffd4"},
{file = "aiohttp-3.8.6-cp39-cp39-win_amd64.whl", hash = "sha256:3f0e27e5b733803333bb2371249f41cf42bae8884863e8e8965ec69bebe53132"},
{file = "aiohttp-3.8.6.tar.gz", hash = "sha256:b0cf2a4501bff9330a8a5248b4ce951851e415bdcce9dc158e76cfd55e15085c"},
]
[package.dependencies]
@@ -2719,4 +2719,4 @@ cffi = ["cffi (>=1.11)"]
[metadata]
lock-version = "2.0"
python-versions = "^3.9"
content-hash = "74649cf47c52f21b01b096a42044750b1c9677576b405be0489c2909127a9bf1"
content-hash = "0834e5cb69e5457741d4f476c3e49a4dc83598b5730685c8755da651b96ad3ec"

View File

@@ -51,6 +51,7 @@ serde_json.workspace = true
sha2.workspace = true
socket2.workspace = true
sync_wrapper.workspace = true
task-local-extensions.workspace = true
thiserror.workspace = true
tls-listener.workspace = true
tokio-postgres.workspace = true

View File

@@ -1,7 +1,9 @@
//! User credentials used in authentication.
use crate::{
auth::password_hack::parse_endpoint_param, error::UserFacingError, proxy::neon_options,
auth::password_hack::parse_endpoint_param,
error::UserFacingError,
proxy::{neon_options, NUM_CONNECTION_ACCEPTED_BY_SNI},
};
use itertools::Itertools;
use pq_proto::StartupMessageParams;
@@ -124,6 +126,22 @@ impl<'a> ClientCredentials<'a> {
.transpose()?;
info!(user, project = project.as_deref(), "credentials");
if sni.is_some() {
info!("Connection with sni");
NUM_CONNECTION_ACCEPTED_BY_SNI
.with_label_values(&["sni"])
.inc();
} else if project.is_some() {
NUM_CONNECTION_ACCEPTED_BY_SNI
.with_label_values(&["no_sni"])
.inc();
info!("Connection without sni");
} else {
NUM_CONNECTION_ACCEPTED_BY_SNI
.with_label_values(&["password_hack"])
.inc();
info!("Connection with password hack");
}
let cache_key = format!(
"{}{}",

View File

@@ -4,6 +4,7 @@ use proxy::config::AuthenticationConfig;
use proxy::config::HttpConfig;
use proxy::console;
use proxy::http;
use proxy::rate_limiter::RateLimiterConfig;
use proxy::usage_metrics;
use anyhow::bail;
@@ -95,6 +96,20 @@ struct ProxyCliArgs {
/// Require that all incoming requests have a Proxy Protocol V2 packet **and** have an IP address associated.
#[clap(long, default_value_t = false, value_parser = clap::builder::BoolishValueParser::new(), action = clap::ArgAction::Set)]
require_client_ip: bool,
/// Disable dynamic rate limiter and store the metrics to ensure its production behaviour.
#[clap(long, default_value_t = true, value_parser = clap::builder::BoolishValueParser::new(), action = clap::ArgAction::Set)]
disable_dynamic_rate_limiter: bool,
/// Rate limit algorithm. Makes sense only if `disable_rate_limiter` is `false`.
#[clap(value_enum, long, default_value_t = proxy::rate_limiter::RateLimitAlgorithm::Aimd)]
rate_limit_algorithm: proxy::rate_limiter::RateLimitAlgorithm,
/// Timeout for rate limiter. If it didn't manage to aquire a permit in this time, it will return an error.
#[clap(long, default_value = "15s", value_parser = humantime::parse_duration)]
rate_limiter_timeout: tokio::time::Duration,
/// Initial limit for dynamic rate limiter. Makes sense only if `rate_limit_algorithm` is *not* `None`.
#[clap(long, default_value_t = 100)]
initial_limit: usize,
#[clap(flatten)]
aimd_config: proxy::rate_limiter::AimdConfig,
}
#[tokio::main]
@@ -213,6 +228,13 @@ fn build_config(args: &ProxyCliArgs) -> anyhow::Result<&'static ProxyConfig> {
and metric-collection-interval must be specified"
),
};
let rate_limiter_config = RateLimiterConfig {
disable: args.disable_dynamic_rate_limiter,
algorithm: args.rate_limit_algorithm,
timeout: args.rate_limiter_timeout,
initial_limit: args.initial_limit,
aimd_config: Some(args.aimd_config),
};
let auth_backend = match &args.auth_backend {
AuthBackend::Console => {
@@ -237,7 +259,7 @@ fn build_config(args: &ProxyCliArgs) -> anyhow::Result<&'static ProxyConfig> {
tokio::spawn(locks.garbage_collect_worker(epoch));
let url = args.auth_endpoint.parse()?;
let endpoint = http::Endpoint::new(url, http::new_client());
let endpoint = http::Endpoint::new(url, http::new_client(rate_limiter_config));
let api = console::provider::neon::Api::new(endpoint, caches, locks);
auth::BackendType::Console(Cow::Owned(api), ())

View File

@@ -248,6 +248,7 @@ impl ConnCfg {
// connect_raw() will not use TLS if sslmode is "disable"
let (client, connection) = self.0.connect_raw(stream, tls).await?;
tracing::Span::current().record("pid", &tracing::field::display(client.get_process_id()));
let stream = connection.stream.into_inner();
info!(

View File

@@ -13,13 +13,13 @@ pub use reqwest_retry::{policies::ExponentialBackoff, RetryTransientMiddleware};
use tokio::time::Instant;
use tracing::trace;
use crate::url::ApiUrl;
use crate::{rate_limiter, url::ApiUrl};
use reqwest_middleware::RequestBuilder;
/// This is the preferred way to create new http clients,
/// because it takes care of observability (OpenTelemetry).
/// We deliberately don't want to replace this with a public static.
pub fn new_client() -> ClientWithMiddleware {
pub fn new_client(rate_limiter_config: rate_limiter::RateLimiterConfig) -> ClientWithMiddleware {
let client = reqwest::ClientBuilder::new()
.dns_resolver(Arc::new(GaiResolver::default()))
.connection_verbose(true)
@@ -28,6 +28,7 @@ pub fn new_client() -> ClientWithMiddleware {
reqwest_middleware::ClientBuilder::new(client)
.with(reqwest_tracing::TracingMiddleware::default())
.with(rate_limiter::Limiter::new(rate_limiter_config))
.build()
}

View File

@@ -19,6 +19,7 @@ pub mod logging;
pub mod parse;
pub mod protocol2;
pub mod proxy;
pub mod rate_limiter;
pub mod sasl;
pub mod scram;
pub mod serverless;

View File

@@ -19,7 +19,10 @@ use itertools::Itertools;
use metrics::{exponential_buckets, register_int_counter_vec, IntCounterVec};
use once_cell::sync::{Lazy, OnceCell};
use pq_proto::{BeMessage as Be, FeStartupPacket, StartupMessageParams};
use prometheus::{register_histogram_vec, HistogramVec};
use prometheus::{
register_histogram, register_histogram_vec, register_int_gauge_vec, Histogram, HistogramVec,
IntGaugeVec,
};
use regex::Regex;
use std::{error::Error, io, ops::ControlFlow, sync::Arc, time::Instant};
use tokio::{
@@ -107,6 +110,34 @@ static COMPUTE_CONNECTION_LATENCY: Lazy<HistogramVec> = Lazy::new(|| {
.unwrap()
});
pub static RATE_LIMITER_ACQUIRE_LATENCY: Lazy<Histogram> = Lazy::new(|| {
register_histogram!(
"semaphore_control_plane_token_acquire_seconds",
"Time it took for proxy to establish a connection to the compute endpoint",
// largest bucket = 2^16 * 0.5ms = 32s
exponential_buckets(0.0005, 2.0, 16).unwrap(),
)
.unwrap()
});
pub static RATE_LIMITER_LIMIT: Lazy<IntGaugeVec> = Lazy::new(|| {
register_int_gauge_vec!(
"semaphore_control_plane_limit",
"Current limit of the semaphore control plane",
&["limit"], // 2 counters
)
.unwrap()
});
pub static NUM_CONNECTION_ACCEPTED_BY_SNI: Lazy<IntCounterVec> = Lazy::new(|| {
register_int_counter_vec!(
"proxy_accepted_connections_by_sni",
"Number of connections (per sni).",
&["kind"],
)
.unwrap()
});
pub struct LatencyTimer {
// time since the stopwatch was started
start: Option<Instant>,
@@ -483,7 +514,7 @@ pub fn invalidate_cache(node_info: console::CachedNodeInfo) -> compute::ConnCfg
}
/// Try to connect to the compute node once.
#[tracing::instrument(name = "connect_once", skip_all)]
#[tracing::instrument(name = "connect_once", fields(pid = tracing::field::Empty), skip_all)]
async fn connect_to_compute_once(
node_info: &console::CachedNodeInfo,
timeout: time::Duration,

View File

@@ -0,0 +1,6 @@
mod aimd;
mod limit_algorithm;
mod limiter;
pub use aimd::Aimd;
pub use limit_algorithm::{AimdConfig, Fixed, RateLimitAlgorithm, RateLimiterConfig};
pub use limiter::Limiter;

View File

@@ -0,0 +1,199 @@
use std::usize;
use async_trait::async_trait;
use super::limit_algorithm::{AimdConfig, LimitAlgorithm, Sample};
use super::limiter::Outcome;
/// Loss-based congestion avoidance.
///
/// Additive-increase, multiplicative decrease.
///
/// Adds available currency when:
/// 1. no load-based errors are observed, and
/// 2. the utilisation of the current limit is high.
///
/// Reduces available concurrency by a factor when load-based errors are detected.
pub struct Aimd {
min_limit: usize,
max_limit: usize,
decrease_factor: f32,
increase_by: usize,
min_utilisation_threshold: f32,
}
impl Aimd {
pub fn new(config: AimdConfig) -> Self {
Self {
min_limit: config.aimd_min_limit,
max_limit: config.aimd_max_limit,
decrease_factor: config.aimd_decrease_factor,
increase_by: config.aimd_increase_by,
min_utilisation_threshold: config.aimd_min_utilisation_threshold,
}
}
pub fn decrease_factor(self, factor: f32) -> Self {
assert!((0.5..1.0).contains(&factor));
Self {
decrease_factor: factor,
..self
}
}
pub fn increase_by(self, increase: usize) -> Self {
assert!(increase > 0);
Self {
increase_by: increase,
..self
}
}
pub fn with_max_limit(self, max: usize) -> Self {
assert!(max > 0);
Self {
max_limit: max,
..self
}
}
/// A threshold below which the limit won't be increased. 0.5 = 50%.
pub fn with_min_utilisation_threshold(self, min_util: f32) -> Self {
assert!(min_util > 0. && min_util < 1.);
Self {
min_utilisation_threshold: min_util,
..self
}
}
}
#[async_trait]
impl LimitAlgorithm for Aimd {
async fn update(&mut self, old_limit: usize, sample: Sample) -> usize {
use Outcome::*;
match sample.outcome {
Success => {
let utilisation = sample.in_flight as f32 / old_limit as f32;
if utilisation > self.min_utilisation_threshold {
let limit = old_limit + self.increase_by;
limit.clamp(self.min_limit, self.max_limit)
} else {
old_limit
}
}
Overload => {
let limit = old_limit as f32 * self.decrease_factor;
// Floor instead of round, so the limit reduces even with small numbers.
// E.g. round(2 * 0.9) = 2, but floor(2 * 0.9) = 1
let limit = limit.floor() as usize;
limit.clamp(self.min_limit, self.max_limit)
}
}
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use tokio::sync::Notify;
use super::*;
use crate::rate_limiter::{Limiter, RateLimiterConfig};
#[tokio::test]
async fn should_decrease_limit_on_overload() {
let config = RateLimiterConfig {
initial_limit: 10,
aimd_config: Some(AimdConfig {
aimd_decrease_factor: 0.5,
..Default::default()
}),
disable: false,
..Default::default()
};
let release_notifier = Arc::new(Notify::new());
let limiter = Limiter::new(config).with_release_notifier(release_notifier.clone());
let token = limiter.try_acquire().unwrap();
limiter.release(token, Some(Outcome::Overload)).await;
release_notifier.notified().await;
assert_eq!(limiter.state().limit(), 5, "overload: decrease");
}
#[tokio::test]
async fn should_increase_limit_on_success_when_using_gt_util_threshold() {
let config = RateLimiterConfig {
initial_limit: 4,
aimd_config: Some(AimdConfig {
aimd_decrease_factor: 0.5,
aimd_min_utilisation_threshold: 0.5,
aimd_increase_by: 1,
..Default::default()
}),
disable: false,
..Default::default()
};
let limiter = Limiter::new(config);
let token = limiter.try_acquire().unwrap();
let _token = limiter.try_acquire().unwrap();
let _token = limiter.try_acquire().unwrap();
limiter.release(token, Some(Outcome::Success)).await;
assert_eq!(limiter.state().limit(), 5, "success: increase");
}
#[tokio::test]
async fn should_not_change_limit_on_success_when_using_lt_util_threshold() {
let config = RateLimiterConfig {
initial_limit: 4,
aimd_config: Some(AimdConfig {
aimd_decrease_factor: 0.5,
aimd_min_utilisation_threshold: 0.5,
..Default::default()
}),
disable: false,
..Default::default()
};
let limiter = Limiter::new(config);
let token = limiter.try_acquire().unwrap();
limiter.release(token, Some(Outcome::Success)).await;
assert_eq!(
limiter.state().limit(),
4,
"success: ignore when < half limit"
);
}
#[tokio::test]
async fn should_not_change_limit_when_no_outcome() {
let config = RateLimiterConfig {
initial_limit: 10,
aimd_config: Some(AimdConfig {
aimd_decrease_factor: 0.5,
aimd_min_utilisation_threshold: 0.5,
..Default::default()
}),
disable: false,
..Default::default()
};
let limiter = Limiter::new(config);
let token = limiter.try_acquire().unwrap();
limiter.release(token, None).await;
assert_eq!(limiter.state().limit(), 10, "ignore");
}
}

View File

@@ -0,0 +1,98 @@
//! Algorithms for controlling concurrency limits.
use async_trait::async_trait;
use std::time::Duration;
use super::{limiter::Outcome, Aimd};
/// An algorithm for controlling a concurrency limit.
#[async_trait]
pub trait LimitAlgorithm: Send + Sync + 'static {
/// Update the concurrency limit in response to a new job completion.
async fn update(&mut self, old_limit: usize, sample: Sample) -> usize;
}
/// The result of a job (or jobs), including the [Outcome] (loss) and latency (delay).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Sample {
pub(crate) latency: Duration,
/// Jobs in flight when the sample was taken.
pub(crate) in_flight: usize,
pub(crate) outcome: Outcome,
}
#[derive(Clone, Copy, Debug, Default, clap::ValueEnum)]
pub enum RateLimitAlgorithm {
Fixed,
#[default]
Aimd,
}
pub struct Fixed;
#[async_trait]
impl LimitAlgorithm for Fixed {
async fn update(&mut self, old_limit: usize, _sample: Sample) -> usize {
old_limit
}
}
#[derive(Clone, Copy, Debug)]
pub struct RateLimiterConfig {
pub disable: bool,
pub algorithm: RateLimitAlgorithm,
pub timeout: Duration,
pub initial_limit: usize,
pub aimd_config: Option<AimdConfig>,
}
impl RateLimiterConfig {
pub fn create_rate_limit_algorithm(self) -> Box<dyn LimitAlgorithm> {
match self.algorithm {
RateLimitAlgorithm::Fixed => Box::new(Fixed),
RateLimitAlgorithm::Aimd => Box::new(Aimd::new(self.aimd_config.unwrap())), // For aimd algorithm config is mandatory.
}
}
}
impl Default for RateLimiterConfig {
fn default() -> Self {
Self {
disable: true,
algorithm: RateLimitAlgorithm::Aimd,
timeout: Duration::from_secs(1),
initial_limit: 100,
aimd_config: Some(AimdConfig::default()),
}
}
}
#[derive(clap::Parser, Clone, Copy, Debug)]
pub struct AimdConfig {
/// Minimum limit for AIMD algorithm. Makes sense only if `rate_limit_algorithm` is `Aimd`.
#[clap(long, default_value_t = 1)]
pub aimd_min_limit: usize,
/// Maximum limit for AIMD algorithm. Makes sense only if `rate_limit_algorithm` is `Aimd`.
#[clap(long, default_value_t = 1500)]
pub aimd_max_limit: usize,
/// Increase AIMD increase by value in case of success. Makes sense only if `rate_limit_algorithm` is `Aimd`.
#[clap(long, default_value_t = 10)]
pub aimd_increase_by: usize,
/// Decrease AIMD decrease by value in case of timout/429. Makes sense only if `rate_limit_algorithm` is `Aimd`.
#[clap(long, default_value_t = 0.9)]
pub aimd_decrease_factor: f32,
/// A threshold below which the limit won't be increased. Makes sense only if `rate_limit_algorithm` is `Aimd`.
#[clap(long, default_value_t = 0.8)]
pub aimd_min_utilisation_threshold: f32,
}
impl Default for AimdConfig {
fn default() -> Self {
Self {
aimd_min_limit: 1,
aimd_max_limit: 1500,
aimd_increase_by: 10,
aimd_decrease_factor: 0.9,
aimd_min_utilisation_threshold: 0.8,
}
}
}

View File

@@ -0,0 +1,441 @@
use std::{
sync::{
atomic::{AtomicUsize, Ordering},
Arc,
},
time::Duration,
};
use tokio::sync::{Mutex as AsyncMutex, Semaphore, SemaphorePermit};
use tokio::time::{timeout, Instant};
use tracing::info;
use super::{
limit_algorithm::{LimitAlgorithm, Sample},
RateLimiterConfig,
};
/// Limits the number of concurrent jobs.
///
/// Concurrency is limited through the use of [Token]s. Acquire a token to run a job, and release the
/// token once the job is finished.
///
/// The limit will be automatically adjusted based on observed latency (delay) and/or failures
/// caused by overload (loss).
pub struct Limiter {
limit_algo: AsyncMutex<Box<dyn LimitAlgorithm>>,
semaphore: std::sync::Arc<Semaphore>,
config: RateLimiterConfig,
// ONLY WRITE WHEN LIMIT_ALGO IS LOCKED
limits: AtomicUsize,
// ONLY USE ATOMIC ADD/SUB
in_flight: Arc<AtomicUsize>,
#[cfg(test)]
notifier: Option<std::sync::Arc<tokio::sync::Notify>>,
}
/// A concurrency token, required to run a job.
///
/// Release the token back to the [Limiter] after the job is complete.
#[derive(Debug)]
pub struct Token<'t> {
permit: Option<tokio::sync::SemaphorePermit<'t>>,
start: Instant,
in_flight: Arc<AtomicUsize>,
}
/// A snapshot of the state of the [Limiter].
///
/// Not guaranteed to be consistent under high concurrency.
#[derive(Debug, Clone, Copy)]
pub struct LimiterState {
limit: usize,
available: usize,
in_flight: usize,
}
/// Whether a job succeeded or failed as a result of congestion/overload.
///
/// Errors not considered to be caused by overload should be ignored.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
/// The job succeeded, or failed in a way unrelated to overload.
Success,
/// The job failed because of overload, e.g. it timed out or an explicit backpressure signal
/// was observed.
Overload,
}
impl Outcome {
fn from_reqwest_error(error: &reqwest_middleware::Error) -> Self {
match error {
reqwest_middleware::Error::Middleware(_) => Outcome::Success,
reqwest_middleware::Error::Reqwest(e) => {
if let Some(status) = e.status() {
if status.is_server_error()
|| reqwest::StatusCode::TOO_MANY_REQUESTS.as_u16() == status
{
Outcome::Overload
} else {
Outcome::Success
}
} else {
Outcome::Success
}
}
}
}
fn from_reqwest_response(response: &reqwest::Response) -> Self {
if response.status().is_server_error()
|| response.status() == reqwest::StatusCode::TOO_MANY_REQUESTS
{
Outcome::Overload
} else {
Outcome::Success
}
}
}
impl Limiter {
/// Create a limiter with a given limit control algorithm.
pub fn new(config: RateLimiterConfig) -> Self {
assert!(config.initial_limit > 0);
Self {
limit_algo: AsyncMutex::new(config.create_rate_limit_algorithm()),
semaphore: Arc::new(Semaphore::new(config.initial_limit)),
config,
limits: AtomicUsize::new(config.initial_limit),
in_flight: Arc::new(AtomicUsize::new(0)),
#[cfg(test)]
notifier: None,
}
}
// pub fn new(limit_algorithm: T, timeout: Duration, initial_limit: usize) -> Self {
// assert!(initial_limit > 0);
// Self {
// limit_algo: AsyncMutex::new(limit_algorithm),
// semaphore: Arc::new(Semaphore::new(initial_limit)),
// timeout,
// limits: AtomicUsize::new(initial_limit),
// in_flight: Arc::new(AtomicUsize::new(0)),
// #[cfg(test)]
// notifier: None,
// }
// }
/// In some cases [Token]s are acquired asynchronously when updating the limit.
#[cfg(test)]
pub fn with_release_notifier(mut self, n: std::sync::Arc<tokio::sync::Notify>) -> Self {
self.notifier = Some(n);
self
}
/// Try to immediately acquire a concurrency [Token].
///
/// Returns `None` if there are none available.
pub fn try_acquire(&self) -> Option<Token> {
let result = if self.config.disable {
// If the rate limiter is disabled, we can always acquire a token.
Some(Token::new(None, self.in_flight.clone()))
} else {
self.semaphore
.try_acquire()
.map(|permit| Token::new(Some(permit), self.in_flight.clone()))
.ok()
};
if result.is_some() {
self.in_flight.fetch_add(1, Ordering::AcqRel);
}
result
}
/// Try to acquire a concurrency [Token], waiting for `duration` if there are none available.
///
/// Returns `None` if there are none available after `duration`.
pub async fn acquire_timeout(&self, duration: Duration) -> Option<Token<'_>> {
info!("acquiring token: {:?}", self.semaphore.available_permits());
let result = if self.config.disable {
// If the rate limiter is disabled, we can always acquire a token.
Some(Token::new(None, self.in_flight.clone()))
} else {
match timeout(duration, self.semaphore.acquire()).await {
Ok(maybe_permit) => maybe_permit
.map(|permit| Token::new(Some(permit), self.in_flight.clone()))
.ok(),
Err(_) => None,
}
};
if result.is_some() {
self.in_flight.fetch_add(1, Ordering::AcqRel);
}
result
}
/// Return the concurrency [Token], along with the outcome of the job.
///
/// The [Outcome] of the job, and the time taken to perform it, may be used
/// to update the concurrency limit.
///
/// Set the outcome to `None` to ignore the job.
pub async fn release(&self, mut token: Token<'_>, outcome: Option<Outcome>) {
tracing::info!("outcome is {:?}", outcome);
let in_flight = self.in_flight.load(Ordering::Acquire);
let old_limit = self.limits.load(Ordering::Acquire);
let available = if self.config.disable {
0 // This is not used in the algorithm and can be anything. If the config disable it makes sense to set it to 0.
} else {
self.semaphore.available_permits()
};
let total = in_flight + available;
let mut algo = self.limit_algo.lock().await;
let new_limit = if let Some(outcome) = outcome {
let sample = Sample {
latency: token.start.elapsed(),
in_flight,
outcome,
};
algo.update(old_limit, sample).await
} else {
old_limit
};
tracing::info!("new limit is {}", new_limit);
let actual_limit = if new_limit < total {
token.forget();
total.saturating_sub(1)
} else {
if !self.config.disable {
self.semaphore.add_permits(new_limit.saturating_sub(total));
}
new_limit
};
crate::proxy::RATE_LIMITER_LIMIT
.with_label_values(&["expected"])
.set(new_limit as i64);
crate::proxy::RATE_LIMITER_LIMIT
.with_label_values(&["actual"])
.set(actual_limit as i64);
self.limits.store(new_limit, Ordering::Release);
#[cfg(test)]
if let Some(n) = &self.notifier {
n.notify_one();
}
}
/// The current state of the limiter.
pub fn state(&self) -> LimiterState {
let limit = self.limits.load(Ordering::Relaxed);
let in_flight = self.in_flight.load(Ordering::Relaxed);
LimiterState {
limit,
available: limit.saturating_sub(in_flight),
in_flight,
}
}
}
impl<'t> Token<'t> {
fn new(permit: Option<SemaphorePermit<'t>>, in_flight: Arc<AtomicUsize>) -> Self {
Self {
permit,
start: Instant::now(),
in_flight,
}
}
#[cfg(test)]
pub fn set_latency(&mut self, latency: Duration) {
use std::ops::Sub;
self.start = Instant::now().sub(latency);
}
pub fn forget(&mut self) {
if let Some(permit) = self.permit.take() {
permit.forget();
}
}
}
impl Drop for Token<'_> {
fn drop(&mut self) {
self.in_flight.fetch_sub(1, Ordering::AcqRel);
}
}
impl LimiterState {
/// The current concurrency limit.
pub fn limit(&self) -> usize {
self.limit
}
/// The amount of concurrency available to use.
pub fn available(&self) -> usize {
self.available
}
/// The number of jobs in flight.
pub fn in_flight(&self) -> usize {
self.in_flight
}
}
#[async_trait::async_trait]
impl reqwest_middleware::Middleware for Limiter {
async fn handle(
&self,
req: reqwest::Request,
extensions: &mut task_local_extensions::Extensions,
next: reqwest_middleware::Next<'_>,
) -> reqwest_middleware::Result<reqwest::Response> {
let start = Instant::now();
let token = self
.acquire_timeout(self.config.timeout)
.await
.ok_or_else(|| {
reqwest_middleware::Error::Middleware(
// TODO: Should we map it into user facing errors?
crate::console::errors::ApiError::Console {
status: crate::http::StatusCode::TOO_MANY_REQUESTS,
text: "Too many requests".into(),
}
.into(),
)
})?;
info!(duration = ?start.elapsed(), "waiting for token to connect to the control plane");
crate::proxy::RATE_LIMITER_ACQUIRE_LATENCY.observe(start.elapsed().as_secs_f64());
match next.run(req, extensions).await {
Ok(response) => {
self.release(token, Some(Outcome::from_reqwest_response(&response)))
.await;
Ok(response)
}
Err(e) => {
self.release(token, Some(Outcome::from_reqwest_error(&e)))
.await;
Err(e)
}
}
}
}
#[cfg(test)]
mod tests {
use std::{pin::pin, task::Context, time::Duration};
use futures::{task::noop_waker_ref, Future};
use super::{Limiter, Outcome};
use crate::rate_limiter::RateLimitAlgorithm;
#[tokio::test]
async fn it_works() {
let config = super::RateLimiterConfig {
algorithm: RateLimitAlgorithm::Fixed,
timeout: Duration::from_secs(1),
initial_limit: 10,
disable: false,
..Default::default()
};
let limiter = Limiter::new(config);
let token = limiter.try_acquire().unwrap();
limiter.release(token, Some(Outcome::Success)).await;
assert_eq!(limiter.state().limit(), 10);
}
#[tokio::test]
async fn is_fair() {
let config = super::RateLimiterConfig {
algorithm: RateLimitAlgorithm::Fixed,
timeout: Duration::from_secs(1),
initial_limit: 1,
disable: false,
..Default::default()
};
let limiter = Limiter::new(config);
// === TOKEN 1 ===
let token1 = limiter.try_acquire().unwrap();
let mut token2_fut = pin!(limiter.acquire_timeout(Duration::from_secs(1)));
assert!(
token2_fut
.as_mut()
.poll(&mut Context::from_waker(noop_waker_ref()))
.is_pending(),
"token is acquired by token1"
);
let mut token3_fut = pin!(limiter.acquire_timeout(Duration::from_secs(1)));
assert!(
token3_fut
.as_mut()
.poll(&mut Context::from_waker(noop_waker_ref()))
.is_pending(),
"token is acquired by token1"
);
limiter.release(token1, Some(Outcome::Success)).await;
// === END TOKEN 1 ===
// === TOKEN 2 ===
assert!(
limiter.try_acquire().is_none(),
"token is acquired by token2"
);
assert!(
token3_fut
.as_mut()
.poll(&mut Context::from_waker(noop_waker_ref()))
.is_pending(),
"token is acquired by token2"
);
let token2 = token2_fut.await.unwrap();
limiter.release(token2, Some(Outcome::Success)).await;
// === END TOKEN 2 ===
// === TOKEN 3 ===
assert!(
limiter.try_acquire().is_none(),
"token is acquired by token3"
);
let token3 = token3_fut.await.unwrap();
limiter.release(token3, Some(Outcome::Success)).await;
// === END TOKEN 3 ===
// === TOKEN 4 ===
let token4 = limiter.try_acquire().unwrap();
limiter.release(token4, Some(Outcome::Success)).await;
}
#[tokio::test]
async fn disable() {
let config = super::RateLimiterConfig {
algorithm: RateLimitAlgorithm::Fixed,
timeout: Duration::from_secs(1),
initial_limit: 1,
disable: true,
..Default::default()
};
let limiter = Limiter::new(config);
// === TOKEN 1 ===
let token1 = limiter.try_acquire().unwrap();
let token2 = limiter.try_acquire().unwrap();
let state = limiter.state();
assert_eq!(state.limit(), 1);
assert_eq!(state.in_flight(), 2); // For disabled limiter, it's expected.
limiter.release(token1, None).await;
limiter.release(token2, None).await;
}
}

View File

@@ -208,14 +208,13 @@ impl GlobalConnPool {
} else {
info!("pool: reusing connection '{conn_info}'");
client.session.send(session_id)?;
tracing::Span::current().record(
"pid",
&tracing::field::display(client.inner.get_process_id()),
);
latency_timer.pool_hit();
latency_timer.success();
return Ok(Client {
conn_id: client.conn_id,
inner: Some(client),
span: Span::current(),
pool,
});
return Ok(Client::new(client, pool).await);
}
} else {
let conn_id = uuid::Uuid::new_v4();
@@ -229,6 +228,12 @@ impl GlobalConnPool {
)
.await
};
if let Ok(client) = &new_client {
tracing::Span::current().record(
"pid",
&tracing::field::display(client.inner.get_process_id()),
);
}
match &new_client {
// clear the hash. it's no longer valid
@@ -262,13 +267,8 @@ impl GlobalConnPool {
}
_ => {}
}
new_client.map(|inner| Client {
conn_id: inner.conn_id,
inner: Some(inner),
span: Span::current(),
pool,
})
let new_client = new_client?;
Ok(Client::new(new_client, pool).await)
}
fn put(&self, conn_info: &ConnInfo, client: ClientInner) -> anyhow::Result<()> {
@@ -394,7 +394,7 @@ impl ConnectMechanism for TokioMechanism<'_> {
// Wake up the destination if needed. Code here is a bit involved because
// we reuse the code from the usual proxy and we need to prepare few structures
// that this code expects.
#[tracing::instrument(skip_all)]
#[tracing::instrument(fields(pid = tracing::field::Empty), skip_all)]
async fn connect_to_compute(
config: &config::ProxyConfig,
conn_info: &ConnInfo,
@@ -461,6 +461,7 @@ async fn connect_to_compute_once(
.connect_timeout(timeout)
.connect(tokio_postgres::NoTls)
.await?;
tracing::Span::current().record("pid", &tracing::field::display(client.get_process_id()));
let (tx, mut rx) = tokio::sync::watch::channel(session);
@@ -547,6 +548,17 @@ pub struct Discard<'a> {
}
impl Client {
pub(self) async fn new(
inner: ClientInner,
pool: Option<(ConnInfo, Arc<GlobalConnPool>)>,
) -> Self {
Self {
conn_id: inner.conn_id,
inner: Some(inner),
span: Span::current(),
pool,
}
}
pub fn inner(&mut self) -> (&mut tokio_postgres::Client, Discard<'_>) {
let Self {
inner,

View File

@@ -250,7 +250,7 @@ pub async fn handle(
Ok(response)
}
#[instrument(name = "sql-over-http", skip_all)]
#[instrument(name = "sql-over-http", fields(pid = tracing::field::Empty), skip_all)]
async fn handle_inner(
request: Request<Body>,
sni_hostname: Option<String>,

View File

@@ -249,7 +249,7 @@ mod tests {
use url::Url;
use super::{collect_metrics_iteration, Ids, Metrics};
use crate::http;
use crate::{http, rate_limiter::RateLimiterConfig};
#[tokio::test]
async fn metrics() {
@@ -279,7 +279,7 @@ mod tests {
tokio::spawn(server);
let metrics = Metrics::default();
let client = http::new_client();
let client = http::new_client(RateLimiterConfig::default());
let endpoint = Url::parse(&format!("http://{addr}")).unwrap();
let now = Utc::now();

View File

@@ -33,7 +33,7 @@ psutil = "^5.9.4"
types-psutil = "^5.9.5.12"
types-toml = "^0.10.8.6"
pytest-httpserver = "^1.0.8"
aiohttp = "3.8.5"
aiohttp = "3.8.6"
pytest-rerunfailures = "^11.1.2"
types-pytest-lazy-fixture = "^0.6.3.3"
pytest-split = "^0.8.1"

View File

@@ -18,7 +18,7 @@
# 6. We wait for the new pageserver's remote_consistent_lsn to catch up
#
# For more context on how to use this, see:
# https://github.com/neondatabase/cloud/wiki/Storage-format-migration
# https://www.notion.so/neondatabase/Storage-format-migration-9a8eba33ccf8417ea8cf50e6a0c542cf
import argparse
import os

View File

@@ -1871,6 +1871,8 @@ def append_pageserver_param_overrides(
params_to_update.append(
f"--pageserver-config-override=remote_storage={remote_storage_toml_table}"
)
else:
params_to_update.append('--pageserver-config-override=remote_storage=""')
env_overrides = os.getenv("NEON_PAGESERVER_OVERRIDES")
if env_overrides is not None:
@@ -2177,6 +2179,29 @@ class NeonProxy(PgProtocol):
*["--allow-self-signed-compute", "true"],
]
class Console(AuthBackend):
def __init__(self, endpoint: str, fixed_rate_limit: Optional[int] = None):
self.endpoint = endpoint
self.fixed_rate_limit = fixed_rate_limit
def extra_args(self) -> list[str]:
args = [
# Console auth backend params
*["--auth-backend", "console"],
*["--auth-endpoint", self.endpoint],
]
if self.fixed_rate_limit is not None:
args += [
*["--disable-dynamic-rate-limiter", "false"],
*["--rate-limit-algorithm", "aimd"],
*["--initial-limit", str(1)],
*["--rate-limiter-timeout", "1s"],
*["--aimd-min-limit", "0"],
*["--aimd-increase-by", "1"],
*["--wake-compute-cache", "size=0"], # Disable cache to test rate limiter.
]
return args
@dataclass(frozen=True)
class Postgres(AuthBackend):
pg_conn_url: str

View File

@@ -26,6 +26,7 @@ def test_local_corruption(neon_env_builder: NeonEnvBuilder):
".*will not become active. Current state: Broken.*",
".*failed to load metadata.*",
".*load failed.*load local timeline.*",
".*layer loading failed permanently: load layer: .*",
]
)

View File

@@ -449,7 +449,7 @@ def check_neon_works(
)
# Check that project can be recovered from WAL
# loosely based on https://github.com/neondatabase/cloud/wiki/Recovery-from-WAL
# loosely based on https://www.notion.so/neondatabase/Storage-Recovery-from-WAL-d92c0aac0ebf40df892b938045d7d720
tenant_id = snapshot_config["default_tenant_id"]
timeline_id = dict(snapshot_config["branch_name_mappings"]["main"])[tenant_id]
pageserver_port = snapshot_config["pageservers"][0]["listen_http_addr"].split(":")[-1]

View File

@@ -79,13 +79,32 @@ def test_lsn_mapping_old(neon_env_builder: NeonEnvBuilder):
def test_lsn_mapping(neon_env_builder: NeonEnvBuilder):
env = neon_env_builder.init_start()
new_timeline_id = env.neon_cli.create_branch("test_lsn_mapping")
endpoint_main = env.endpoints.create_start("test_lsn_mapping")
log.info("postgres is running on 'test_lsn_mapping' branch")
tenant_id, _ = env.neon_cli.create_tenant(
conf={
# disable default GC and compaction
"gc_period": "1000 m",
"compaction_period": "0 s",
"gc_horizon": f"{1024 ** 2}",
"checkpoint_distance": f"{1024 ** 2}",
"compaction_target_size": f"{1024 ** 2}",
}
)
timeline_id = env.neon_cli.create_branch("test_lsn_mapping", tenant_id=tenant_id)
endpoint_main = env.endpoints.create_start("test_lsn_mapping", tenant_id=tenant_id)
timeline_id = endpoint_main.safe_psql("show neon.timeline_id")[0][0]
log.info("postgres is running on 'main' branch")
cur = endpoint_main.connect().cursor()
# Obtain an lsn before all write operations on this branch
start_lsn = Lsn(query_scalar(cur, "SELECT pg_current_wal_lsn()"))
# Create table, and insert rows, each in a separate transaction
# Disable synchronous_commit to make this initialization go faster.
# Disable `synchronous_commit` to make this initialization go faster.
# XXX: on my laptop this test takes 7s, and setting `synchronous_commit=off`
# doesn't change anything.
#
# Each row contains current insert LSN and the current timestamp, when
# the row was inserted.
@@ -104,40 +123,63 @@ def test_lsn_mapping(neon_env_builder: NeonEnvBuilder):
cur.execute("INSERT INTO foo VALUES (-1)")
# Wait until WAL is received by pageserver
wait_for_last_flush_lsn(env, endpoint_main, env.initial_tenant, new_timeline_id)
last_flush_lsn = wait_for_last_flush_lsn(env, endpoint_main, tenant_id, timeline_id)
with env.pageserver.http_client() as client:
# Check edge cases: timestamp in the future
# Check edge cases
# Timestamp is in the future
probe_timestamp = tbl[-1][1] + timedelta(hours=1)
result = client.timeline_get_lsn_by_timestamp(
env.initial_tenant, new_timeline_id, f"{probe_timestamp.isoformat()}Z", 2
tenant_id, timeline_id, f"{probe_timestamp.isoformat()}Z", 2
)
assert result["kind"] == "future"
# make sure that we return a well advanced lsn here
assert Lsn(result["lsn"]) > start_lsn
# timestamp too the far history
# Timestamp is in the unreachable past
probe_timestamp = tbl[0][1] - timedelta(hours=10)
result = client.timeline_get_lsn_by_timestamp(
env.initial_tenant, new_timeline_id, f"{probe_timestamp.isoformat()}Z", 2
tenant_id, timeline_id, f"{probe_timestamp.isoformat()}Z", 2
)
assert result["kind"] == "past"
# make sure that we return the minimum lsn here at the start of the range
assert Lsn(result["lsn"]) < start_lsn
# Probe a bunch of timestamps in the valid range
for i in range(1, len(tbl), 100):
probe_timestamp = tbl[i][1]
result = client.timeline_get_lsn_by_timestamp(
env.initial_tenant, new_timeline_id, f"{probe_timestamp.isoformat()}Z", 2
tenant_id, timeline_id, f"{probe_timestamp.isoformat()}Z", 2
)
assert result["kind"] not in ["past", "nodata"]
lsn = result["lsn"]
# Call get_lsn_by_timestamp to get the LSN
# Launch a new read-only node at that LSN, and check that only the rows
# that were supposed to be committed at that point in time are visible.
endpoint_here = env.endpoints.create_start(
branch_name="test_lsn_mapping", endpoint_id="ep-lsn_mapping_read", lsn=lsn
branch_name="test_lsn_mapping",
endpoint_id="ep-lsn_mapping_read",
lsn=lsn,
tenant_id=tenant_id,
)
assert endpoint_here.safe_psql("SELECT max(x) FROM foo")[0][0] == i
endpoint_here.stop_and_destroy()
# Do the "past" check again at a new branch to ensure that we don't return something before the branch cutoff
timeline_id_child = env.neon_cli.create_branch(
"test_lsn_mapping_child", tenant_id=tenant_id, ancestor_branch_name="test_lsn_mapping"
)
# Timestamp is in the unreachable past
probe_timestamp = tbl[0][1] - timedelta(hours=10)
result = client.timeline_get_lsn_by_timestamp(
tenant_id, timeline_id_child, f"{probe_timestamp.isoformat()}Z", 2
)
assert result["kind"] == "past"
# make sure that we return the minimum lsn here at the start of the range
assert Lsn(result["lsn"]) >= last_flush_lsn
# Test pageserver get_timestamp_of_lsn API
def test_ts_of_lsn_api(neon_env_builder: NeonEnvBuilder):

View File

@@ -1,3 +1,4 @@
import asyncio
import json
import subprocess
import time
@@ -11,6 +12,29 @@ from fixtures.neon_fixtures import PSQL, NeonProxy, VanillaPostgres
GET_CONNECTION_PID_QUERY = "SELECT pid FROM pg_stat_activity WHERE state = 'active'"
@pytest.mark.asyncio
async def test_http_pool_begin_1(static_proxy: NeonProxy):
static_proxy.safe_psql("create user http_auth with password 'http' superuser")
def query(*args) -> Any:
static_proxy.http_query(
"SELECT pg_sleep(10);",
args,
user="http_auth",
password="http",
expected_code=200,
)
query()
loop = asyncio.get_running_loop()
tasks = [loop.run_in_executor(None, query) for _ in range(10)]
# Wait for all the tasks to complete
completed, pending = await asyncio.wait(tasks)
# Get the results
results = [task.result() for task in completed]
print(results)
def test_proxy_select_1(static_proxy: NeonProxy):
"""
A simplest smoke test: check proxy against a local postgres instance.

View File

@@ -0,0 +1,84 @@
import asyncio
import time
from pathlib import Path
from typing import Iterator
import pytest
from fixtures.neon_fixtures import (
PSQL,
NeonProxy,
)
from fixtures.port_distributor import PortDistributor
from pytest_httpserver import HTTPServer
from werkzeug.wrappers.response import Response
def waiting_handler(status_code: int) -> Response:
# wait more than timeout to make sure that both (two) connections are open.
# It would be better to use a barrier here, but I don't know how to do that together with pytest-httpserver.
time.sleep(2)
return Response(status=status_code)
@pytest.fixture(scope="function")
def proxy_with_rate_limit(
port_distributor: PortDistributor,
neon_binpath: Path,
httpserver_listen_address,
test_output_dir: Path,
) -> Iterator[NeonProxy]:
"""Neon proxy that routes directly to vanilla postgres."""
proxy_port = port_distributor.get_port()
mgmt_port = port_distributor.get_port()
http_port = port_distributor.get_port()
external_http_port = port_distributor.get_port()
(host, port) = httpserver_listen_address
endpoint = f"http://{host}:{port}/billing/api/v1/usage_events"
with NeonProxy(
neon_binpath=neon_binpath,
test_output_dir=test_output_dir,
proxy_port=proxy_port,
http_port=http_port,
mgmt_port=mgmt_port,
external_http_port=external_http_port,
auth_backend=NeonProxy.Console(endpoint, fixed_rate_limit=5),
) as proxy:
proxy.start()
yield proxy
@pytest.mark.asyncio
async def test_proxy_rate_limit(
httpserver: HTTPServer,
proxy_with_rate_limit: NeonProxy,
):
uri = "/billing/api/v1/usage_events/proxy_get_role_secret"
# mock control plane service
httpserver.expect_ordered_request(uri, method="GET").respond_with_handler(
lambda _: Response(status=200)
)
httpserver.expect_ordered_request(uri, method="GET").respond_with_handler(
lambda _: waiting_handler(429)
)
httpserver.expect_ordered_request(uri, method="GET").respond_with_handler(
lambda _: waiting_handler(500)
)
psql = PSQL(host=proxy_with_rate_limit.host, port=proxy_with_rate_limit.proxy_port)
f = await psql.run("select 42;")
await proxy_with_rate_limit.find_auth_link(uri, f)
# Limit should be 2.
# Run two queries in parallel.
f1, f2 = await asyncio.gather(psql.run("select 42;"), psql.run("select 42;"))
await proxy_with_rate_limit.find_auth_link(uri, f1)
await proxy_with_rate_limit.find_auth_link(uri, f2)
# Now limit should be 0.
f = await psql.run("select 42;")
await proxy_with_rate_limit.find_auth_link(uri, f)
# There last query shouldn't reach the http-server.
assert httpserver.assertions == []

126
vm-image-spec.yaml Normal file
View File

@@ -0,0 +1,126 @@
# Supplemental file for neondatabase/autoscaling's vm-builder, for producing the VM compute image.
---
commands:
- name: cgconfigparser
user: root
sysvInitAction: sysinit
shell: 'cgconfigparser -l /etc/cgconfig.conf -s 1664'
- name: pgbouncer
user: nobody
sysvInitAction: respawn
shell: '/usr/local/bin/pgbouncer /etc/pgbouncer.ini'
- name: postgres-exporter
user: nobody
sysvInitAction: respawn
shell: 'DATA_SOURCE_NAME="user=cloud_admin sslmode=disable dbname=postgres" /bin/postgres_exporter'
shutdownHook: |
su -p postgres --session-command '/usr/local/bin/pg_ctl stop -D /var/db/postgres/compute/pgdata -m fast --wait -t 10'
files:
- filename: pgbouncer.ini
content: |
[databases]
*=host=localhost port=5432 auth_user=cloud_admin
[pgbouncer]
listen_port=6432
listen_addr=0.0.0.0
auth_type=scram-sha-256
auth_user=cloud_admin
auth_dbname=postgres
client_tls_sslmode=disable
server_tls_sslmode=disable
pool_mode=transaction
max_client_conn=10000
default_pool_size=16
max_prepared_statements=0
- filename: cgconfig.conf
content: |
# Configuration for cgroups in VM compute nodes
group neon-postgres {
perm {
admin {
uid = postgres;
}
task {
gid = users;
}
}
memory {}
}
build: |
# Build cgroup-tools
#
# At time of writing (2023-03-14), debian bullseye has a version of cgroup-tools (technically
# libcgroup) that doesn't support cgroup v2 (version 0.41-11). Unfortunately, the vm-monitor
# requires cgroup v2, so we'll build cgroup-tools ourselves.
FROM debian:bullseye-slim as libcgroup-builder
ENV LIBCGROUP_VERSION v2.0.3
RUN set -exu \
&& apt update \
&& apt install --no-install-recommends -y \
git \
ca-certificates \
automake \
cmake \
make \
gcc \
byacc \
flex \
libtool \
libpam0g-dev \
&& git clone --depth 1 -b $LIBCGROUP_VERSION https://github.com/libcgroup/libcgroup \
&& INSTALL_DIR="/libcgroup-install" \
&& mkdir -p "$INSTALL_DIR/bin" "$INSTALL_DIR/include" \
&& cd libcgroup \
# extracted from bootstrap.sh, with modified flags:
&& (test -d m4 || mkdir m4) \
&& autoreconf -fi \
&& rm -rf autom4te.cache \
&& CFLAGS="-O3" ./configure --prefix="$INSTALL_DIR" --sysconfdir=/etc --localstatedir=/var --enable-opaque-hierarchy="name=systemd" \
# actually build the thing...
&& make install
FROM quay.io/prometheuscommunity/postgres-exporter:v0.12.0 AS postgres-exporter
# Build pgbouncer
#
FROM debian:bullseye-slim AS pgbouncer
RUN set -e \
&& apt-get update \
&& apt-get install -y \
curl \
build-essential \
pkg-config \
libevent-dev \
libssl-dev
ENV PGBOUNCER_VERSION 1.21.0
ENV PGBOUNCER_GITPATH 1_21_0
RUN set -e \
&& curl -sfSL https://github.com/pgbouncer/pgbouncer/releases/download/pgbouncer_${PGBOUNCER_GITPATH}/pgbouncer-${PGBOUNCER_VERSION}.tar.gz -o pgbouncer-${PGBOUNCER_VERSION}.tar.gz \
&& tar xzvf pgbouncer-${PGBOUNCER_VERSION}.tar.gz \
&& cd pgbouncer-${PGBOUNCER_VERSION} \
&& LDFLAGS=-static ./configure --prefix=/usr/local/pgbouncer --without-openssl \
&& make -j $(nproc) \
&& make install
merge: |
# tweak nofile limits
RUN set -e \
&& echo 'fs.file-max = 1048576' >>/etc/sysctl.conf \
&& test ! -e /etc/security || ( \
echo '* - nofile 1048576' >>/etc/security/limits.conf \
&& echo 'root - nofile 1048576' >>/etc/security/limits.conf \
)
COPY cgconfig.conf /etc/cgconfig.conf
COPY pgbouncer.ini /etc/pgbouncer.ini
RUN set -e \
&& chown postgres:postgres /etc/pgbouncer.ini \
&& chmod 0644 /etc/pgbouncer.ini \
&& chmod 0644 /etc/cgconfig.conf
COPY --from=libcgroup-builder /libcgroup-install/bin/* /usr/bin/
COPY --from=libcgroup-builder /libcgroup-install/lib/* /usr/lib/
COPY --from=libcgroup-builder /libcgroup-install/sbin/* /usr/sbin/
COPY --from=postgres-exporter /bin/postgres_exporter /bin/postgres_exporter
COPY --from=pgbouncer /usr/local/pgbouncer/bin/pgbouncer /usr/local/bin/pgbouncer

View File

@@ -39,7 +39,7 @@ hex = { version = "0.4", features = ["serde"] }
hyper = { version = "0.14", features = ["full"] }
itertools = { version = "0.10" }
libc = { version = "0.2", features = ["extra_traits"] }
log = { version = "0.4", default-features = false, features = ["kv_unstable", "std"] }
log = { version = "0.4", default-features = false, features = ["std"] }
memchr = { version = "2" }
nom = { version = "7" }
num-bigint = { version = "0.4" }
@@ -56,7 +56,6 @@ scopeguard = { version = "1" }
serde = { version = "1", features = ["alloc", "derive"] }
serde_json = { version = "1", features = ["raw_value"] }
smallvec = { version = "1", default-features = false, features = ["write"] }
standback = { version = "0.2", default-features = false, features = ["std"] }
time = { version = "0.3", features = ["local-offset", "macros", "serde-well-known"] }
tokio = { version = "1", features = ["fs", "io-std", "io-util", "macros", "net", "process", "rt-multi-thread", "signal", "test-util"] }
tokio-rustls = { version = "0.24" }
@@ -77,14 +76,13 @@ cc = { version = "1", default-features = false, features = ["parallel"] }
either = { version = "1" }
itertools = { version = "0.10" }
libc = { version = "0.2", features = ["extra_traits"] }
log = { version = "0.4", default-features = false, features = ["kv_unstable", "std"] }
log = { version = "0.4", default-features = false, features = ["std"] }
memchr = { version = "2" }
nom = { version = "7" }
prost = { version = "0.11" }
regex = { version = "1" }
regex-syntax = { version = "0.7" }
serde = { version = "1", features = ["alloc", "derive"] }
standback = { version = "0.2", default-features = false, features = ["std"] }
syn-dff4ba8e3ae991db = { package = "syn", version = "1", features = ["extra-traits", "full", "visit"] }
syn-f595c2ba2a3f28df = { package = "syn", version = "2", features = ["extra-traits", "full", "visit", "visit-mut"] }
time-macros = { version = "0.2", default-features = false, features = ["formatting", "parsing", "serde"] }