Compare commits

...

48 Commits

Author SHA1 Message Date
Alexey Masterov
d20e851f4f change runner 2025-05-22 09:09:31 +02:00
Alexey Masterov
6748aa2e4a increase the timeout 2025-05-22 09:06:25 +02:00
Alexey Masterov
03474c09bd add 495 patch 2025-05-21 15:37:44 +02:00
Alexey Masterov
69e61294ba reduce tenk 2025-05-20 17:17:38 +02:00
Alexey Masterov
6018dc0cce reduce tenk 2025-05-20 15:11:03 +02:00
Alexey Masterov
d07ae3e32c multiply tenk 2025-05-20 12:57:58 +02:00
Alexey Masterov
31df7205c6 multiply onek 2025-05-20 12:13:35 +02:00
Alexey Masterov
c6ddb4b236 multiply onek 2025-05-20 12:06:06 +02:00
Alexey Masterov
01b5fc902f change patch 2025-05-20 11:56:44 +02:00
Alexey Masterov
73aa5ede11 Change fix patching 2025-05-20 09:42:49 +02:00
Alexey Masterov
ad0b4c6d01 Change region 2025-05-20 09:31:41 +02:00
Alexey Masterov
975a5c22c8 Change patch to 95++ 2025-05-20 09:25:30 +02:00
Alexey Masterov
f7abc25c3e Change patch to 95 2025-05-19 17:03:47 +02:00
Alexey Masterov
2b2df45e76 change to devin-generated 2025-05-19 14:50:03 +02:00
Alexey Masterov
a3d5ed9d2f x100 2025-05-19 12:34:22 +02:00
Alexey Masterov
f02f19a1d0 fix path 2025-05-19 12:03:20 +02:00
Alexey Masterov
a9f7a96cb7 Add patches 2025-05-19 11:49:45 +02:00
Konstantin Knizhnik
81c557d87e Unlogged build get smgr (#11954)
## Problem

See https://github.com/neondatabase/neon/issues/11910
and https://neondb.slack.com/archives/C04DGM6SMTM/p1747314649059129

## Summary of changes

Do not change persistence in `start_unlogged_build`

Postgres PRs:
https://github.com/neondatabase/postgres/pull/642
https://github.com/neondatabase/postgres/pull/641
https://github.com/neondatabase/postgres/pull/640
https://github.com/neondatabase/postgres/pull/639

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-05-18 05:02:47 +00:00
Trung Dinh
e963129678 pagesteam_handle_batched_message -> pagestream_handle_batched_message (#11916)
## Problem
Found a typo in code.

## Summary of changes

Co-authored-by: Trung Dinh <tdinh@roblox.com>
Co-authored-by: Erik Grinaker <erik@neon.tech>
2025-05-17 22:30:29 +00:00
dependabot[bot]
4f0a9fc569 chore(deps): bump flask-cors from 5.0.0 to 6.0.0 in the pip group across 1 directory (#11960)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-17 22:06:32 +00:00
Emmanuel Ferdman
81c6a5a796 Migrate to correct logger interface (#11956)
## Problem
Currently the `logger` library throws annoying deprecation warnings:
```python
DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
```

## Summary of changes
This small PR resolves the annoying deprecation warnings by migrating to
`.warning` as suggested.

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-17 21:12:01 +00:00
Konstantin Knizhnik
8e05639dbf Invalidate LFC after unlogged build (#11951)
## Problem


See https://neondb.slack.com/archives/C04DGM6SMTM/p1747391617951239

LFC is not always properly updated during unlogged build so it can
contain stale content.

## Summary of changes

Invalidate LFC content at the end of unlogged build

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-05-17 19:06:59 +00:00
Alexander Bayandin
deed46015d CI(test-images): increase timeout from 20m to 60m (#11955)
## Problem

For some reason (unknown yet) 20m timeout is not enough for
`test-images` job on arm runners.
Ref:
https://github.com/neondatabase/neon/actions/runs/15075321681/job/42387530399?pr=11953

## Summary of changes
- Increase the timeout from 20m to 1h
2025-05-17 06:34:54 +00:00
Heikki Linnakangas
532d9b646e Add simple facility for an extendable shared memory area (#11929)
You still need to provide a max size up-front, but memory is only
allocated for the portion that is in use.

The module is currently unused, but will be used by the new compute
communicator project, in the neon Postgres extension. See
https://github.com/neondatabase/neon/issues/11729

---------

Co-authored-by: Erik Grinaker <erik@neon.tech>
2025-05-16 21:22:36 +00:00
Heikki Linnakangas
55f91cf10b Update 'nix' package (#11948)
There were some incompatible changes. Most churn was from switching from
the now-deprecated fcntl:flock() function to
fcntl::Flock::lock(). The new function returns a guard object, while
with the old function, the lock was associated directly with the file
descriptor.

It's good to stay up-to-date in general, but the impetus to do this now
is that in https://github.com/neondatabase/neon/pull/11929, I want to
use some functions that were added only in the latest version of 'nix',
and it's nice to not have to build multiple versions. (Although,
different versions of 'nix' are still pulled in as indirect dependencies
from other packages)
2025-05-16 14:45:08 +00:00
Folke Behrens
baafcc5d41 proxy: Fix misspelled flag value alias, swap names and aliases (#11949)
## Problem

There's a misspelled flag value alias that's not really used anywhere.

## Summary of changes

Fix the alias and make aliases the official flag values and keep old
values as aliases.
Also rename enum variant. No need for it to carry the version now.
2025-05-16 14:12:39 +00:00
Evan Fleming
aa22572d8c safekeeper: refactor static remote storage usage to use Arc (#10179)
Greetings! Please add `w=1` to github url when viewing diff
(sepcifically `wal_backup.rs`)

## Problem

This PR is aimed at addressing the remaining work of #8200. Namely,
removing static usage of remote storage in favour of arc. I did not opt
to pass `Arc<RemoteStorage>` directly since it is actually
`Optional<RemoteStorage>` as it is not necessarily always configured. I
wanted to avoid having to pass `Arc<Optional<RemoteStorage>>` everywhere
with individual consuming functions likely needing to handle unwrapping.

Instead I've added a `WalBackup` struct that holds
`Optional<RemoteStorage>` and handles initialization/unwrapping
RemoteStorage internally. wal_backup functions now take self and
`Arc<WalBackup>` is passed as a dependency through the various consumers
that need it.

## Summary of changes
- Add `WalBackup` that holds `Optional<RemoteStorage>` and handles
initialization and unwrapping
- Modify wal_backup functions to take `WalBackup` as self (Add `w=1` to
github url when viewing diff here)
- Initialize `WalBackup` in safekeeper root
- Store `Arc<WalBackup>` in `GlobalTimelineMap` and pass and store in
each Timeline as loaded
- use `WalBackup` through Timeline as needed

## Refs

- task to remove global variables
https://github.com/neondatabase/neon/issues/8200
- drive-by fixes https://github.com/neondatabase/neon/issues/11501 
by turning the panic reported there into an error `remote storage not
configured`

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2025-05-16 12:41:10 +00:00
Arpad Müller
2d247375b3 Update rust to 1.87.0 (#11938)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.

The 1.87.0 release marks 10 years of Rust.

[Announcement blog
post](https://blog.rust-lang.org/2025/05/15/Rust-1.87.0/)

Prior update was in #11431
2025-05-16 12:21:24 +00:00
Christian Schwarz
a7ce323949 benchmarking: extend test_page_service_batching.py to cover concurrent IO + batching under random reads (#10466)
This PR commits the benchmarks I ran to qualify concurrent IO before we
released it.

Changes:
- Add `l0stack` fixture; a reusable abstraction for creating a stack of
L0 deltas
  each of which has 1 Value::Delta per page.
- Such a stack of L0 deltas is a good and understandable demo for
concurrent IO
because to reconstruct any page, $layer_stack_height` Values need to be
read.
  Before concurrent IO, the reads were sequential.
  With concurrent IO, they are executed concurrently.
- So, switch `test_latency` to use the l0stack.
- Teach `pagebench`, which is used by `test_latency`, to limit itself to
the blocks of the relation created by the l0stack abstraction.
- Additional parametrization of `test_latency` over dimensions
`ps_io_concurrency,l0_stack_height,queue_depth`
- Use better names for the tests to reflect what they do, leave
interpretation of the (now quite high-dimensional) results to the reader
  - `test_{throughput => postgres_seqscan}`
  - `test_{latency => random_reads}`
- Cut down on permutations to those we use in production. Runtime is
about 2min.

Refs
- concurrent IO epic https://github.com/neondatabase/neon/issues/9378 
- batching task: fixes https://github.com/neondatabase/neon/issues/9837

---------

Co-authored-by: Peter Bendel <peterbendel@neon.tech>
2025-05-15 17:48:13 +00:00
Vlad Lazar
31026d5a3c pageserver: support import schema evolution (#11935)
## Problem

Imports don't support schema evolution nicely. If we want to change the
stuff we keep in storcon,
we'd have to carry the old cruft around.

## Summary of changes

Version import progress. Note that the import progress version
determines the version of the import
job split and execution. This means that we can also use it as a
mechanism for deploying new import
implementations in the future.
2025-05-15 16:13:15 +00:00
Vlad Lazar
2621ce2daf pageserver: checkpoint import progress in the storage controller (#11862)
## Problem

Timeline imports do not have progress checkpointing. Any time that the
tenant is shut-down, all progress is lost
and the import restarts from the beginning when the tenant is
re-attached.

## Summary of changes

This PR adds progress checkpointing.


### Preliminaries

The **unit of work** is a `ChunkProcessingJob`. Each
`ChunkProcessingJob` deals with the import for a set of key ranges. The
job split is done by using an estimation of how many pages each job will
produce.

The planning stage must be **pure**: given a fixed set of contents in
the import bucket, it will always yield the same plan. This property is
enforced by checking that the hash of the plan is identical when
resuming from a checkpoint.

The storage controller tracks the progress of each shard in the import
in the database in the form of the **latest
job** that has has completed.

### Flow

This is the high level flow for the happy path:
1. On the first run of the import task, the import task queries storcon
for the progress and sees that none is recorded.
2. Execute the preparatory stage of the import
3. Import jobs start running concurrently in a `FuturesOrdered`. Every
time the checkpointing threshold of jobs has been reached, notify the
storage controller.
4. Tenant is detached and re-attached
5. Import task starts up again and gets the latest progress checkpoint
from the storage controller in the form of a job index.
6. The plan is computed again and we check that the hash matches with
the original plan.
7. Jobs are spawned from where the previous import task left off. Note
that we will not report progress after the completion of each job, so
some jobs might run twice.

Closes https://github.com/neondatabase/neon/issues/11568
Closes https://github.com/neondatabase/neon/issues/11664
2025-05-15 13:18:22 +00:00
Vlad Lazar
a703cd342b storage_controller: enforce generations in import upcalls (#11900)
## Problem

Import up-calls did not enforce the usage of the latest generation. The
import might have finished in one previous generation, but not in the
latest one. Hence, the controller might try to activate a timeline
before it is ready. In theory, that would be fine, but it's tricky to
reason about.

## Summary of Changes

Pageserver provides the current generation in the upcall to the storage
controller and the later validates the generation. If the generation is
stale, we return an error which stops progress of the import job. Note
that the import job will retry the upcall until the stale location is
detached.

I'll add some proper tests for this as part of the [checkpointing
PR](https://github.com/neondatabase/neon/pull/11862).

Closes https://github.com/neondatabase/neon/issues/11884
2025-05-15 10:02:11 +00:00
Alexander Bayandin
42e4cf18c9 CI(neon_extra_builds): fix workflow syntax (#11932)
## Problem

```
Error when evaluating 'strategy' for job 'build-pgxn'. neondatabase/neon/.github/workflows/build-macos.yml@7907a9e2bf898f3d22b98d9d4d2c6ffc4d480fc3 (Line: 45, Col: 27): Matrix vector 'postgres-version' does not contain any values
```
See
https://github.com/neondatabase/neon/actions/runs/15039594216/job/42268015127?pr=11929

## Summary of changes
- Fix typo: `.chnages` -> `.changes`
- Ensure JSON is JSON by moving step output to env variable
2025-05-15 09:53:59 +00:00
Alex Chi Z.
9e5a41a342 fix(scrubber): remote_storage error causes layers to be deleted as orphans (#11924)
## Problem

close https://github.com/neondatabase/neon/issues/11159 ; we get
occasional wrong deletions of layer files being used and errors in
staging. This patch fixed it.

Example errors:

```
Timeline metadata errors: ["index_part.json contains a layer .... (shard 0000) that is not present in remote storage (layer_is_l0: false) with error: Failed to download a remote file: s3 head object\n\nCaused by:\n    0: dispatch failure\n    1: timeout\n    2: error trying to connect: HTTP connect timeout occurred after 3.1s\n
```

This error should not be fired because the file could exist, but we
cannot know if it exists due to head request failure.

## Summary of changes

Only generate cannot find layer errors when the head_object return type
is `NotFound`.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-15 07:02:16 +00:00
Konstantin Knizhnik
48b870bc07 Use unlogged build in GIST for storing root page (#11892)
## Problem

See https://github.com/neondatabase/neon/issues/11891

Newly added assert is first when root page of GIST index is written to
the disk as part of sorted build.

## Summary of changes

Wrap writing of root page in unlogged build.

https://github.com/neondatabase/postgres/pull/632
https://github.com/neondatabase/postgres/pull/633
https://github.com/neondatabase/postgres/pull/634

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-05-15 04:45:22 +00:00
Christian Schwarz
32a12783fd pageserver: batching & concurrent IO: update binary-built-in defaults; reduce CI matrix (#11923)
Use the current production config for batching & concurrent IO.

Remove the permutation testing for unit tests from CI.
(The pageserver unit test matrix takes ~10min for debug builds).

Drive-by-fix use of `if cfg!(test)` inside crate `pageserver_api`.
It is ineffective for early-enabling new defaults for pageserver unit
tests only.
The reason is that the `test` cfg is only set for the crate under test
but not its dependencies.
So, `cargo test -p pageserver` will build `pageserver_api` with
`cfg!(test) == false`.
Resort to checking for feature flag `testing` instead, since all our
unit tests are run with `--feature testing`.

refs
- `scattered-lsn` batching has been implemented and rolled out in all
envs, cf https://github.com/neondatabase/neon/issues/10765
- preliminary for https://github.com/neondatabase/neon/pull/10466
- epic https://github.com/neondatabase/neon/issues/9377
- epic https://github.com/neondatabase/neon/issues/9378
- drive-by fix
https://neondb.slack.com/archives/C0277TKAJCA/p1746821515504219
2025-05-14 16:30:21 +00:00
a-masterov
68120cfa31 Fix Cloud Extensions Regression (#11907)
## Problem
The regression test on extensions relied on the admin API to set the
default endpoint settings, which is not stable and requires admin
privileges. Specifically:
- The workflow was using `default_endpoint_settings` to configure
necessary PostgreSQL settings like `DateStyle`, `TimeZone`, and
`neon.allow_unstable_extensions`
- This approach was failing because the API endpoint for setting
`default_endpoint_settings` was changed (referenced in a comment as
issue #27108)
- The admin API requires special privileges.
## Summary of changes
We get rid of the admin API dependency and use ALTER DATABASE statements
instead:
**Removed the default_endpoint_settings mechanism:**
- Removed the default_endpoint_settings input parameter from the
neon-project-create action
- Removed the API call that was attempting to set these settings at the
project level
- Completely removed the default_endpoint_settings configuration from
the cloud-extensions workflow
**Added database-level settings:**
- Created a new `alter_db.sh` script that applies the same settings
directly to each test database
- Modified all extension test scripts to call this script after database
creation
2025-05-14 13:19:53 +00:00
Alex Chi Z.
a8e652d47e rfc: add bottommost garbage-collection compaction (#8425)
Add the RFC for bottommost garbage-collection compaction

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2025-05-14 09:25:57 +00:00
Alex Chi Z.
81fd652151 fix(pageserver): use better estimation for compaction memory usage (#11904)
## Problem

Hopefully resolves `test_gc_feedback` flakiness.

## Summary of changes

`accumulated_values` should not exceed 512MB to avoid OOM. Previously we
only use number of items, which is not a good estimation.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-14 08:32:55 +00:00
Elizabeth Murray
d47e88e353 Update the pgrag version in the compute dockerfile. (#11867)
## Problem

The extensions test are hanging because of pgrag. The new version of
pgrag contains a fix for the hang.

## Summary of changes
2025-05-14 07:00:59 +00:00
Vlad Lazar
045ae13e06 pageserver: make imports work with tenant shut downs (#11855)
## Problem

Lifetime of imported timelines (and implicitly the import background
task) has some shortcomings:
1. Timeline activation upon import completion is tricky. Previously, a
timeline that finished importing
after a tenant detach would not get activated and there's concerns about
the safety of activating
concurrently with shut-down.
2. Import jobs can prevent tenant shut down since they hold the tenant
gate

## Summary of Changes

Track the import tasks in memory and abort them explicitly on tenant
shutdown.

Integrate more closely with the storage controller:
1. When an import task has finished all of its jobs, it notifies the
storage controller, but **does not** mark the import as done in the
index_part. When all shards have finished importing, the storage
controller will call the `/activate_post_import` idempotent endpoint for
all of them. The handler, marks the import complete in index part,
resets the tenant if required and checks if the timeline is active yet.
2. Not directly related, but the import job now gets the starting state
from the storage controller instead of the import bucket. This paves the
way for progress checkpointing.

Related: https://github.com/neondatabase/neon/issues/11568
2025-05-13 17:49:49 +00:00
Folke Behrens
234c882a07 proxy: Expose handlers for cpu and heap profiling (#11912)
## Problem

It's difficult to understand where proxy spends most of cpu and memory.

## Summary of changes

Expose cpu and heap profiling handlers for continuous profiling.

neondatabase/cloud#22670
2025-05-13 14:58:37 +00:00
Konstantin Knizhnik
290369061f Check prefetch result in DEBUG_COMPARE_LOCAL mode (#11502)
## Problem

Prefetched and LFC results are not checked in DEBUG_COMPARE_LOCAL mode

## Summary of changes

Add check for this results as well.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-05-13 14:13:42 +00:00
Anastasia Lubennikova
25ab16ee24 chore(compute): Postgres 17.5, 16.9, 15.13 and 14.18 (#11886)
Bump all minor versions. 

the only conflict was
src/backend/storage/smgr/smgr.c in v17
where our smgr changes conflicted with

ee578921b6
but it was trivial to resolve.
2025-05-13 13:30:09 +00:00
Vlad Lazar
cfbef4d586 safekeeper: downgrade stream from future WAL log (#11909)
## Problem

1. Safekeeper selection on the pageserver side isn't very dynamic. Once
you connect to one safekeeper, you'll use that one for as long as the
safekeeper keeps the connection alive. In principle, we could be more
eager, since the wal receiver connection can be cancelled but we don't
do that. We wait until the "session" is done and then we pick a new SK.
2. Picking a new SK is quite conservative. We will switch if: 
a. We haven't received anything from the SK within the last 10 seconds
(wal_connect_timeout) or
b. The candidate SK is 1GiB ahead or
c. The candidate SK is in the same AZ as the PS or d. There's a
candidate that is ahead and we've not had any WAL within the last 10
seconds (lagging_wal_timeout)

Hence, we can end up with pageservers that are requesting WAL which
their safekeeper hasn't seen yet.

## Summary of changes

Downgrade warning log to info.
2025-05-13 13:02:25 +00:00
Alex Chi Z.
34a42b00ca feat(pageserver): add PostHog lite client (#11821)
## Problem

part of https://github.com/neondatabase/neon/issues/11813

## Summary of changes

Add a lite PostHog client that only uses the local flag evaluation
functionality. Added a test case that parses an example feature flag and
gets the evaluation result.

TODO: support boolean flag, remote config; implement all operators in
PostHog.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-13 09:49:14 +00:00
Alex Chi Z.
a9979620c5 fix(remote_storage): continue on Azure+AWS retryable error (#11903)
## Problem

We implemented the retry logic in AWS S3 but not in Azure. Therefore, if
there is an error during Azure listing, we will return an Err to the
caller, and the stream will end without fetching more tenants.

Part of https://github.com/neondatabase/neon/issues/11159

Without this fix, listing tenant will stop once we hit an error (could
be network errors -- that happens more frequent on Azure). If we happen
to stop at a point that we only listed part of the shards, we will hit
the "missed shards" error or even remove layers being used.

This bug (for Azure listing) was introduced as part of
https://github.com/neondatabase/neon/pull/9840

There is also a bug that stops the stream for AWS when there's a timeout
-- this is fixed along with this patch.

## Summary of changes

Retry the request on error. In the future, we should make such streams
return something like `Result<Result<T>>` where the outer result is the
error that ends the stream and the inner one is the error that should be
retried by the caller.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-13 08:53:35 +00:00
Conrad Ludgate
a113c48c43 proxy: fix redis batching support (#11905)
## Problem

For `StoreCancelKey`, we were inserting 2 commands, but we were not
inserting two replies. This mismatch leads to errors when decoding the
response.

## Summary of changes

Abstract the command + reply pipeline so that commands and replies are
registered at the same time.
2025-05-13 08:33:53 +00:00
122 changed files with 14859 additions and 1099 deletions

View File

@@ -49,10 +49,6 @@ inputs:
description: 'A JSON object with project settings'
required: false
default: '{}'
default_endpoint_settings:
description: 'A JSON object with the default endpoint settings'
required: false
default: '{}'
outputs:
dsn:
@@ -139,21 +135,6 @@ runs:
-H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer ${ADMIN_API_KEY}" \
-d "{\"scheduling\": \"Essential\"}"
fi
# XXX
# This is a workaround for the default endpoint settings, which currently do not allow some settings in the public API.
# https://github.com/neondatabase/cloud/issues/27108
if [[ -n ${DEFAULT_ENDPOINT_SETTINGS} && ${DEFAULT_ENDPOINT_SETTINGS} != "{}" ]] ; then
PROJECT_DATA=$(curl -X GET \
"https://${API_HOST}/regions/${REGION_ID}/api/v1/admin/projects/${project_id}" \
-H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer ${ADMIN_API_KEY}" \
-d "{\"scheduling\": \"Essential\"}"
)
NEW_DEFAULT_ENDPOINT_SETTINGS=$(echo ${PROJECT_DATA} | jq -rc ".project.default_endpoint_settings + ${DEFAULT_ENDPOINT_SETTINGS}")
curl -X POST --fail \
"https://${API_HOST}/regions/${REGION_ID}/api/v1/admin/projects/${project_id}/default_endpoint_settings" \
-H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer ${ADMIN_API_KEY}" \
--data "${NEW_DEFAULT_ENDPOINT_SETTINGS}"
fi
env:
@@ -171,4 +152,3 @@ runs:
PSQL: ${{ inputs.psql_path }}
LD_LIBRARY_PATH: ${{ inputs.libpq_lib_path }}
PROJECT_SETTINGS: ${{ inputs.project_settings }}
DEFAULT_ENDPOINT_SETTINGS: ${{ inputs.default_endpoint_settings }}

View File

@@ -279,18 +279,14 @@ jobs:
# run all non-pageserver tests
${cov_prefix} cargo nextest run $CARGO_FLAGS $CARGO_FEATURES -E '!package(pageserver)'
# run pageserver tests with different settings
for get_vectored_concurrent_io in sequential sidecar-task; do
for io_engine in std-fs tokio-epoll-uring ; do
for io_mode in buffered direct direct-rw ; do
NEON_PAGESERVER_UNIT_TEST_GET_VECTORED_CONCURRENT_IO=$get_vectored_concurrent_io \
NEON_PAGESERVER_UNIT_TEST_VIRTUAL_FILE_IOENGINE=$io_engine \
NEON_PAGESERVER_UNIT_TEST_VIRTUAL_FILE_IO_MODE=$io_mode \
${cov_prefix} \
cargo nextest run $CARGO_FLAGS $CARGO_FEATURES -E 'package(pageserver)'
done
done
done
# run pageserver tests
# (When developing new pageserver features gated by config fields, we commonly make the rust
# unit tests sensitive to an environment variable NEON_PAGESERVER_UNIT_TEST_FEATURENAME.
# Then run the nextest invocation below for all relevant combinations. Singling out the
# pageserver tests from non-pageserver tests cuts down the time it takes for this CI step.)
NEON_PAGESERVER_UNIT_TEST_VIRTUAL_FILE_IOENGINE=tokio-epoll-uring \
${cov_prefix} \
cargo nextest run $CARGO_FLAGS $CARGO_FEATURES -E 'package(pageserver)'
# Run separate tests for real S3
export ENABLE_REAL_S3_REMOTE_STORAGE=nonempty
@@ -405,8 +401,6 @@ jobs:
CHECK_ONDISK_DATA_COMPATIBILITY: nonempty
BUILD_TAG: ${{ inputs.build-tag }}
PAGESERVER_VIRTUAL_FILE_IO_ENGINE: tokio-epoll-uring
PAGESERVER_GET_VECTORED_CONCURRENT_IO: sidecar-task
PAGESERVER_VIRTUAL_FILE_IO_MODE: direct-rw
USE_LFC: ${{ matrix.lfc_state == 'with-lfc' && 'true' || 'false' }}
# Temporary disable this step until we figure out why it's so flaky

View File

@@ -323,8 +323,6 @@ jobs:
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
TEST_RESULT_CONNSTR: "${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}"
PAGESERVER_VIRTUAL_FILE_IO_ENGINE: tokio-epoll-uring
PAGESERVER_GET_VECTORED_CONCURRENT_IO: sidecar-task
PAGESERVER_VIRTUAL_FILE_IO_MODE: direct-rw
SYNC_BETWEEN_TESTS: true
# XXX: no coverage data handling here, since benchmarks are run on release builds,
# while coverage is currently collected for the debug ones
@@ -965,7 +963,7 @@ jobs:
fi
- name: Verify docker-compose example and test extensions
timeout-minutes: 20
timeout-minutes: 60
env:
TAG: >-
${{

View File

@@ -35,7 +35,7 @@ jobs:
matrix:
pg-version: [16, 17]
runs-on: [ self-hosted, small ]
runs-on: us-east-2
container:
# We use the neon-test-extensions image here as it contains the source code for the extensions.
image: ghcr.io/neondatabase/neon-test-extensions-v${{ matrix.pg-version }}:latest
@@ -71,20 +71,7 @@ jobs:
region_id: ${{ inputs.region_id || 'aws-us-east-2' }}
postgres_version: ${{ matrix.pg-version }}
project_settings: ${{ steps.project-settings.outputs.settings }}
# We need these settings to get the expected output results.
# We cannot use the environment variables e.g. PGTZ due to
# https://github.com/neondatabase/neon/issues/1287
default_endpoint_settings: >
{
"pg_settings": {
"DateStyle": "Postgres,MDY",
"TimeZone": "America/Los_Angeles",
"compute_query_id": "off",
"neon.allow_unstable_extensions": "on"
}
}
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
admin_api_key: ${{ secrets.NEON_STAGING_ADMIN_API_KEY }}
- name: Run the regression tests
run: /run-tests.sh -r /ext-src

View File

@@ -33,9 +33,10 @@ jobs:
strategy:
fail-fast: false
matrix:
pg-version: [16, 17]
pg-version: [16]
runs-on: us-east-2
#runs-on: us-east-2
runs-on: small
container:
image: ghcr.io/neondatabase/build-tools:pinned-bookworm
credentials:
@@ -59,6 +60,13 @@ jobs:
run: |
cd "vendor/postgres-v${PG_VERSION}"
patch -p1 < "../../compute/patches/cloud_regress_pg${PG_VERSION}.patch"
patch -p1 < "../../compute/patches/cloud_regress_pg${PG_VERSION}_ha_495.patch"
cd src/test/regress/data
#mv onek.data onek.data.tmp
#mv tenk.data tenk.data.tmp
#awk 'BEGIN {OFS="\t";n=1000} {for(i=0;i<n;i++){ print $1+i*1000, $2+i*1000, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16 }}' <onek.data.tmp > onek.data
#awk 'BEGIN {OFS="\t";n=10} {for(i=0;i<n;i++){ print $1+i*10000, $2+i*10000, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16 }}' <tenk.data.tmp > tenk.data
- name: Generate a random password
id: pwgen

View File

@@ -63,8 +63,10 @@ jobs:
- name: Filter out only v-string for build matrix
id: postgres_changes
env:
CHANGES: ${{ steps.files_changed.outputs.changes }}
run: |
v_strings_only_as_json_array=$(echo ${{ steps.files_changed.outputs.chnages }} | jq '.[]|select(test("v\\d+"))' | jq --slurp -c)
v_strings_only_as_json_array=$(echo ${CHANGES} | jq '.[]|select(test("v\\d+"))' | jq --slurp -c)
echo "changes=${v_strings_only_as_json_array}" | tee -a "${GITHUB_OUTPUT}"
check-macos-build:

53
Cargo.lock generated
View File

@@ -1112,6 +1112,12 @@ version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd"
[[package]]
name = "cfg_aliases"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "613afe47fcd5fac7ccf1db93babcb082c5994d996f20b8b159f2ad1658eb5724"
[[package]]
name = "cgroups-rs"
version = "0.3.3"
@@ -1306,7 +1312,7 @@ dependencies = [
"itertools 0.10.5",
"jsonwebtoken",
"metrics",
"nix 0.27.1",
"nix 0.30.1",
"notify",
"num_cpus",
"once_cell",
@@ -1429,7 +1435,7 @@ dependencies = [
"humantime-serde",
"hyper 0.14.30",
"jsonwebtoken",
"nix 0.27.1",
"nix 0.30.1",
"once_cell",
"pageserver_api",
"pageserver_client",
@@ -3512,9 +3518,9 @@ dependencies = [
[[package]]
name = "libc"
version = "0.2.169"
version = "0.2.172"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b5aba8db14291edd000dfcc4d620c7ebfb122c613afb886ca8803fa4e128a20a"
checksum = "d750af042f7ef4f724306de029d18836c26c1765a54a6a3f094cbd23a7267ffa"
[[package]]
name = "libloading"
@@ -3788,6 +3794,16 @@ version = "0.8.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e5ce46fe64a9d73be07dcbe690a38ce1b293be448fd8ce1e6c1b8062c9f72c6a"
[[package]]
name = "neon-shmem"
version = "0.1.0"
dependencies = [
"nix 0.30.1",
"tempfile",
"thiserror 1.0.69",
"workspace_hack",
]
[[package]]
name = "never-say-never"
version = "6.6.666"
@@ -3821,12 +3837,13 @@ dependencies = [
[[package]]
name = "nix"
version = "0.27.1"
version = "0.30.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2eb04e9c688eff1c89d72b407f168cf79bb9e867a9d3323ed6c01519eb9cc053"
checksum = "74523f3a35e05aba87a1d978330aef40f67b0304ac79c1c00b294c9830543db6"
dependencies = [
"bitflags 2.8.0",
"cfg-if",
"cfg_aliases",
"libc",
"memoffset 0.9.0",
]
@@ -4280,7 +4297,7 @@ dependencies = [
"jsonwebtoken",
"md5",
"metrics",
"nix 0.27.1",
"nix 0.30.1",
"num-traits",
"num_cpus",
"once_cell",
@@ -4331,6 +4348,7 @@ dependencies = [
"toml_edit",
"tracing",
"tracing-utils",
"twox-hash",
"url",
"utils",
"uuid",
@@ -4355,7 +4373,7 @@ dependencies = [
"humantime",
"humantime-serde",
"itertools 0.10.5",
"nix 0.27.1",
"nix 0.30.1",
"once_cell",
"postgres_backend",
"postgres_ffi",
@@ -4848,6 +4866,19 @@ dependencies = [
"workspace_hack",
]
[[package]]
name = "posthog_client_lite"
version = "0.1.0"
dependencies = [
"anyhow",
"reqwest",
"serde",
"serde_json",
"sha2",
"thiserror 1.0.69",
"workspace_hack",
]
[[package]]
name = "powerfmt"
version = "0.2.0"
@@ -7885,7 +7916,7 @@ dependencies = [
"humantime",
"jsonwebtoken",
"metrics",
"nix 0.27.1",
"nix 0.30.1",
"once_cell",
"pem",
"pin-project-lite",
@@ -8439,8 +8470,10 @@ dependencies = [
"fail",
"form_urlencoded",
"futures-channel",
"futures-core",
"futures-executor",
"futures-io",
"futures-task",
"futures-util",
"generic-array",
"getrandom 0.2.11",
@@ -8459,6 +8492,7 @@ dependencies = [
"log",
"memchr",
"nix 0.26.4",
"nix 0.30.1",
"nom",
"num",
"num-bigint",
@@ -8470,6 +8504,7 @@ dependencies = [
"once_cell",
"p256 0.13.2",
"parquet",
"percent-encoding",
"prettyplease",
"proc-macro2",
"prost 0.13.3",

View File

@@ -23,9 +23,11 @@ members = [
"libs/postgres_ffi",
"libs/safekeeper_api",
"libs/desim",
"libs/neon-shmem",
"libs/utils",
"libs/consumption_metrics",
"libs/postgres_backend",
"libs/posthog_client_lite",
"libs/pq_proto",
"libs/tenant_size_model",
"libs/metrics",
@@ -126,7 +128,7 @@ md5 = "0.7.0"
measured = { version = "0.0.22", features=["lasso"] }
measured-process = { version = "0.0.22" }
memoffset = "0.9"
nix = { version = "0.27", features = ["dir", "fs", "process", "socket", "signal", "poll"] }
nix = { version = "0.30.1", features = ["dir", "fs", "mman", "process", "socket", "signal", "poll"] }
# Do not update to >= 7.0.0, at least. The update will have a significant impact
# on compute startup metrics (start_postgres_ms), >= 25% degradation.
notify = "6.0.0"

View File

@@ -292,7 +292,7 @@ WORKDIR /home/nonroot
# Rust
# Please keep the version of llvm (installed above) in sync with rust llvm (`rustc --version --verbose | grep LLVM`)
ENV RUSTC_VERSION=1.86.0
ENV RUSTC_VERSION=1.87.0
ENV RUSTUP_HOME="/home/nonroot/.rustup"
ENV PATH="/home/nonroot/.cargo/bin:${PATH}"
ARG RUSTFILT_VERSION=0.2.1

View File

@@ -1117,8 +1117,8 @@ RUN wget https://github.com/microsoft/onnxruntime/archive/refs/tags/v1.18.1.tar.
mkdir onnxruntime-src && cd onnxruntime-src && tar xzf ../onnxruntime.tar.gz --strip-components=1 -C . && \
echo "#nothing to test here" > neon-test.sh
RUN wget https://github.com/neondatabase-labs/pgrag/archive/refs/tags/v0.1.1.tar.gz -O pgrag.tar.gz && \
echo "087b2ecd11ba307dc968042ef2e9e43dc04d9ba60e8306e882c407bbe1350a50 pgrag.tar.gz" | sha256sum --check && \
RUN wget https://github.com/neondatabase-labs/pgrag/archive/refs/tags/v0.1.2.tar.gz -O pgrag.tar.gz && \
echo "7361654ea24f08cbb9db13c2ee1c0fe008f6114076401bb871619690dafc5225 pgrag.tar.gz" | sha256sum --check && \
mkdir pgrag-src && cd pgrag-src && tar xzf ../pgrag.tar.gz --strip-components=1 -C .
FROM rust-extensions-build-pgrx14 AS pgrag-build

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,129 @@
diff --git a/src/test/regress/sql/box.sql b/src/test/regress/sql/box.sql
index 249636c76c3..540c2b54dda 100644
--- a/src/test/regress/sql/box.sql
+++ b/src/test/regress/sql/box.sql
@@ -196,7 +196,7 @@ CREATE TABLE quad_box_tbl (id int, b box);
INSERT INTO quad_box_tbl
SELECT (x - 1) * 100 + y, box(point(x * 10, y * 10), point(x * 10 + 5, y * 10 + 5))
- FROM generate_series(1, 95 * 100) x,
+ FROM generate_series(1, 100) x,
generate_series(1, 95 * 100) y;
-- insert repeating data to test allTheSame
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index 3ca8a2d6090..a8e40f906c4 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -533,7 +533,7 @@ create temp table prtx2_3 partition of prtx2 for values from (21) to (31);
insert into prtx1 select 1 + i%30, i, i
from generate_series(1, 95 * 1000) i;
insert into prtx2 select 1 + i%30, i, i
- from generate_series(1, 95 * 500) i, generate_series(1, 95 * 10) j;
+ from generate_series(1, 500) i, generate_series(1, 95 * 10) j;
create index on prtx2 (b);
create index on prtx2 (c);
analyze prtx1;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 82ac39d5dc8..bef0a891ade 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1274,9 +1274,9 @@ select
case c when 0 then null else 3 end,
case d when 0 then null else 4 end
from
- generate_series(0, 95 * 1) a,
- generate_series(0, 95 * 1) b,
- generate_series(0, 95 * 1) c,
+ generate_series(0, 1) a,
+ generate_series(0, 1) b,
+ generate_series(0, 1) c,
generate_series(0, 95 * 1) d;
-- Ensure partition pruning works correctly for each combination of IS NULL
diff --git a/src/test/regress/sql/polygon.sql b/src/test/regress/sql/polygon.sql
index d39a2b4e8f8..2d862985510 100644
--- a/src/test/regress/sql/polygon.sql
+++ b/src/test/regress/sql/polygon.sql
@@ -42,7 +42,7 @@ CREATE TABLE quad_poly_tbl (id int, p polygon);
INSERT INTO quad_poly_tbl
SELECT (x - 1) * 100 + y, polygon(circle(point(x * 10, y * 10), 1 + (x + y) % 10))
- FROM generate_series(1, 95 * 100) x,
+ FROM generate_series(1, 100) x,
generate_series(1, 95 * 100) y;
INSERT INTO quad_poly_tbl
diff --git a/src/test/regress/sql/rangetypes.sql b/src/test/regress/sql/rangetypes.sql
index b51d6c405c2..4138418c7a6 100644
--- a/src/test/regress/sql/rangetypes.sql
+++ b/src/test/regress/sql/rangetypes.sql
@@ -314,13 +314,13 @@ select count(*) from test_range_gist where ir -|- int4multirange(int4range(100,2
create table test_range_spgist(ir int4range);
create index test_range_spgist_idx on test_range_spgist using spgist (ir);
-insert into test_range_spgist select int4range(g, g+10) from generate_series(1, 95 * 2000) g;
-insert into test_range_spgist select 'empty'::int4range from generate_series(1, 95 * 500) g;
-insert into test_range_spgist select int4range(g, g+10000) from generate_series(1, 95 * 1000) g;
-insert into test_range_spgist select 'empty'::int4range from generate_series(1, 95 * 500) g;
-insert into test_range_spgist select int4range(NULL,g*10,'(]') from generate_series(1, 95 * 100) g;
-insert into test_range_spgist select int4range(g*10,NULL,'(]') from generate_series(1, 95 * 100) g;
-insert into test_range_spgist select int4range(g, g+10) from generate_series(1, 95 * 2000) g;
+insert into test_range_spgist select int4range(g, g+10) from generate_series(1, 0.1 * 95 * 2000) g;
+insert into test_range_spgist select 'empty'::int4range from generate_series(1, 0.1 * 95 * 500) g;
+insert into test_range_spgist select int4range(g, g+10000) from generate_series(1, 0.1 * 95 * 1000) g;
+insert into test_range_spgist select 'empty'::int4range from generate_series(1, 0.1 * 95 * 500) g;
+insert into test_range_spgist select int4range(NULL,g*10,'(]') from generate_series(1, 0.1 * 95 * 100) g;
+insert into test_range_spgist select int4range(g*10,NULL,'(]') from generate_series(1, 0.1 * 95 * 100) g;
+insert into test_range_spgist select int4range(g, g+10) from generate_series(1, 0.1 * 95 * 2000) g;
-- first, verify non-indexed results
SET enable_seqscan = t;
diff --git a/src/test/regress/sql/spgist.sql b/src/test/regress/sql/spgist.sql
index 0c4f24e1d49..61e53375539 100644
--- a/src/test/regress/sql/spgist.sql
+++ b/src/test/regress/sql/spgist.sql
@@ -16,9 +16,9 @@ vacuum spgist_point_tbl;
-- Insert more data, to make the index a few levels deep.
insert into spgist_point_tbl (id, p)
-select g, point(g*10, g*10) from generate_series(1, 95 * 10000) g;
+select g, point(g*10, g*10) from generate_series(1, 0.1 * 95 * 10000) g;
insert into spgist_point_tbl (id, p)
-select g+100000, point(g*10+1, g*10+1) from generate_series(1, 95 * 10000) g;
+select g+100000, point(g*10+1, g*10+1) from generate_series(1, 0.1 * 95 * 10000) g;
-- To test vacuum, delete some entries from all over the index.
delete from spgist_point_tbl where id % 2 = 1;
@@ -37,8 +37,8 @@ vacuum spgist_point_tbl;
create table spgist_box_tbl(id serial, b box);
insert into spgist_box_tbl(b)
select box(point(i,j),point(i+s,j+s))
- from generate_series(1, 95 * 100,5) i,
- generate_series(1, 95 * 100,5) j,
+ from generate_series(1,100,5) i,
+ generate_series(1,100,5) j,
generate_series(1, 95 * 10) s;
create index spgist_box_idx on spgist_box_tbl using spgist (b);
@@ -86,6 +86,6 @@ create unlogged table spgist_unlogged_tbl(id serial, b box);
create index spgist_unlogged_idx on spgist_unlogged_tbl using spgist (b);
insert into spgist_unlogged_tbl(b)
select box(point(i,j))
- from generate_series(1, 95 * 100,5) i,
+ from generate_series(1,100,5) i,
generate_series(1, 95 * 10,5) j;
-- leave this table around, to help in testing dump/restore
diff --git a/src/test/regress/sql/tuplesort.sql b/src/test/regress/sql/tuplesort.sql
index fa762f26ac7..7a1fd619eba 100644
--- a/src/test/regress/sql/tuplesort.sql
+++ b/src/test/regress/sql/tuplesort.sql
@@ -276,7 +276,7 @@ ROLLBACK;
CREATE TEMP TABLE test_mark_restore(col1 int, col2 int, col12 int);
-- need a few duplicates for mark/restore to matter
INSERT INTO test_mark_restore(col1, col2, col12)
- SELECT a.i, b.i, a.i * b.i FROM generate_series(1, 95 * 500) a(i), generate_series(1, 95 * 5) b(i);
+ SELECT a.i, b.i, a.i * b.i FROM generate_series(1, 500) a(i), generate_series(1, 95 * 5) b(i);
BEGIN;

View File

@@ -0,0 +1,593 @@
diff --git a/src/test/regress/sql/box.sql b/src/test/regress/sql/box.sql
index 249636c76c3..540c2b54dda 100644
--- a/src/test/regress/sql/box.sql
+++ b/src/test/regress/sql/box.sql
@@ -196,7 +196,7 @@ CREATE TABLE quad_box_tbl (id int, b box);
INSERT INTO quad_box_tbl
SELECT (x - 1) * 100 + y, box(point(x * 10, y * 10), point(x * 10 + 5, y * 10 + 5))
- FROM generate_series(1, 95 * 100) x,
+ FROM generate_series(1, 100) x,
generate_series(1, 95 * 100) y;
-- insert repeating data to test allTheSame
diff --git a/src/test/regress/sql/brin.sql b/src/test/regress/sql/brin.sql
index 39d3cd7821a..86efbb72609 100644
--- a/src/test/regress/sql/brin.sql
+++ b/src/test/regress/sql/brin.sql
@@ -476,7 +476,7 @@ CREATE TABLE brintest_3 (a text, b text, c text, d text);
-- long random strings (~2000 chars each, so ~6kB for min/max on two
-- columns) to trigger toasting
-WITH rand_value AS (SELECT string_agg(fipshash(i::text),'') AS val FROM generate_series(1, 95 * 60) s(i))
+WITH rand_value AS (SELECT string_agg(fipshash(i::text),'') AS val FROM generate_series(1,60) s(i))
INSERT INTO brintest_3
SELECT val, val, val, val FROM rand_value;
@@ -495,7 +495,7 @@ VACUUM brintest_3;
-- retry insert with a different random-looking (but deterministic) value
-- the value is different, and so should replace either min or max in the
-- brin summary
-WITH rand_value AS (SELECT string_agg(fipshash((-i)::text),'') AS val FROM generate_series(1, 95 * 60) s(i))
+WITH rand_value AS (SELECT string_agg(fipshash((-i)::text),'') AS val FROM generate_series(1,60) s(i))
INSERT INTO brintest_3
SELECT val, val, val, val FROM rand_value;
diff --git a/src/test/regress/sql/brin_multi.sql b/src/test/regress/sql/brin_multi.sql
index b7f7a9e8803..b1a109fe07f 100644
--- a/src/test/regress/sql/brin_multi.sql
+++ b/src/test/regress/sql/brin_multi.sql
@@ -612,7 +612,7 @@ CREATE TABLE brin_date_test(a DATE);
INSERT INTO brin_date_test SELECT '4713-01-01 BC'::date + i FROM generate_series(1, 95 * 30) s(i);
-- insert values close to date minimum
-INSERT INTO brin_date_test SELECT '5874897-12-01'::date + i FROM generate_series(1, 95 * 30) s(i);
+INSERT INTO brin_date_test SELECT '5874897-12-01'::date + i FROM generate_series(1, 30) s(i);
CREATE INDEX ON brin_date_test USING brin (a date_minmax_multi_ops) WITH (pages_per_range=1);
diff --git a/src/test/regress/sql/btree_index.sql b/src/test/regress/sql/btree_index.sql
index d0d86db1667..88a752264a0 100644
--- a/src/test/regress/sql/btree_index.sql
+++ b/src/test/regress/sql/btree_index.sql
@@ -267,7 +267,7 @@ VACUUM delete_test_table;
--
-- The vacuum above should've turned the leaf page into a fast root. We just
-- need to insert some rows to cause the fast root page to split.
-INSERT INTO delete_test_table SELECT i, 1, 2, 3 FROM generate_series(1, 95 * 1000) i;
+INSERT INTO delete_test_table SELECT i, 1, 2, 3 FROM generate_series(1,1000) i;
-- Test unsupported btree opclass parameters
create index on btree_tall_tbl (id int4_ops(foo=1));
diff --git a/src/test/regress/sql/create_table.sql b/src/test/regress/sql/create_table.sql
index 13006372064..1fd4cbfa7ef 100644
--- a/src/test/regress/sql/create_table.sql
+++ b/src/test/regress/sql/create_table.sql
@@ -47,7 +47,7 @@ DEALLOCATE select1;
-- (temporarily hide query, to avoid the long CREATE TABLE stmt)
\set ECHO none
SELECT 'CREATE TABLE extra_wide_table(firstc text, '|| array_to_string(array_agg('c'||i||' bool'),',')||', lastc text);'
-FROM generate_series(1, 95 * 1100) g(i)
+FROM generate_series(1, 1100) g(i)
\gexec
\set ECHO all
INSERT INTO extra_wide_table(firstc, lastc) VALUES('first col', 'last col');
@@ -74,7 +74,7 @@ CREATE TABLE default_expr_agg (a int DEFAULT (avg(1)));
-- invalid use of subquery
CREATE TABLE default_expr_agg (a int DEFAULT (select 1));
-- invalid use of set-returning function
-CREATE TABLE default_expr_agg (a int DEFAULT (generate_series(1, 95 * 3)));
+CREATE TABLE default_expr_agg (a int DEFAULT (generate_series(1,3)));
-- Verify that subtransaction rollback restores rd_createSubid.
BEGIN;
@@ -359,7 +359,7 @@ CREATE TABLE part_bogus_expr_fail PARTITION OF range_parted
CREATE TABLE part_bogus_expr_fail PARTITION OF range_parted
FOR VALUES FROM ((select 1)) TO ('2019-01-01');
CREATE TABLE part_bogus_expr_fail PARTITION OF range_parted
- FOR VALUES FROM (generate_series(1, 95 * 3)) TO ('2019-01-01');
+ FOR VALUES FROM (generate_series(1, 3)) TO ('2019-01-01');
-- trying to specify list for range partitioned table
CREATE TABLE fail_part PARTITION OF range_parted FOR VALUES IN ('a');
diff --git a/src/test/regress/sql/fast_default.sql b/src/test/regress/sql/fast_default.sql
index 28fefad6fe6..7d7060820e4 100644
--- a/src/test/regress/sql/fast_default.sql
+++ b/src/test/regress/sql/fast_default.sql
@@ -318,7 +318,7 @@ CREATE TABLE T (pk INT NOT NULL PRIMARY KEY);
SELECT set('t');
-INSERT INTO T SELECT * FROM generate_series(1, 95 * 10) a;
+INSERT INTO T SELECT * FROM generate_series(1, 10) a;
ALTER TABLE T ADD COLUMN c_bigint BIGINT NOT NULL DEFAULT -1;
@@ -326,7 +326,7 @@ INSERT INTO T SELECT b, b - 10 FROM generate_series(11, 20) a(b);
ALTER TABLE T ADD COLUMN c_text TEXT DEFAULT 'hello';
-INSERT INTO T SELECT b, b - 10, (b + 10)::text FROM generate_series(21, 30) a(b);
+INSERT INTO T SELECT b, b - 10, (b + 10)::text FROM generate_series(21, 95 * 30) a(b);
-- WHERE clause
SELECT c_bigint, c_text FROM T WHERE c_bigint = -1 LIMIT 1;
diff --git a/src/test/regress/sql/hash_index.sql b/src/test/regress/sql/hash_index.sql
index fcd5f91a39f..6ac90c57730 100644
--- a/src/test/regress/sql/hash_index.sql
+++ b/src/test/regress/sql/hash_index.sql
@@ -220,7 +220,7 @@ SELECT h.seqno AS f20000
CREATE TABLE hash_split_heap (keycol INT);
INSERT INTO hash_split_heap SELECT 1 FROM generate_series(1, 95 * 500) a;
CREATE INDEX hash_split_index on hash_split_heap USING HASH (keycol);
-INSERT INTO hash_split_heap SELECT 1 FROM generate_series(1, 95 * 5000) a;
+INSERT INTO hash_split_heap SELECT 1 FROM generate_series(1, POW(95, 0.5) * 5000) a;
-- Let's do a backward scan.
BEGIN;
@@ -236,7 +236,7 @@ END;
-- DELETE, INSERT, VACUUM.
DELETE FROM hash_split_heap WHERE keycol = 1;
-INSERT INTO hash_split_heap SELECT a/2 FROM generate_series(1, 95 * 25000) a;
+INSERT INTO hash_split_heap SELECT a/2 FROM generate_series(1, POW(95, 0.5) * 25000) a;
VACUUM hash_split_heap;
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index 3920a9528ae..d6ce372d799 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -551,14 +551,14 @@ SELECT to_timestamp('2011-12-18 11:38 +01:xyz', 'YYYY-MM-DD HH12:MI OF'); -- er
SELECT to_timestamp('2018-11-02 12:34:56.025', 'YYYY-MM-DD HH24:MI:SS.MS');
SELECT i, to_timestamp('2018-11-02 12:34:56', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 95 * 6) i;
-SELECT i, to_timestamp('2018-11-02 12:34:56.1', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 95 * 6) i;
-SELECT i, to_timestamp('2018-11-02 12:34:56.12', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 95 * 6) i;
-SELECT i, to_timestamp('2018-11-02 12:34:56.123', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 95 * 6) i;
-SELECT i, to_timestamp('2018-11-02 12:34:56.1234', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 95 * 6) i;
-SELECT i, to_timestamp('2018-11-02 12:34:56.12345', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 95 * 6) i;
+SELECT i, to_timestamp('2018-11-02 12:34:56.1', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 6) i;
+SELECT i, to_timestamp('2018-11-02 12:34:56.12', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 6) i;
+SELECT i, to_timestamp('2018-11-02 12:34:56.123', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 6) i;
+SELECT i, to_timestamp('2018-11-02 12:34:56.1234', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 6) i;
+SELECT i, to_timestamp('2018-11-02 12:34:56.12345', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 6) i;
SELECT i, to_timestamp('2018-11-02 12:34:56.123456', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 95 * 6) i;
SELECT i, to_timestamp('2018-11-02 12:34:56.123456789', 'YYYY-MM-DD HH24:MI:SS.FF' || i) FROM generate_series(1, 95 * 6) i;
-SELECT i, to_timestamp('20181102123456123456', 'YYYYMMDDHH24MISSFF' || i) FROM generate_series(1, 95 * 6) i;
+SELECT i, to_timestamp('20181102123456123456', 'YYYYMMDDHH24MISSFF' || i) FROM generate_series(1, 6) i;
SELECT to_date('1 4 1902', 'Q MM YYYY'); -- Q is ignored
SELECT to_date('3 4 21 01', 'W MM CC YY');
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index 96c19fa5297..276f6d25c67 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -742,7 +742,7 @@ create table inhcld1(f2 name, f1 int primary key);
create table inhcld2(f1 int primary key, f2 name);
alter table inhpar attach partition inhcld1 for values from (1) to (5);
alter table inhpar attach partition inhcld2 for values from (5) to (100);
-insert into inhpar select x, x::text from generate_series(1, 95 * 10) x;
+insert into inhpar select x, x::text from generate_series(1,10) x;
explain (verbose, costs off)
update inhpar i set (f1, f2) = (select i.f1, i.f2 || '-' from int4_tbl limit 1);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index c9fdd126d15..bbbda3d6237 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -320,8 +320,8 @@ create table part_ee_ff3_2 partition of part_ee_ff3 for values from (25) to (30)
truncate list_parted;
insert into list_parted values ('aa'), ('cc');
-insert into list_parted select 'Ff', s.a from generate_series(1, 95 * 29) s(a);
-insert into list_parted select 'gg', s.a from generate_series(1, 95 * 9) s(a);
+insert into list_parted select 'Ff', s.a from generate_series(1, 29) s(a);
+insert into list_parted select 'gg', s.a from generate_series(1, 9) s(a);
insert into list_parted (b) values (1);
select tableoid::regclass::text, a, min(b) as min_b, max(b) as max_b from list_parted group by 1, 2 order by 1;
diff --git a/src/test/regress/sql/join_hash.sql b/src/test/regress/sql/join_hash.sql
index 47abc031c0f..34c4d8c1312 100644
--- a/src/test/regress/sql/join_hash.sql
+++ b/src/test/regress/sql/join_hash.sql
@@ -310,9 +310,9 @@ rollback to settings;
-- Exercise rescans. We'll turn off parallel_leader_participation so
-- that we can check that instrumentation comes back correctly.
-create table join_foo as select generate_series(1, 95 * 3) as id, 'xxxxx'::text as t;
+create table join_foo as select generate_series(1, POW(95, 0.5) * 3) as id, 'xxxxx'::text as t;
alter table join_foo set (parallel_workers = 0);
-create table join_bar as select generate_series(1, 95 * 10000) as id, 'xxxxx'::text as t;
+create table join_bar as select generate_series(1, POW(95, 0.5) * 10000) as id, 'xxxxx'::text as t;
alter table join_bar set (parallel_workers = 2);
-- multi-batch with rescan, parallel-oblivious
diff --git a/src/test/regress/sql/merge.sql b/src/test/regress/sql/merge.sql
index b60271d9400..7d89c85179f 100644
--- a/src/test/regress/sql/merge.sql
+++ b/src/test/regress/sql/merge.sql
@@ -1457,7 +1457,7 @@ CREATE TABLE pa_source (sid integer, delta float)
-- insert many rows to the source table
INSERT INTO pa_source SELECT id, id * 10 FROM generate_series(1, 95 * 14) AS id;
-- insert a few rows in the target table (odd numbered tid)
-INSERT INTO pa_target SELECT '2017-01-31', id, id * 100, 'initial' FROM generate_series(1, 95 * 9,3) AS id;
+INSERT INTO pa_target SELECT '2017-01-31', id, id * 100, 'initial' FROM generate_series(1,9,3) AS id;
INSERT INTO pa_target SELECT '2017-02-28', id, id * 100, 'initial' FROM generate_series(2,9,3) AS id;
-- try simple MERGE
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index 53a9b26d4c4..0c48dd2be78 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -13,7 +13,7 @@ CREATE TABLE prt1 (a int, b int, c varchar) PARTITION BY RANGE(a);
CREATE TABLE prt1_p1 PARTITION OF prt1 FOR VALUES FROM (0) TO (250);
CREATE TABLE prt1_p3 PARTITION OF prt1 FOR VALUES FROM (500) TO (600);
CREATE TABLE prt1_p2 PARTITION OF prt1 FOR VALUES FROM (250) TO (500);
-INSERT INTO prt1 SELECT i, i % 25, to_char(i, 'FM0000') FROM generate_series(0, 95 * 599) i WHERE i % 2 = 0;
+INSERT INTO prt1 SELECT i, i % 25, to_char(i, 'FM0000') FROM generate_series(0,599) i WHERE i % 2 = 0;
CREATE INDEX iprt1_p1_a on prt1_p1(a);
CREATE INDEX iprt1_p2_a on prt1_p2(a);
CREATE INDEX iprt1_p3_a on prt1_p3(a);
@@ -23,7 +23,7 @@ CREATE TABLE prt2 (a int, b int, c varchar) PARTITION BY RANGE(b);
CREATE TABLE prt2_p1 PARTITION OF prt2 FOR VALUES FROM (0) TO (250);
CREATE TABLE prt2_p2 PARTITION OF prt2 FOR VALUES FROM (250) TO (500);
CREATE TABLE prt2_p3 PARTITION OF prt2 FOR VALUES FROM (500) TO (600);
-INSERT INTO prt2 SELECT i % 25, i, to_char(i, 'FM0000') FROM generate_series(0, 95 * 599) i WHERE i % 3 = 0;
+INSERT INTO prt2 SELECT i % 25, i, to_char(i, 'FM0000') FROM generate_series(0,599) i WHERE i % 3 = 0;
CREATE INDEX iprt2_p1_b on prt2_p1(b);
CREATE INDEX iprt2_p2_b on prt2_p2(b);
CREATE INDEX iprt2_p3_b on prt2_p3(b);
@@ -149,7 +149,7 @@ CREATE TABLE prt1_e (a int, b int, c int) PARTITION BY RANGE(((a + b)/2));
CREATE TABLE prt1_e_p1 PARTITION OF prt1_e FOR VALUES FROM (0) TO (250);
CREATE TABLE prt1_e_p2 PARTITION OF prt1_e FOR VALUES FROM (250) TO (500);
CREATE TABLE prt1_e_p3 PARTITION OF prt1_e FOR VALUES FROM (500) TO (600);
-INSERT INTO prt1_e SELECT i, i, i % 25 FROM generate_series(0, 95 * 599, 2) i;
+INSERT INTO prt1_e SELECT i, i, i % 25 FROM generate_series(0, 599, 2) i;
CREATE INDEX iprt1_e_p1_ab2 on prt1_e_p1(((a+b)/2));
CREATE INDEX iprt1_e_p2_ab2 on prt1_e_p2(((a+b)/2));
CREATE INDEX iprt1_e_p3_ab2 on prt1_e_p3(((a+b)/2));
@@ -159,7 +159,7 @@ CREATE TABLE prt2_e (a int, b int, c int) PARTITION BY RANGE(((b + a)/2));
CREATE TABLE prt2_e_p1 PARTITION OF prt2_e FOR VALUES FROM (0) TO (250);
CREATE TABLE prt2_e_p2 PARTITION OF prt2_e FOR VALUES FROM (250) TO (500);
CREATE TABLE prt2_e_p3 PARTITION OF prt2_e FOR VALUES FROM (500) TO (600);
-INSERT INTO prt2_e SELECT i, i, i % 25 FROM generate_series(0, 95 * 599, 3) i;
+INSERT INTO prt2_e SELECT i, i, i % 25 FROM generate_series(0, 599, 3) i;
ANALYZE prt2_e;
EXPLAIN (COSTS OFF)
@@ -248,14 +248,14 @@ CREATE TABLE prt1_m (a int, b int, c int) PARTITION BY RANGE(a, ((a + b)/2));
CREATE TABLE prt1_m_p1 PARTITION OF prt1_m FOR VALUES FROM (0, 0) TO (250, 250);
CREATE TABLE prt1_m_p2 PARTITION OF prt1_m FOR VALUES FROM (250, 250) TO (500, 500);
CREATE TABLE prt1_m_p3 PARTITION OF prt1_m FOR VALUES FROM (500, 500) TO (600, 600);
-INSERT INTO prt1_m SELECT i, i, i % 25 FROM generate_series(0, 95 * 599, 2) i;
+INSERT INTO prt1_m SELECT i, i, i % 25 FROM generate_series(0, 599, 2) i;
ANALYZE prt1_m;
CREATE TABLE prt2_m (a int, b int, c int) PARTITION BY RANGE(((b + a)/2), b);
CREATE TABLE prt2_m_p1 PARTITION OF prt2_m FOR VALUES FROM (0, 0) TO (250, 250);
CREATE TABLE prt2_m_p2 PARTITION OF prt2_m FOR VALUES FROM (250, 250) TO (500, 500);
CREATE TABLE prt2_m_p3 PARTITION OF prt2_m FOR VALUES FROM (500, 500) TO (600, 600);
-INSERT INTO prt2_m SELECT i, i, i % 25 FROM generate_series(0, 95 * 599, 3) i;
+INSERT INTO prt2_m SELECT i, i, i % 25 FROM generate_series(0, 599, 3) i;
ANALYZE prt2_m;
EXPLAIN (COSTS OFF)
@@ -269,14 +269,14 @@ CREATE TABLE plt1 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN ('0000', '0003', '0004', '0010');
CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN ('0001', '0005', '0002', '0009');
CREATE TABLE plt1_p3 PARTITION OF plt1 FOR VALUES IN ('0006', '0007', '0008', '0011');
-INSERT INTO plt1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 95 * 599, 2) i;
+INSERT INTO plt1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 599, 2) i;
ANALYZE plt1;
CREATE TABLE plt2 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN ('0000', '0003', '0004', '0010');
CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN ('0001', '0005', '0002', '0009');
CREATE TABLE plt2_p3 PARTITION OF plt2 FOR VALUES IN ('0006', '0007', '0008', '0011');
-INSERT INTO plt2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 95 * 599, 3) i;
+INSERT INTO plt2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 599, 3) i;
ANALYZE plt2;
--
@@ -286,7 +286,7 @@ CREATE TABLE plt1_e (a int, b int, c text) PARTITION BY LIST(ltrim(c, 'A'));
CREATE TABLE plt1_e_p1 PARTITION OF plt1_e FOR VALUES IN ('0000', '0003', '0004', '0010');
CREATE TABLE plt1_e_p2 PARTITION OF plt1_e FOR VALUES IN ('0001', '0005', '0002', '0009');
CREATE TABLE plt1_e_p3 PARTITION OF plt1_e FOR VALUES IN ('0006', '0007', '0008', '0011');
-INSERT INTO plt1_e SELECT i, i, 'A' || to_char(i/50, 'FM0000') FROM generate_series(0, 95 * 599, 2) i;
+INSERT INTO plt1_e SELECT i, i, 'A' || to_char(i/50, 'FM0000') FROM generate_series(0, 599, 2) i;
ANALYZE plt1_e;
-- test partition matching with N-way join
@@ -371,7 +371,7 @@ CREATE TABLE prt1_l_p2_p2 PARTITION OF prt1_l_p2 FOR VALUES IN ('0002', '0003');
CREATE TABLE prt1_l_p3 PARTITION OF prt1_l FOR VALUES FROM (500) TO (600) PARTITION BY RANGE (b);
CREATE TABLE prt1_l_p3_p1 PARTITION OF prt1_l_p3 FOR VALUES FROM (0) TO (13);
CREATE TABLE prt1_l_p3_p2 PARTITION OF prt1_l_p3 FOR VALUES FROM (13) TO (25);
-INSERT INTO prt1_l SELECT i, i % 25, to_char(i % 4, 'FM0000') FROM generate_series(0, 95 * 599, 2) i;
+INSERT INTO prt1_l SELECT i, i % 25, to_char(i % 4, 'FM0000') FROM generate_series(0, 599, 2) i;
ANALYZE prt1_l;
CREATE TABLE prt2_l (a int, b int, c varchar) PARTITION BY RANGE(b);
@@ -382,7 +382,7 @@ CREATE TABLE prt2_l_p2_p2 PARTITION OF prt2_l_p2 FOR VALUES IN ('0002', '0003');
CREATE TABLE prt2_l_p3 PARTITION OF prt2_l FOR VALUES FROM (500) TO (600) PARTITION BY RANGE (a);
CREATE TABLE prt2_l_p3_p1 PARTITION OF prt2_l_p3 FOR VALUES FROM (0) TO (13);
CREATE TABLE prt2_l_p3_p2 PARTITION OF prt2_l_p3 FOR VALUES FROM (13) TO (25);
-INSERT INTO prt2_l SELECT i % 25, i, to_char(i % 4, 'FM0000') FROM generate_series(0, 95 * 599, 3) i;
+INSERT INTO prt2_l SELECT i % 25, i, to_char(i % 4, 'FM0000') FROM generate_series(0, 599, 3) i;
ANALYZE prt2_l;
-- inner join, qual covering only top-level partitions
@@ -453,27 +453,27 @@ WHERE EXISTS (
CREATE TABLE prt1_n (a int, b int, c varchar) PARTITION BY RANGE(c);
CREATE TABLE prt1_n_p1 PARTITION OF prt1_n FOR VALUES FROM ('0000') TO ('0250');
CREATE TABLE prt1_n_p2 PARTITION OF prt1_n FOR VALUES FROM ('0250') TO ('0500');
-INSERT INTO prt1_n SELECT i, i, to_char(i, 'FM0000') FROM generate_series(0, 95 * 499, 2) i;
+INSERT INTO prt1_n SELECT i, i, to_char(i, 'FM0000') FROM generate_series(0, 499, 2) i;
ANALYZE prt1_n;
CREATE TABLE prt2_n (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE prt2_n_p1 PARTITION OF prt2_n FOR VALUES IN ('0000', '0003', '0004', '0010', '0006', '0007');
CREATE TABLE prt2_n_p2 PARTITION OF prt2_n FOR VALUES IN ('0001', '0005', '0002', '0009', '0008', '0011');
-INSERT INTO prt2_n SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 95 * 599, 2) i;
+INSERT INTO prt2_n SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 599, 2) i;
ANALYZE prt2_n;
CREATE TABLE prt3_n (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE prt3_n_p1 PARTITION OF prt3_n FOR VALUES IN ('0000', '0004', '0006', '0007');
CREATE TABLE prt3_n_p2 PARTITION OF prt3_n FOR VALUES IN ('0001', '0002', '0008', '0010');
CREATE TABLE prt3_n_p3 PARTITION OF prt3_n FOR VALUES IN ('0003', '0005', '0009', '0011');
-INSERT INTO prt2_n SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 95 * 599, 2) i;
+INSERT INTO prt2_n SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 599, 2) i;
ANALYZE prt3_n;
CREATE TABLE prt4_n (a int, b int, c text) PARTITION BY RANGE(a);
CREATE TABLE prt4_n_p1 PARTITION OF prt4_n FOR VALUES FROM (0) TO (300);
CREATE TABLE prt4_n_p2 PARTITION OF prt4_n FOR VALUES FROM (300) TO (500);
CREATE TABLE prt4_n_p3 PARTITION OF prt4_n FOR VALUES FROM (500) TO (600);
-INSERT INTO prt4_n SELECT i, i, to_char(i, 'FM0000') FROM generate_series(0, 95 * 599, 2) i;
+INSERT INTO prt4_n SELECT i, i, to_char(i, 'FM0000') FROM generate_series(0, 599, 2) i;
ANALYZE prt4_n;
-- partitionwise join can not be applied if the partition ranges differ
@@ -533,7 +533,7 @@ create temp table prtx2_3 partition of prtx2 for values from (21) to (31);
insert into prtx1 select 1 + i%30, i, i
from generate_series(1, 95 * 1000) i;
insert into prtx2 select 1 + i%30, i, i
- from generate_series(1, 95 * 500) i, generate_series(1, 95 * 10) j;
+ from generate_series(1, 500) i, generate_series(1, 95 * 10) j;
create index on prtx2 (b);
create index on prtx2 (c);
analyze prtx1;
@@ -1202,7 +1202,7 @@ CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000');
CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000');
-- insert data
-INSERT INTO fract_t (id) (SELECT generate_series(0, 95 * 1999));
+INSERT INTO fract_t (id) (SELECT generate_series(0, 1999));
ANALYZE fract_t;
-- verify plan; nested index only scans
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 82ac39d5dc8..6a0c7a3666d 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -512,7 +512,7 @@ create table list_part2 partition of list_part for values in (2);
create table list_part3 partition of list_part for values in (3);
create table list_part4 partition of list_part for values in (4);
-insert into list_part select generate_series(1, 95 * 4);
+insert into list_part select generate_series(1, 4);
begin;
@@ -940,7 +940,7 @@ create table ma_test (a int, b int) partition by range (a);
create table ma_test_p1 partition of ma_test for values from (0) to (10);
create table ma_test_p2 partition of ma_test for values from (10) to (20);
create table ma_test_p3 partition of ma_test for values from (20) to (30);
-insert into ma_test select x,x from generate_series(0, 95 * 29) t(x);
+insert into ma_test select x,x from generate_series(0,29) t(x);
create index on ma_test (b);
analyze ma_test;
@@ -1263,7 +1263,7 @@ create table hp_prefix_test (a int, b int, c int, d int)
-- create 8 partitions
select 'create table hp_prefix_test_p' || x::text || ' partition of hp_prefix_test for values with (modulus 8, remainder ' || x::text || ');'
-from generate_series(0, 95 * 7) x;
+from generate_series(0, 7) x;
\gexec
-- insert 16 rows, one row for each test to perform.
@@ -1274,9 +1274,9 @@ select
case c when 0 then null else 3 end,
case d when 0 then null else 4 end
from
- generate_series(0, 95 * 1) a,
- generate_series(0, 95 * 1) b,
- generate_series(0, 95 * 1) c,
+ generate_series(0, 1) a,
+ generate_series(0, 1) b,
+ generate_series(0, 1) c,
generate_series(0, 95 * 1) d;
-- Ensure partition pruning works correctly for each combination of IS NULL
diff --git a/src/test/regress/sql/plpgsql.sql b/src/test/regress/sql/plpgsql.sql
index d18cc331561..435d3d718e1 100644
--- a/src/test/regress/sql/plpgsql.sql
+++ b/src/test/regress/sql/plpgsql.sql
@@ -4581,12 +4581,12 @@ CREATE TRIGGER transition_table_level2_ri_child_upd_trigger
-- create initial test data
INSERT INTO transition_table_level1 (level1_no)
- SELECT generate_series(1, 95 * 200);
+ SELECT generate_series(1,200);
ANALYZE transition_table_level1;
INSERT INTO transition_table_level2 (level2_no, parent_no)
SELECT level2_no, level2_no / 50 + 1 AS parent_no
- FROM generate_series(1, 95 * 9999) level2_no;
+ FROM generate_series(1,9999) level2_no;
ANALYZE transition_table_level2;
INSERT INTO transition_table_status (level, node_no, status)
diff --git a/src/test/regress/sql/polygon.sql b/src/test/regress/sql/polygon.sql
index d39a2b4e8f8..2d862985510 100644
--- a/src/test/regress/sql/polygon.sql
+++ b/src/test/regress/sql/polygon.sql
@@ -42,7 +42,7 @@ CREATE TABLE quad_poly_tbl (id int, p polygon);
INSERT INTO quad_poly_tbl
SELECT (x - 1) * 100 + y, polygon(circle(point(x * 10, y * 10), 1 + (x + y) % 10))
- FROM generate_series(1, 95 * 100) x,
+ FROM generate_series(1, 100) x,
generate_series(1, 95 * 100) y;
INSERT INTO quad_poly_tbl
diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql
index 12c40039b18..e08b0aee00e 100644
--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -187,7 +187,7 @@ select 'drop table gexec_test', 'select ''2000-01-01''::date as party_over'
prepare q as select array_to_string(array_agg(repeat('x',2*n)),E'\n') as "ab
c", array_to_string(array_agg(repeat('y',20-2*n)),E'\n') as "a
-bc" from generate_series(1, 95 * 10) as n(n) group by n>1 order by n>1;
+bc" from generate_series(1,10) as n(n) group by n>1 order by n>1;
\pset linestyle ascii
@@ -304,7 +304,7 @@ execute q;
deallocate q;
-- test single-line header and data
-prepare q as select repeat('x',2*n) as "0123456789abcdef", repeat('y',20-2*n) as "0123456789" from generate_series(1, 95 * 10) as n;
+prepare q as select repeat('x',2*n) as "0123456789abcdef", repeat('y',20-2*n) as "0123456789" from generate_series(1,10) as n;
\pset linestyle ascii
@@ -1220,7 +1220,7 @@ create table child_10_20 partition of parent_tab
for values from (10) to (20);
create table child_20_30 partition of parent_tab
for values from (20) to (30);
-insert into parent_tab values (generate_series(0, 95 * 29));
+insert into parent_tab values (generate_series(0,29));
create table child_30_40 partition of parent_tab
for values from (30) to (40)
partition by range(id);
diff --git a/src/test/regress/sql/rangetypes.sql b/src/test/regress/sql/rangetypes.sql
index b51d6c405c2..a2d50d7bb43 100644
--- a/src/test/regress/sql/rangetypes.sql
+++ b/src/test/regress/sql/rangetypes.sql
@@ -314,13 +314,13 @@ select count(*) from test_range_gist where ir -|- int4multirange(int4range(100,2
create table test_range_spgist(ir int4range);
create index test_range_spgist_idx on test_range_spgist using spgist (ir);
-insert into test_range_spgist select int4range(g, g+10) from generate_series(1, 95 * 2000) g;
-insert into test_range_spgist select 'empty'::int4range from generate_series(1, 95 * 500) g;
-insert into test_range_spgist select int4range(g, g+10000) from generate_series(1, 95 * 1000) g;
-insert into test_range_spgist select 'empty'::int4range from generate_series(1, 95 * 500) g;
-insert into test_range_spgist select int4range(NULL,g*10,'(]') from generate_series(1, 95 * 100) g;
-insert into test_range_spgist select int4range(g*10,NULL,'(]') from generate_series(1, 95 * 100) g;
-insert into test_range_spgist select int4range(g, g+10) from generate_series(1, 95 * 2000) g;
+insert into test_range_spgist select int4range(g, g+10) from generate_series(1, POW(95, 0.5)::int * 2000) g;
+insert into test_range_spgist select 'empty'::int4range from generate_series(1, POW(95, 0.5)::int * 500) g;
+insert into test_range_spgist select int4range(g, g+10000) from generate_series(1, POW(95, 0.5)::int * 1000) g;
+insert into test_range_spgist select 'empty'::int4range from generate_series(1, POW(95, 0.5)::int * 500) g;
+insert into test_range_spgist select int4range(NULL,g*10,'(]') from generate_series(1, POW(95, 0.5)::int * 100) g;
+insert into test_range_spgist select int4range(g*10,NULL,'(]') from generate_series(1, POW(95, 0.5)::int * 100) g;
+insert into test_range_spgist select int4range(g, g+10) from generate_series(1, POW(95, 0.5)::int * 2000) g;
-- first, verify non-indexed results
SET enable_seqscan = t;
diff --git a/src/test/regress/sql/spgist.sql b/src/test/regress/sql/spgist.sql
index 0c4f24e1d49..ed9f7c45411 100644
--- a/src/test/regress/sql/spgist.sql
+++ b/src/test/regress/sql/spgist.sql
@@ -16,9 +16,9 @@ vacuum spgist_point_tbl;
-- Insert more data, to make the index a few levels deep.
insert into spgist_point_tbl (id, p)
-select g, point(g*10, g*10) from generate_series(1, 95 * 10000) g;
+select g, point(g*10, g*10) from generate_series(1, POW(95, 0.5) * 10000) g;
insert into spgist_point_tbl (id, p)
-select g+100000, point(g*10+1, g*10+1) from generate_series(1, 95 * 10000) g;
+select g+100000, point(g*10+1, g*10+1) from generate_series(1, POW(95, 0.5) * 10000) g;
-- To test vacuum, delete some entries from all over the index.
delete from spgist_point_tbl where id % 2 = 1;
@@ -37,8 +37,8 @@ vacuum spgist_point_tbl;
create table spgist_box_tbl(id serial, b box);
insert into spgist_box_tbl(b)
select box(point(i,j),point(i+s,j+s))
- from generate_series(1, 95 * 100,5) i,
- generate_series(1, 95 * 100,5) j,
+ from generate_series(1,100,5) i,
+ generate_series(1,100,5) j,
generate_series(1, 95 * 10) s;
create index spgist_box_idx on spgist_box_tbl using spgist (b);
@@ -86,6 +86,6 @@ create unlogged table spgist_unlogged_tbl(id serial, b box);
create index spgist_unlogged_idx on spgist_unlogged_tbl using spgist (b);
insert into spgist_unlogged_tbl(b)
select box(point(i,j))
- from generate_series(1, 95 * 100,5) i,
+ from generate_series(1,100,5) i,
generate_series(1, 95 * 10,5) j;
-- leave this table around, to help in testing dump/restore
diff --git a/src/test/regress/sql/tuplesort.sql b/src/test/regress/sql/tuplesort.sql
index 133491a0d70..0642902ad53 100644
--- a/src/test/regress/sql/tuplesort.sql
+++ b/src/test/regress/sql/tuplesort.sql
@@ -19,7 +19,7 @@ INSERT INTO abbrev_abort_uuids (abort_increasing, abort_decreasing, noabort_incr
('00000000-0000-0000-0000-'||to_char(20000 - g.i, '000000000000FM'))::uuid abort_decreasing,
(to_char(g.i % 10009, '00000000FM')||'-0000-0000-0000-'||to_char(g.i, '000000000000FM'))::uuid noabort_increasing,
(to_char(((20000 - g.i) % 10009), '00000000FM')||'-0000-0000-0000-'||to_char(20000 - g.i, '000000000000FM'))::uuid noabort_decreasing
- FROM generate_series(0, 95 * 20000, 1) g(i);
+ FROM generate_series(0, 20000, 1) g(i);
-- and a few NULLs
INSERT INTO abbrev_abort_uuids(id) VALUES(0);
@@ -276,7 +276,7 @@ ROLLBACK;
CREATE TEMP TABLE test_mark_restore(col1 int, col2 int, col12 int);
-- need a few duplicates for mark/restore to matter
INSERT INTO test_mark_restore(col1, col2, col12)
- SELECT a.i, b.i, a.i * b.i FROM generate_series(1, 95 * 500) a(i), generate_series(1, 95 * 5) b(i);
+ SELECT a.i, b.i, a.i * b.i FROM generate_series(1, 500) a(i), generate_series(1, 95 * 5) b(i);
BEGIN;
diff --git a/src/test/regress/sql/updatable_views.sql b/src/test/regress/sql/updatable_views.sql
index e4ad5c274fe..e1894d2d9cc 100644
--- a/src/test/regress/sql/updatable_views.sql
+++ b/src/test/regress/sql/updatable_views.sql
@@ -494,7 +494,7 @@ MERGE INTO rw_view2 t
SELECT * FROM base_tbl ORDER BY a;
MERGE INTO rw_view2 t
- USING (SELECT x, 'r'||x FROM generate_series(0, 95 * 2) x) AS s(a,b) ON t.a = s.a
+ USING (SELECT x, 'r'||x FROM generate_series(0,2) x) AS s(a,b) ON t.a = s.a
WHEN MATCHED THEN UPDATE SET b = s.b
WHEN NOT MATCHED AND s.a > 0 THEN INSERT VALUES (s.a, s.b)
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET b = 'Not matched by source'
@@ -519,7 +519,7 @@ MERGE INTO rw_view2 t
WHEN MATCHED THEN UPDATE SET b = s.b
WHEN NOT MATCHED AND s.a > 0 THEN INSERT VALUES (s.a, s.b); -- should fail
MERGE INTO rw_view2 t
- USING (SELECT x, 'R'||x FROM generate_series(0, 95 * 3) x) AS s(a,b) ON t.a = s.a
+ USING (SELECT x, 'R'||x FROM generate_series(0,3) x) AS s(a,b) ON t.a = s.a
WHEN MATCHED THEN UPDATE SET b = s.b
WHEN NOT MATCHED AND s.a > 0 THEN INSERT VALUES (s.a, s.b); -- ok
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 6a2f5815ab2..a63cf5cd12c 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -156,7 +156,7 @@ CREATE TABLE no_index_cleanup (i INT PRIMARY KEY, t TEXT);
-- Use uncompressed data stored in toast.
CREATE INDEX no_index_cleanup_idx ON no_index_cleanup(t);
ALTER TABLE no_index_cleanup ALTER COLUMN t SET STORAGE EXTERNAL;
-INSERT INTO no_index_cleanup(i, t) VALUES (generate_series(1, 95 * 30),
+INSERT INTO no_index_cleanup(i, t) VALUES (generate_series(1,30),
repeat('1234567890',269));
-- index cleanup option is ignored if VACUUM FULL
VACUUM (INDEX_CLEANUP TRUE, FULL TRUE) no_index_cleanup;

View File

@@ -7,7 +7,7 @@ index 255e616..1c6edb7 100644
RelationGetRelationName(index));
+#ifdef NEON_SMGR
+ smgr_start_unlogged_build(index->rd_smgr);
+ smgr_start_unlogged_build(RelationGetSmgr(index));
+#endif
+
initRumState(&buildstate.rumstate, index);
@@ -18,7 +18,7 @@ index 255e616..1c6edb7 100644
rumUpdateStats(index, &buildstate.buildStats, buildstate.rumstate.isBuild);
+#ifdef NEON_SMGR
+ smgr_finish_unlogged_build_phase_1(index->rd_smgr);
+ smgr_finish_unlogged_build_phase_1(RelationGetSmgr(index));
+#endif
+
/*
@@ -29,7 +29,7 @@ index 255e616..1c6edb7 100644
}
+#ifdef NEON_SMGR
+ smgr_end_unlogged_build(index->rd_smgr);
+ smgr_end_unlogged_build(RelationGetSmgr(index));
+#endif
+
/*

View File

@@ -14,7 +14,7 @@
use std::ffi::OsStr;
use std::io::Write;
use std::os::unix::prelude::AsRawFd;
use std::os::fd::AsFd;
use std::os::unix::process::CommandExt;
use std::path::Path;
use std::process::Command;
@@ -356,7 +356,7 @@ where
let file = pid_file::claim_for_current_process(&path).expect("claim pid file");
// Remove the FD_CLOEXEC flag on the pidfile descriptor so that the pidfile
// remains locked after exec.
nix::fcntl::fcntl(file.as_raw_fd(), FcntlArg::F_SETFD(FdFlag::empty()))
nix::fcntl::fcntl(file.as_fd(), FcntlArg::F_SETFD(FdFlag::empty()))
.expect("remove FD_CLOEXEC");
// Don't run drop(file), it would close the file before we actually exec.
std::mem::forget(file);

View File

@@ -8,7 +8,6 @@
use std::borrow::Cow;
use std::collections::{BTreeSet, HashMap};
use std::fs::File;
use std::os::fd::AsRawFd;
use std::path::PathBuf;
use std::process::exit;
use std::str::FromStr;
@@ -31,7 +30,7 @@ use control_plane::safekeeper::SafekeeperNode;
use control_plane::storage_controller::{
NeonStorageControllerStartArgs, NeonStorageControllerStopArgs, StorageController,
};
use nix::fcntl::{FlockArg, flock};
use nix::fcntl::{Flock, FlockArg};
use pageserver_api::config::{
DEFAULT_HTTP_LISTEN_PORT as DEFAULT_PAGESERVER_HTTP_PORT,
DEFAULT_PG_LISTEN_PORT as DEFAULT_PAGESERVER_PG_PORT,
@@ -749,16 +748,16 @@ struct TimelineTreeEl {
/// A flock-based guard over the neon_local repository directory
struct RepoLock {
_file: File,
_file: Flock<File>,
}
impl RepoLock {
fn new() -> Result<Self> {
let repo_dir = File::open(local_env::base_path())?;
let repo_dir_fd = repo_dir.as_raw_fd();
flock(repo_dir_fd, FlockArg::LockExclusive)?;
Ok(Self { _file: repo_dir })
match Flock::lock(repo_dir, FlockArg::LockExclusive) {
Ok(f) => Ok(Self { _file: f }),
Err((_, e)) => Err(e).context("flock error"),
}
}
}

View File

@@ -0,0 +1,8 @@
#!/bin/bash
# We need these settings to get the expected output results.
# We cannot use the environment variables e.g. PGTZ due to
# https://github.com/neondatabase/neon/issues/1287
export DATABASE=${1:-contrib_regression}
psql -c "ALTER DATABASE ${DATABASE} SET neon.allow_unstable_extensions='on'" \
-c "ALTER DATABASE ${DATABASE} SET DateStyle='Postgres,MDY'" \
-c "ALTER DATABASE ${DATABASE} SET TimeZone='America/Los_Angeles'" \

View File

@@ -18,6 +18,7 @@ TESTS=${TESTS/row_level_security/}
TESTS=${TESTS/sqli_connection/}
dropdb --if-exist contrib_regression
createdb contrib_regression
. ../alter_db.sh
psql -v ON_ERROR_STOP=1 -f test/fixtures.sql -d contrib_regression
${REGRESS} --use-existing --dbname=contrib_regression --inputdir=${TESTDIR} ${TESTS}

View File

@@ -3,6 +3,7 @@ set -ex
cd "$(dirname "${0}")"
dropdb --if-exist contrib_regression
createdb contrib_regression
. ../alter_db.sh
psql -d contrib_regression -c "CREATE EXTENSION vector" -c "CREATE EXTENSION rag"
PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
${PG_REGRESS} --inputdir=./ --bindir='/usr/local/pgsql/bin' --use-existing --load-extension=vector --load-extension=rag --dbname=contrib_regression basic_functions text_processing api_keys chunking_functions document_processing embedding_api_functions voyageai_functions

View File

@@ -20,5 +20,6 @@ installcheck: regression-test
regression-test:
dropdb --if-exists contrib_regression
createdb contrib_regression
../alter_db.sh
psql -d contrib_regression -c "CREATE EXTENSION $(EXTNAME)"
$(PG_REGRESS) --inputdir=. --outputdir=. --use-existing --dbname=contrib_regression $(REGRESS)

View File

@@ -3,6 +3,7 @@ set -ex
cd "$(dirname ${0})"
dropdb --if-exist contrib_regression
createdb contrib_regression
. ../alter_db.sh
PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
REGRESS="$(make -n installcheck | awk '{print substr($0,index($0,"init-extension"));}')"
REGRESS="${REGRESS/startup_perms/}"

View File

@@ -11,5 +11,6 @@ PG_REGRESS := $(dir $(PGXS))../../src/test/regress/pg_regress
installcheck:
dropdb --if-exists contrib_regression
createdb contrib_regression
../alter_db.sh
psql -d contrib_regression -c "CREATE EXTENSION vector" -c "CREATE EXTENSION rag_bge_small_en_v15"
$(PG_REGRESS) --use-existing --dbname=contrib_regression $(REGRESS)

View File

@@ -11,5 +11,6 @@ PG_REGRESS := $(dir $(PGXS))../../src/test/regress/pg_regress
installcheck:
dropdb --if-exists contrib_regression
createdb contrib_regression
../alter_db.sh
psql -d contrib_regression -c "CREATE EXTENSION vector" -c "CREATE EXTENSION rag_jina_reranker_v1_tiny_en"
$(PG_REGRESS) --use-existing --dbname=contrib_regression $(REGRESS)

View File

@@ -3,5 +3,6 @@ set -ex
cd "$(dirname ${0})"
dropdb --if-exist contrib_regression
createdb contrib_regression
. ../alter_db.sh
PG_REGRESS=$(dirname "$(pg_config --pgxs)")/../test/regress/pg_regress
${PG_REGRESS} --inputdir=./ --bindir='/usr/local/pgsql/bin' --use-existing --dbname=contrib_regression rum rum_hash ruminv timestamp orderby orderby_hash altorder altorder_hash limits int2 int4 int8 float4 float8 money oid time timetz date interval macaddr inet cidr text varchar char bytea bit varbit numeric rum_weight expr array

View File

@@ -0,0 +1,194 @@
# Bottommost Garbage-Collection Compaction
## Summary
The goal of this doc is to propose a way to reliably collect garbages below the GC horizon. This process is called bottom-most garbage-collect-compaction, and is part of the broader legacy-enhanced compaction that we plan to implement in the future.
## Motivation
The current GC algorithm will wait until the covering via image layers before collecting the garbages of a key region. Relying on image layer generation to generate covering images is not reliable. There are prior arts to generate feedbacks from the GC algorithm to the image generation process to accelerate garbage collection, but it slows down the system and creates write amplification.
# Basic Idea
![](images/036-bottom-most-gc-compaction/01-basic-idea.svg)
The idea of bottom-most compaction is simple: we rewrite all layers that are below or intersect with the GC horizon to produce a flat level of image layers at the GC horizon and deltas above the GC horizon. In this process,
- All images and deltas ≤ GC horizon LSN will be dropped. This process collects garbages.
- We produce images for all keys involved in the compaction process at the GC horizon.
Therefore, it can precisely collect all garbages below the horizon, and reduce the space amplification, i.e., in the staircase pattern (test_gc_feedback).
![The staircase pattern in test_gc_feedback in the original compaction algorithm. The goal is to collect garbage below the red horizontal line.](images/036-bottom-most-gc-compaction/12-staircase-test-gc-feedback.png)
The staircase pattern in test_gc_feedback in the original compaction algorithm. The goal is to collect garbage below the red horizontal line.
# Branches
With branches, the bottom-most compaction should retain a snapshot of the keyspace at the `retain_lsn` so that the child branch can access data at the branch point. This requires some modifications to the basic bottom-most compaction algorithm that we sketched above.
![](images/036-bottom-most-gc-compaction/03-retain-lsn.svg)
## Single Timeline w/ Snapshots: handle `retain_lsn`
First lets look into the case where we create branches over the main branch but dont write any data to them (aka “snapshots”).
The bottom-most compaction algorithm collects all deltas and images of a key and can make decisions on what data to retain. Given that we have a single keys history as below:
```
LSN 0x10 -> A
LSN 0x20 -> append B
retain_lsn: 0x20
LSN 0x30 -> append C
LSN 0x40 -> append D
retain_lsn: 0x40
LSN 0x50 -> append E
GC horizon: 0x50
LSN 0x60 -> append F
```
The algorithm will produce:
```
LSN 0x20 -> AB
(drop all history below the earliest retain_lsn)
LSN 0x40 -> ABCD
(assume the cost of replaying 2 deltas is higher than storing the full image, we generate an image here)
LSN 0x50 -> append E
(replay one delta is cheap)
LSN 0x60 -> append F
(keep everything as-is above the GC horizon)
```
![](images/036-bottom-most-gc-compaction/05-btmgc-parent.svg)
What happens is that we balance the space taken by each retain_lsn and the cost of replaying deltas during the bottom-most compaction process. This is controlled by a threshold. If `count(deltas) < $threshold`, the deltas will be retained. Otherwise, an image will be generated and the deltas will be dropped.
In the example above, the `$threshold` is 2.
## Child Branches with data: pull + partial images
In the previous section we have shown how bottom-most compaction respects `retain_lsn` so that all data that was readable at branch creation remains readable. But branches can have data on their own, and that data can fall out of the branchs PITR window. So, this section explains how we deal with that.
We will run the same bottom-most compaction for these branches, to ensure the space amplification on the child branch is reasonable.
```
branch_lsn: 0x20
LSN 0x30 -> append P
LSN 0x40 -> append Q
LSN 0x50 -> append R
GC horizon: 0x50
LSN 0x60 -> append S
```
Note that bottom-most compaction happens on a per-timeline basis. When it processes this key, it only reads the history from LSN 0x30 without a base image. Therefore, on child branches, the bottom-most compaction process will make image creation decisions based on the same `count(deltas) < $threshold` criteria, and if it decides to create an image, the base image will be retrieved from the ancestor branch.
```
branch_lsn: 0x20
LSN 0x50 -> ABPQR
(we pull the image at LSN 0x20 from the ancestor branch to get AB, and then apply append PQ to the page; we replace the record at 0x40 with an image and drop the delta)
GC horizon: 0x50
LSN 0x60 -> append S
```
![](images/036-bottom-most-gc-compaction/06-btmgc-child.svg)
Note that for child branches, we do not create image layers for the images when bottom-most compaction runs. Instead, we drop the 0x30/0x40/0x50 delta records and directly place the image ABPQR@0x50 into the delta layer, which serves as a sparse image layer. For child branches, if we create image layers, we will need to put all keys in the range into the image layer. This causes space bloat and slow compactions. In this proposal, the compaction process will only compact and process keys modified inside the child branch.
# Result
Bottom-most compaction ensures all garbage under the GC horizon gets collected right away (compared with “eventually” in the current algorithm). Meanwhile, it generates images at each of the retain_lsn to ensure branch reads are fast. As we make per-key decisions on whether to generate an image or not, the theoretical lower bound of the storage space we need to retain for a branch is lower than before.
Before: min(sum(logs for each key), sum(image for each key)), for each partition — we always generate image layers on a key range
After: sum(min(logs for each key, image for each key))
# Compaction Trigger
The bottom-most compaction can be automatically triggered. The goal of the trigger is that it should ensure a constant factor for write amplification. Say that the user write 1GB of WAL into the system, we should write 1GB x C data to S3. The legacy compaction algorithm does not have such a constant factor C. The data we write to S3 is quadratic to the logical size of the database (see [A Theoretical View of Neon Storage](https://www.notion.so/A-Theoretical-View-of-Neon-Storage-8d7ad7555b0c41b2a3597fa780911194?pvs=21)).
We propose the following compaction trigger that generates a constant write amplification factor. Write amplification >= total writes to S3 / total user writes. We only analyze the write amplification caused by the bottom-most GC-compaction process, ignoring the legacy create image layers amplification.
Given that we have ***X*** bytes of the delta layers above the GC horizon, ***A*** bytes of the delta layers intersecting with the GC horizon, ***B*** bytes of the delta layers below the GC horizon, and ***C*** bytes of the image layers below the GC horizon.
The legacy GC + compaction loop will always keep ***A*** unchanged, reduce ***B and C*** when there are image layers covering the key range. This yields 0 write amplification (only file deletions) and extra ***B*** bytes of space.
![](images/036-bottom-most-gc-compaction/09-btmgc-analysis-2.svg)
The bottom-most compaction proposed here will split ***A*** into deltas above the GC horizon and below the GC horizon. Everything below the GC horizon will be image layers after the compaction (not considering branches). Therefore, this yields ***A+C*** extra write traffic each iteration, plus 0 extra space.
![](images/036-bottom-most-gc-compaction/07-btmgc-analysis-1.svg)
Also considering read amplification (below the GC horizon). When a read request reaches the GC horizon, the read amplification will be (A+B+C)/C=1+(A+B)/C. Reducing ***A*** and ***B*** can help reduce the read amplification below the GC horizon.
The metrics-based trigger will wait until a point that space amplification is not that large and write amplification is not that large before the compaction gets triggered. The trigger is defined as **(A+B)/C ≥ 1 (or some other ratio)**.
To reason about this trigger, consider the two cases:
**Data Ingestion**
User keeps ingesting data into the database, which indicates that WAL size roughly equals to the database logical size. The compaction gets triggered only when the newly-written WAL roughly equals to the current bottom-most image size (=X). Therefore, its triggered when the database size gets doubled. This is a reasonable amount of work. Write amplification is 2X/X=1 for the X amount of data written.
![](images/036-bottom-most-gc-compaction/10-btmgc-analysis-3.svg)
**Updates/Deletion**
In this case, WAL size will be larger than the database logical size ***D***. The compaction gets triggered for every ***D*** bytes of WAL written. Therefore, for every ***D*** bytes of WAL, we rewrite the bottom-most layer, which produces an extra ***D*** bytes of write amplification. This incurs exactly 2x write amplification (by the write of D), 1.5x write amplification (if we count from the start of the process) and no space amplification.
![](images/036-bottom-most-gc-compaction/11-btmgc-analysis-4.svg)
Note that here I try to reason that write amplification is a constant (i.e., the data we write to S3 is proportional to the data the user write). The main problem with the current legacy compaction algorithm is that write amplification is proportional to the database size.
The next step is to optimize the write amplification above the GC horizon (i.e., change the image creation criteria, top-most compaction, or introduce tiered compaction), to ensure the write amplification of the whole system is a constant factor.
20GB layers → +20GB layers → delete 20GB, need 40GB temporary space
# Sub-Compactions
The gc-compaction algorithm may take a long time and we need to split the job into multiple sub-compaction jobs.
![](images/036-bottom-most-gc-compaction/13-job-split.svg)
As in the figure, the auto-trigger schedules a compaction job covering the full keyspace below a specific LSN. In such case that we cannot finish compacting it in one run in a reasonable amount of time, the algorithm will vertically split it into multiple jobs (in this case, 5).
Each gc-compaction job will create one level of delta layers and one flat level of image layers for each LSN. Those layers will be automatically split based on size, which means that if the sub-compaction job produces 1GB of deltas, it will produce 4 * 256MB delta layers. For those layers that is not fully contained within the sub-compaction job rectangles, it will be rewritten to only contain the keys outside of the key range.
# Implementation
The main implementation of gc-compaction is in `compaction.rs`.
* `compact_with_gc`: The main loop of gc-compaction. It takes a rectangle range of the layer map and compact that specific range. It selects layers intersecting with the rectangle, downloads the layers, creates the k-merge iterator to read those layers in the key-lsn order, and decide which keys to keep or insert a reconstructed page. The process is the basic unit of a gc-compaction and is not interruptable. If the process gets preempted by L0 compaction, it has to be restarted from scratch. For layers overlaps with the rectangle but not fully inside, the main loop will also rewrite them so that the new layer (or two layers if both left and right ends are outside of the rectangle) has the same LSN range as the original one but only contain the keys outside of the compaction range.
* `gc_compaction_split_jobs`: Splits a big gc-compaction job into sub-compactions based on heuristics in the layer map. The function looks at the layer map and splits the compaction job based on the size of the layers so that each compaction job only pulls ~4GB of layer files.
* `generate_key_retention` and `KeyHistoryRetention`: Implements the algorithm described in the "basic idea" and "branch" chapter of this RFC. It takes a vector of history of a key (key-lsn-value) and decides which LSNs of the key to retain. If there are too many deltas between two retain_lsns, it will reconstruct the page and insert an image into the compaction result. Also, we implement `KeyHistoryRetention::verify` to ensure the generated result is not corrupted -- all retain_lsns and all LSNs above the gc-horizon should be accessible.
* `GcCompactionQueue`: the automatic trigger implementation for gc-compaction. `GcCompactionQueue::iteration` is called at the end of the tenant compaction loop. It will then call `trigger_auto_compaction` to decide whether to trigger a gc-compaction job for this tenant. If yes, the compaction-job will be added to the compaction queue, and the queue will be slowly drained once there are no other compaction jobs running. gc-compaction has the lowest priority. If a sub-compaction job is not successful or gets preempted by L0 compaction (see limitations for reasons why a compaction job would fail), it will _not_ be retried.
* Changes to `index_part.json`: we added a `last_completed_lsn` field to the index part for the auto-trigger to decide when to trigger a compaction.
* Changes to the read path: when gc-compaction updates the layer map, all reads need to wait. See `gc_compaction_layer_update_lock` and comments in the code path for more information.
Gc-compaction can also be scheduled over the HTTP API. Example:
```
curl 'localhost:9898/v1/tenant/:tenant_id/timeline/:timeline_id/compact?enhanced_gc_bottom_most_compaction=true&dry_run=true' -X PUT -H "Content-Type: application/json" -d '{"scheduled": true, "compact_key_range": { "start": "000000067F0000A0000002A1CF0100000000", "end": "000000067F0000A0000002A1D70100000000" } }'
```
The `dry_run` mode can be specified in the query string so that the compaction will go through all layers to estimate how much space can be saved without writing the compaction result into the layer map.
The auto-trigger is controlled by tenant-level flag `gc_compaction_enabled`. If this is set to false, no gc-compaction will be automatically scheduled on this tenant (but manual trigger still works).
# Next Steps
There are still some limitations of gc-compaction itself that needs to be resolved and tested,
- gc-compaction is currently only automatically triggered on root branches. We have not tested gc-compaction on child branches in staging.
- gc-compaction will skip aux key regions because of the possible conflict with the assumption of aux file tombstones.
- gc-compaction does not consider keyspaces at retain_lsns and only look at keys in the layers. This also causes us giving up some sub-compaction jobs because a key might have part of its history available due to traditional GC removing part of the history.
- We limit gc-compaction to run over shards <= 150GB to avoid gc-compaction taking too much time blocking other compaction jobs. The sub-compaction split algorithm needs to be improved to be able to split vertically and horizontally. Also, we need to move the download layer process out of the compaction loop so that we don't block other compaction jobs for too long.
- The compaction trigger always schedules gc-compaction from the lowest LSN to the gc-horizon. Currently we do not schedule compaction jobs that only selects layers in the middle. Allowing this could potentially reduce the number of layers read/write throughout the process.
- gc-compaction will give up if there are too many layers to rewrite or if there are not enough disk space for the compaction.
- gc-compaction sometimes fails with "no key produced during compaction", which means that all existing keys within the compaction range can be collected; but we don't have a way to write this information back to the layer map -- we cannot generate an empty image layer.
- We limit the maximum size of deltas for a single key to 512MB. If above this size, gc-compaction will give up. This can be resolved by changing `generate_key_retention` to be a stream instead of requiring to collect all the key history.
In the future,
- Top-most compaction: ensure we always have an image coverage for the latest data (or near the latest data), so that reads will be fast at the latest LSN.
- Tiered compaction on deltas: ensure read from any LSN is fast.
- Per-timeline compaction → tenant-wide compaction?

View File

@@ -0,0 +1,135 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xl="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.1" viewBox="82 284 863 375" width="863" height="375">
<defs/>
<g id="01-basic-idea" stroke-opacity="1" stroke-dasharray="none" stroke="none" fill="none" fill-opacity="1">
<title>01-basic-idea</title>
<rect fill="white" x="82" y="284" width="863" height="375"/>
<g id="01-basic-idea_Layer_1">
<title>Layer 1</title>
<g id="Graphic_2">
<rect x="234" y="379.5" width="203.5" height="17.5" fill="white"/>
<rect x="234" y="379.5" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_3">
<rect x="453.5" y="379.5" width="203.5" height="17.5" fill="white"/>
<rect x="453.5" y="379.5" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_4">
<rect x="672.5" y="379.5" width="203.5" height="17.5" fill="white"/>
<rect x="672.5" y="379.5" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_5">
<rect x="234" y="288.5" width="127" height="77.5" fill="white"/>
<rect x="234" y="288.5" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_6">
<rect x="375" y="288.5" width="127" height="77.5" fill="white"/>
<rect x="375" y="288.5" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_7">
<rect x="516" y="288.5" width="127" height="77.5" fill="white"/>
<rect x="516" y="288.5" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_8">
<rect x="657" y="288.5" width="127" height="77.5" fill="white"/>
<rect x="657" y="288.5" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_9">
<rect x="798" y="288.5" width="78" height="77.5" fill="white"/>
<rect x="798" y="288.5" width="78" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_11">
<line x1="185.5" y1="326.75" x2="943.7734" y2="326.75" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_12">
<text transform="translate(87 318.026)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">GC Horizon</tspan>
</text>
</g>
<g id="Graphic_13">
<text transform="translate(106.41 372.886)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="8.39" y="15" xml:space="preserve">Images </tspan>
<tspan font-family="Helvetica Neue" font-size="10" fill="black" x="29132252e-19" y="28.447998" xml:space="preserve">at earlier LSN</tspan>
</text>
</g>
<g id="Graphic_14">
<text transform="translate(121.92 289.578)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="8739676e-19" y="15" xml:space="preserve">Deltas</tspan>
</text>
</g>
<g id="Graphic_15">
<path d="M 517.125 423.5 L 553.375 423.5 L 553.375 482 L 571.5 482 L 535.25 512 L 499 482 L 517.125 482 Z" fill="white"/>
<path d="M 517.125 423.5 L 553.375 423.5 L 553.375 482 L 571.5 482 L 535.25 512 L 499 482 L 517.125 482 Z" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_26">
<rect x="234" y="599.474" width="203.5" height="17.5" fill="white"/>
<rect x="234" y="599.474" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_25">
<rect x="453.5" y="599.474" width="203.5" height="17.5" fill="white"/>
<rect x="453.5" y="599.474" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_24">
<rect x="672.5" y="599.474" width="203.5" height="17.5" fill="white"/>
<rect x="672.5" y="599.474" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_23">
<rect x="234" y="533" width="127" height="52.974" fill="white"/>
<rect x="234" y="533" width="127" height="52.974" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_22">
<rect x="375" y="533" width="310.5" height="52.974" fill="white"/>
<rect x="375" y="533" width="310.5" height="52.974" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_21">
<rect x="702.5" y="533" width="173.5" height="52.974" fill="white"/>
<rect x="702.5" y="533" width="173.5" height="52.974" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_18">
<line x1="185.5" y1="607.724" x2="943.7734" y2="607.724" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_16">
<text transform="translate(121.92 538)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="8739676e-19" y="15" xml:space="preserve">Deltas</tspan>
</text>
</g>
<g id="Graphic_27">
<text transform="translate(114.8 592.86)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="3488765e-18" y="15" xml:space="preserve">Images </tspan>
<tspan font-family="Helvetica Neue" font-size="10" fill="black" x="4.01" y="28.447998" xml:space="preserve">at GC LSN</tspan>
</text>
</g>
<g id="Graphic_28">
<rect x="243.06836" y="300" width="624.3633" height="17.5" fill="#c0ffc0"/>
<text transform="translate(248.06836 301.068)" fill="black">
<tspan font-family="Helvetica Neue" font-size="13" fill="black" x="233.52364" y="12" xml:space="preserve">Deltas above GC Horizon</tspan>
</text>
</g>
<g id="Graphic_30">
<rect x="243.06836" y="335.5" width="624.3633" height="17.5" fill="#c0ffff"/>
<text transform="translate(248.06836 336.568)" fill="black">
<tspan font-family="Helvetica Neue" font-size="13" fill="black" x="233.89414" y="12" xml:space="preserve">Deltas below GC Horizon</tspan>
</text>
</g>
<g id="Graphic_32">
<rect x="243.06836" y="550.737" width="624.3633" height="17.5" fill="#c0ffc0"/>
<text transform="translate(248.06836 551.805)" fill="black">
<tspan font-family="Helvetica Neue" font-size="13" fill="black" x="233.52364" y="12" xml:space="preserve">Deltas above GC Horizon</tspan>
</text>
</g>
<g id="Graphic_33">
<rect x="304" y="630.474" width="485.5" height="28.447998" fill="#c0ffff"/>
<text transform="translate(309 637.016)" fill="black">
<tspan font-family="Helvetica Neue" font-size="13" fill="black" x="63.095" y="12" xml:space="preserve">Deltas and image below GC Horizon gets garbage-collected</tspan>
</text>
</g>
<g id="Graphic_34">
<text transform="translate(576.5 444.0325)" fill="black">
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="0" y="11" xml:space="preserve">WAL replay of deltas+image below GC Horizon</tspan>
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="0" y="25.336" xml:space="preserve">Reshuffle deltas</tspan>
</text>
</g>
</g>
</g>
</svg>

After

Width:  |  Height:  |  Size: 8.1 KiB

View File

@@ -0,0 +1,141 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xl="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.1" viewBox="-104 215 863 335" width="863" height="335">
<defs>
<marker orient="auto" overflow="visible" markerUnits="strokeWidth" id="FilledArrow_Marker" stroke-linejoin="miter" stroke-miterlimit="10" viewBox="-1 -4 10 8" markerWidth="10" markerHeight="8" color="#7f8080">
<g>
<path d="M 8 0 L 0 -3 L 0 3 Z" fill="currentColor" stroke="currentColor" stroke-width="1"/>
</g>
</marker>
</defs>
<g id="03-retain-lsn" stroke-opacity="1" stroke-dasharray="none" stroke="none" fill="none" fill-opacity="1">
<title>03-retain-lsn</title>
<rect fill="white" x="-104" y="215" width="863" height="335"/>
<g id="03-retain-lsn_Layer_1">
<title>Layer 1</title>
<g id="Graphic_28">
<rect x="48" y="477" width="203.5" height="9.990005" fill="white"/>
<rect x="48" y="477" width="203.5" height="9.990005" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_27">
<rect x="267.5" y="477" width="203.5" height="9.990005" fill="white"/>
<rect x="267.5" y="477" width="203.5" height="9.990005" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_26">
<rect x="486.5" y="477" width="203.5" height="9.990005" fill="white"/>
<rect x="486.5" y="477" width="203.5" height="9.990005" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_20">
<line x1="-.5" y1="387.172" x2="757.7734" y2="387.172" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_19">
<text transform="translate(-99 378.448)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">GC Horizon</tspan>
</text>
</g>
<g id="Graphic_31">
<rect x="48.25" y="410" width="203.5" height="9.990005" fill="white"/>
<rect x="48.25" y="410" width="203.5" height="9.990005" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_30">
<rect x="267.75" y="410" width="203.5" height="9.990005" fill="white"/>
<rect x="267.75" y="410" width="203.5" height="9.990005" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_29">
<rect x="486.75" y="410" width="203.5" height="9.990005" fill="white"/>
<rect x="486.75" y="410" width="203.5" height="9.990005" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_34">
<rect x="48.25" y="431.495" width="113.75" height="34" fill="white"/>
<rect x="48.25" y="431.495" width="113.75" height="34" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_33">
<rect x="172.5" y="431.495" width="203.5" height="34" fill="white"/>
<rect x="172.5" y="431.495" width="203.5" height="34" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_32">
<rect x="386.5" y="431.495" width="303.5" height="34" fill="white"/>
<rect x="386.5" y="431.495" width="303.5" height="34" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_37">
<rect x="48" y="498.495" width="203.5" height="9.990005" fill="white"/>
<rect x="48" y="498.495" width="203.5" height="9.990005" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_36">
<rect x="267.5" y="498.495" width="203.5" height="9.990005" fill="white"/>
<rect x="267.5" y="498.495" width="203.5" height="9.990005" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_35">
<rect x="486.5" y="498.495" width="203.5" height="9.990005" fill="white"/>
<rect x="486.5" y="498.495" width="203.5" height="9.990005" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_38">
<line x1="-10.48" y1="535.5395" x2="39.318294" y2="508.24794" marker-end="url(#FilledArrow_Marker)" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_39">
<text transform="translate(-96.984 526.3155)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="40500936e-20" y="15" xml:space="preserve">retain_lsn 1</tspan>
</text>
</g>
<g id="Line_41">
<line x1="-10.48" y1="507.0915" x2="38.90236" y2="485.8992" marker-end="url(#FilledArrow_Marker)" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_40">
<text transform="translate(-96.984 497.8675)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="40500936e-20" y="15" xml:space="preserve">retain_lsn 2</tspan>
</text>
</g>
<g id="Line_43">
<line x1="-10.48" y1="478.6435" x2="39.44267" y2="453.01616" marker-end="url(#FilledArrow_Marker)" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_42">
<text transform="translate(-96.984 469.4195)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="40500936e-20" y="15" xml:space="preserve">retain_lsn 3</tspan>
</text>
</g>
<g id="Line_45">
<line x1="-10.48" y1="448.495" x2="39.65061" y2="419.90015" marker-end="url(#FilledArrow_Marker)" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_44">
<text transform="translate(-96.984 439.271)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="40500936e-20" y="15" xml:space="preserve">retain_lsn 4</tspan>
</text>
</g>
<g id="Graphic_46">
<rect x="335.46477" y="215.5" width="353.4299" height="125.495" fill="white"/>
<rect x="335.46477" y="215.5" width="353.4299" height="125.495" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_48">
<text transform="translate(549.3766 317.547)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="6536993e-19" y="15" xml:space="preserve">Dependent Branch</tspan>
</text>
</g>
<g id="Graphic_50">
<text transform="translate(340.43824 317.547)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="40500936e-20" y="15" xml:space="preserve">retain_lsn 3</tspan>
</text>
</g>
<g id="Line_57">
<line x1="323.90685" y1="248.8045" x2="714.9232" y2="248.8045" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_56">
<text transform="translate(165.91346 240.0805)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="35811354e-19" y="15" xml:space="preserve">Branch GC Horizon</tspan>
</text>
</g>
<g id="Graphic_58">
<rect x="493.9232" y="301.6405" width="107.45294" height="9.990005" fill="white"/>
<rect x="493.9232" y="301.6405" width="107.45294" height="9.990005" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_59">
<text transform="translate(358.9232 277.276)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">Partial Image Coverage</tspan>
</text>
</g>
<g id="Graphic_60">
<rect x="354.1732" y="301.6405" width="107.45294" height="9.990005" fill="white"/>
<rect x="354.1732" y="301.6405" width="107.45294" height="9.990005" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
</g>
</g>
</svg>

After

Width:  |  Height:  |  Size: 8.4 KiB

View File

@@ -0,0 +1,187 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xl="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.1" viewBox="-235 426 864 366" width="864" height="366">
<defs/>
<g id="05-btmgc-parent" stroke-opacity="1" stroke-dasharray="none" stroke="none" fill="none" fill-opacity="1">
<title>05-btmgc-parent</title>
<rect fill="white" x="-235" y="426" width="864" height="366"/>
<g id="05-btmgc-parent_Layer_1">
<title>Layer 1</title>
<g id="Graphic_23">
<rect x="-83" y="510.15" width="203.5" height="26.391998" fill="white"/>
<rect x="-83" y="510.15" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-78 516.178)" fill="black">
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="51.714" y="11" xml:space="preserve">Append C@0x30</tspan>
</text>
</g>
<g id="Graphic_22">
<rect x="136.5" y="510.15" width="203.5" height="26.391998" fill="white"/>
<rect x="136.5" y="510.15" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_21">
<rect x="355.5" y="510.15" width="203.5" height="26.391998" fill="white"/>
<rect x="355.5" y="510.15" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_20">
<line x1="-100.448" y1="459.224" x2="626.77344" y2="459.224" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_19">
<text transform="translate(-230 450.5)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">GC Horizon</tspan>
</text>
</g>
<g id="Graphic_18">
<rect x="-82.75" y="426.748" width="203.5" height="26.391998" fill="white"/>
<rect x="-82.75" y="426.748" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-77.75 432.776)" fill="black">
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="52.602" y="11" xml:space="preserve">Append F@0x60</tspan>
</text>
</g>
<g id="Graphic_17">
<rect x="136.75" y="426.748" width="203.5" height="26.391998" fill="white"/>
<rect x="136.75" y="426.748" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_16">
<rect x="355.75" y="426.748" width="203.5" height="26.391998" fill="white"/>
<rect x="355.75" y="426.748" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_15">
<rect x="-82.75" y="464.645" width="113.75" height="34" fill="white"/>
<rect x="-82.75" y="464.645" width="113.75" height="34" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-77.75 467.309)" fill="black">
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="7.505" y="11" xml:space="preserve">Append E@0x50</tspan>
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="6.947" y="25.336" xml:space="preserve">Append D@0x40</tspan>
</text>
</g>
<g id="Graphic_14">
<rect x="41.5" y="464.645" width="203.5" height="34" fill="white"/>
<rect x="41.5" y="464.645" width="203.5" height="34" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_13">
<rect x="255.5" y="464.645" width="303.5" height="34" fill="white"/>
<rect x="255.5" y="464.645" width="303.5" height="34" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_12">
<rect x="-83" y="548.047" width="203.5" height="26.391998" fill="white"/>
<rect x="-83" y="548.047" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-78 554.075)" fill="black">
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="26.796" y="11" xml:space="preserve">A@0x10, Append B@0x20</tspan>
</text>
</g>
<g id="Graphic_11">
<rect x="136.5" y="548.047" width="203.5" height="26.391998" fill="white"/>
<rect x="136.5" y="548.047" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_10">
<rect x="355.5" y="548.047" width="203.5" height="26.391998" fill="white"/>
<rect x="355.5" y="548.047" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_24">
<line x1="-104" y1="542" x2="610.5" y2="542" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="4.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_25">
<text transform="translate(-139.604 534.5)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x20</tspan>
</text>
</g>
<g id="Graphic_28">
<text transform="translate(-139.604 452.556)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x50</tspan>
</text>
</g>
<g id="Line_30">
<line x1="-100.448" y1="481.145" x2="614.052" y2="481.145" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="4.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_29">
<text transform="translate(-139.604 473.449)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x40</tspan>
</text>
</g>
<g id="Line_48">
<line x1="-99.448" y1="701.513" x2="627.77344" y2="701.513" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_47">
<text transform="translate(-229 692.789)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">GC Horizon</tspan>
</text>
</g>
<g id="Graphic_46">
<rect x="-81.75" y="670.496" width="113.75" height="26.391998" fill="white"/>
<rect x="-81.75" y="670.496" width="113.75" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-76.75 676.524)" fill="black">
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="7.727" y="11" xml:space="preserve">Append F@0x60</tspan>
</text>
</g>
<g id="Graphic_43">
<rect x="-81.75" y="708.393" width="113.75" height="34" fill="white"/>
<rect x="-81.75" y="708.393" width="113.75" height="34" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-76.75 718.225)" fill="black">
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="7.505" y="11" xml:space="preserve">Append E@0x50</tspan>
</text>
</g>
<g id="Line_37">
<line x1="-101" y1="777.2665" x2="613.5" y2="777.2665" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="4.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_36">
<text transform="translate(-138.604 769.7665)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x20</tspan>
</text>
</g>
<g id="Graphic_33">
<text transform="translate(-138.604 694.845)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x50</tspan>
</text>
</g>
<g id="Line_32">
<line x1="-99.448" y1="755.089" x2="615.052" y2="755.089" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="4.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_31">
<text transform="translate(-138.604 747.393)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x40</tspan>
</text>
</g>
<g id="Graphic_40">
<rect x="-82" y="770.909" width="203.5" height="14.107002" fill="white"/>
<rect x="-82" y="770.909" width="203.5" height="14.107002" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-77 770.7945)" fill="black">
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="70.836" y="11" xml:space="preserve">AB@0x20</tspan>
</text>
</g>
<g id="Graphic_39">
<rect x="137.5" y="770.909" width="203.5" height="14.107002" fill="white"/>
<rect x="137.5" y="770.909" width="203.5" height="14.107002" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_38">
<rect x="356.5" y="770.909" width="203.5" height="14.107002" fill="white"/>
<rect x="356.5" y="770.909" width="203.5" height="14.107002" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_54">
<rect x="-81.75" y="748.5355" width="203.5" height="14.107002" fill="white"/>
<rect x="-81.75" y="748.5355" width="203.5" height="14.107002" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-76.75 748.421)" fill="black">
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="62.28" y="11" xml:space="preserve">ABCD@0x40</tspan>
</text>
</g>
<g id="Graphic_53">
<rect x="137.75" y="748.5355" width="203.5" height="14.107002" fill="white"/>
<rect x="137.75" y="748.5355" width="203.5" height="14.107002" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_52">
<rect x="356.75" y="748.5355" width="203.5" height="14.107002" fill="white"/>
<rect x="356.75" y="748.5355" width="203.5" height="14.107002" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_57">
<path d="M 211.32422 585 L 265.17578 585 L 265.17578 611.332 L 287.84375 611.332 L 238.25 633.117 L 188.65625 611.332 L 211.32422 611.332 Z" fill="white"/>
<path d="M 211.32422 585 L 265.17578 585 L 265.17578 611.332 L 287.84375 611.332 L 238.25 633.117 L 188.65625 611.332 L 211.32422 611.332 Z" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_60">
<rect x="359" y="692.858" width="203.5" height="14.107002" fill="white"/>
<rect x="359" y="692.858" width="203.5" height="14.107002" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_59">
<rect x="41.5" y="693.858" width="303" height="14.107002" fill="white"/>
<rect x="41.5" y="693.858" width="303" height="14.107002" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
</g>
</g>
</svg>

After

Width:  |  Height:  |  Size: 12 KiB

View File

@@ -0,0 +1,184 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xl="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.1" viewBox="-413 471 931 354" width="931" height="354">
<defs/>
<g id="06-btmgc-child" stroke-opacity="1" stroke-dasharray="none" stroke="none" fill="none" fill-opacity="1">
<title>06-btmgc-child</title>
<rect fill="white" x="-413" y="471" width="931" height="354"/>
<g id="06-btmgc-child_Layer_1">
<title>Layer 1</title>
<g id="Graphic_47">
<rect x="-412" y="594.402" width="928" height="28.447998" fill="white"/>
<rect x="-412" y="594.402" width="928" height="28.447998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_46">
<rect x="-205" y="555.552" width="203.5" height="26.391998" fill="white"/>
<rect x="-205" y="555.552" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-200 561.58)" fill="black">
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="52.158" y="11" xml:space="preserve">Append P@0x30</tspan>
</text>
</g>
<g id="Graphic_45">
<rect x="14.5" y="555.552" width="203.5" height="26.391998" fill="white"/>
<rect x="14.5" y="555.552" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_44">
<rect x="233.5" y="555.552" width="203.5" height="26.391998" fill="white"/>
<rect x="233.5" y="555.552" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_43">
<line x1="-222.448" y1="504.724" x2="504.77344" y2="504.724" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_42">
<text transform="translate(-352 496)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">GC Horizon</tspan>
</text>
</g>
<g id="Graphic_41">
<rect x="-204.75" y="472.15" width="203.5" height="26.391998" fill="white"/>
<rect x="-204.75" y="472.15" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-199.75 478.178)" fill="black">
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="52.158" y="11" xml:space="preserve">Append S@0x60</tspan>
</text>
</g>
<g id="Graphic_40">
<rect x="14.75" y="472.15" width="203.5" height="26.391998" fill="white"/>
<rect x="14.75" y="472.15" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_39">
<rect x="233.75" y="472.15" width="203.5" height="26.391998" fill="white"/>
<rect x="233.75" y="472.15" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_38">
<rect x="-204.75" y="510.047" width="113.75" height="34" fill="white"/>
<rect x="-204.75" y="510.047" width="113.75" height="34" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-199.75 512.711)" fill="black">
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="7.061" y="11" xml:space="preserve">Append R@0x50</tspan>
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="6.611" y="25.336" xml:space="preserve">Append Q@0x40</tspan>
</text>
</g>
<g id="Graphic_37">
<rect x="-80.5" y="510.047" width="203.5" height="34" fill="white"/>
<rect x="-80.5" y="510.047" width="203.5" height="34" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_36">
<rect x="133.5" y="510.047" width="303.5" height="34" fill="white"/>
<rect x="133.5" y="510.047" width="303.5" height="34" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_33">
<text transform="translate(-261.604 498.056)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x50</tspan>
</text>
</g>
<g id="Line_30">
<line x1="-224" y1="607.9115" x2="490.5" y2="607.9115" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="4.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_29">
<text transform="translate(-261.604 600.4115)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x20</tspan>
</text>
</g>
<g id="Graphic_28">
<rect x="-205" y="601.554" width="203.5" height="14.107002" fill="white"/>
<rect x="-205" y="601.554" width="203.5" height="14.107002" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-200 601.4395)" fill="black">
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="70.836" y="11" xml:space="preserve">AB@0x20</tspan>
</text>
</g>
<g id="Graphic_27">
<rect x="14.5" y="601.554" width="203.5" height="14.107002" fill="white"/>
<rect x="14.5" y="601.554" width="203.5" height="14.107002" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_26">
<rect x="233.5" y="601.554" width="203.5" height="14.107002" fill="white"/>
<rect x="233.5" y="601.554" width="203.5" height="14.107002" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_25">
<text transform="translate(-407 599.1875)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">Ancestor Branch</tspan>
</text>
</g>
<g id="Graphic_24">
<rect x="-411" y="795.46" width="928" height="28.447998" fill="white"/>
<rect x="-411" y="795.46" width="928" height="28.447998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_20">
<line x1="-221.448" y1="755.528" x2="505.77344" y2="755.528" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_19">
<text transform="translate(-351 746.804)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">GC Horizon</tspan>
</text>
</g>
<g id="Graphic_18">
<rect x="-203.75" y="723.579" width="203.25" height="26.391998" fill="white"/>
<rect x="-203.75" y="723.579" width="203.25" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-198.75 729.607)" fill="black">
<tspan font-family="Helvetica Neue" font-size="12" fill="black" x="52.033" y="11" xml:space="preserve">Append S@0x60</tspan>
</text>
</g>
<g id="Graphic_10">
<text transform="translate(-260.604 748.86)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x50</tspan>
</text>
</g>
<g id="Line_7">
<line x1="-223" y1="808.9695" x2="491.5" y2="808.9695" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="4.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_6">
<text transform="translate(-260.604 801.4695)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x20</tspan>
</text>
</g>
<g id="Graphic_5">
<rect x="-204" y="802.612" width="203.5" height="14.107002" fill="white"/>
<rect x="-204" y="802.612" width="203.5" height="14.107002" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-199 802.4975)" fill="#b1001c">
<tspan font-family="Helvetica Neue" font-size="12" fill="#b1001c" x="70.836" y="11" xml:space="preserve">AB</tspan>
<tspan font-family="Helvetica Neue" font-size="12" fill="black" y="11" xml:space="preserve">@0x20</tspan>
</text>
</g>
<g id="Graphic_4">
<rect x="15.5" y="802.612" width="203.5" height="14.107002" fill="white"/>
<rect x="15.5" y="802.612" width="203.5" height="14.107002" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_3">
<rect x="234.5" y="802.612" width="203.5" height="14.107002" fill="white"/>
<rect x="234.5" y="802.612" width="203.5" height="14.107002" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_2">
<text transform="translate(-406 800.2455)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">Ancestor Branch</tspan>
</text>
</g>
<g id="Graphic_48">
<path d="M 89.32422 639.081 L 143.17578 639.081 L 143.17578 665.413 L 165.84375 665.413 L 116.25 687.198 L 66.65625 665.413 L 89.32422 665.413 Z" fill="white"/>
<path d="M 89.32422 639.081 L 143.17578 639.081 L 143.17578 665.413 L 165.84375 665.413 L 116.25 687.198 L 66.65625 665.413 L 89.32422 665.413 Z" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_49">
<rect x="-204" y="762.428" width="203.5" height="26.391998" fill="white"/>
<rect x="-204" y="762.428" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-199 768.456)" fill="#b1001c">
<tspan font-family="Helvetica Neue" font-size="12" fill="#b1001c" x="58.278" y="11" xml:space="preserve">AB</tspan>
<tspan font-family="Helvetica Neue" font-size="12" fill="black" y="11" xml:space="preserve">PQR@0x50</tspan>
</text>
</g>
<g id="Graphic_59">
<rect x="14.5" y="723.579" width="203.5" height="26.391998" fill="white"/>
<rect x="14.5" y="723.579" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_58">
<rect x="233.5" y="723.579" width="203.5" height="26.391998" fill="white"/>
<rect x="233.5" y="723.579" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_63">
<rect x="9" y="762.085" width="203.5" height="26.391998" fill="white"/>
<rect x="9" y="762.085" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_62">
<rect x="225" y="762.085" width="213" height="26.391998" fill="white"/>
<rect x="225" y="762.085" width="213" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
</g>
</g>
</svg>

After

Width:  |  Height:  |  Size: 12 KiB

View File

@@ -0,0 +1,180 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xl="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.1" viewBox="-556 476 923 411" width="923" height="411">
<defs/>
<g id="07-btmgc-analysis-1" stroke-opacity="1" stroke-dasharray="none" stroke="none" fill="none" fill-opacity="1">
<title>07-btmgc-analysis-1</title>
<rect fill="white" x="-556" y="476" width="923" height="411"/>
<g id="07-btmgc-analysis-1_Layer_1">
<title>Layer 1</title>
<g id="Graphic_85">
<rect x="-404" y="609.062" width="203.5" height="17.5" fill="white"/>
<rect x="-404" y="609.062" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_84">
<rect x="-184.5" y="609.062" width="203.5" height="17.5" fill="white"/>
<rect x="-184.5" y="609.062" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_83">
<rect x="34.5" y="609.062" width="203.5" height="17.5" fill="white"/>
<rect x="34.5" y="609.062" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_82">
<rect x="-404" y="479.922" width="127" height="77.5" fill="white"/>
<rect x="-404" y="479.922" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_81">
<rect x="-263" y="479.922" width="127" height="77.5" fill="white"/>
<rect x="-263" y="479.922" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_80">
<rect x="-122" y="479.922" width="127" height="77.5" fill="white"/>
<rect x="-122" y="479.922" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_79">
<rect x="19" y="479.922" width="127" height="77.5" fill="white"/>
<rect x="19" y="479.922" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_78">
<rect x="160" y="479.922" width="78" height="77.5" fill="white"/>
<rect x="160" y="479.922" width="78" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_77">
<line x1="-452.5" y1="518.172" x2="251" y2="518.172" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_76">
<text transform="translate(-551 509.448)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">GC Horizon</tspan>
</text>
</g>
<g id="Graphic_75">
<text transform="translate(-531.59 602.448)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="8.39" y="15" xml:space="preserve">Images </tspan>
<tspan font-family="Helvetica Neue" font-size="10" fill="black" x="29132252e-19" y="28.447998" xml:space="preserve">at earlier LSN</tspan>
</text>
</g>
<g id="Graphic_74">
<text transform="translate(-516.08 481)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="8739676e-19" y="15" xml:space="preserve">Deltas</tspan>
</text>
</g>
<g id="Graphic_73">
<path d="M -120.675 651.5 L -84.425 651.5 L -84.425 710 L -66.3 710 L -102.55 740 L -138.8 710 L -120.675 710 Z" fill="white"/>
<path d="M -120.675 651.5 L -84.425 651.5 L -84.425 710 L -66.3 710 L -102.55 740 L -138.8 710 L -120.675 710 Z" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_72">
<rect x="-403.8" y="827.474" width="203.5" height="17.5" fill="white"/>
<rect x="-403.8" y="827.474" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_71">
<rect x="-184.3" y="827.474" width="203.5" height="17.5" fill="white"/>
<rect x="-184.3" y="827.474" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_70">
<rect x="34.7" y="827.474" width="203.5" height="17.5" fill="white"/>
<rect x="34.7" y="827.474" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_69">
<rect x="-403.8" y="761" width="127" height="52.974" fill="white"/>
<rect x="-403.8" y="761" width="127" height="52.974" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_68">
<rect x="-262.8" y="761" width="310.5" height="52.974" fill="white"/>
<rect x="-262.8" y="761" width="310.5" height="52.974" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_67">
<rect x="64.7" y="761" width="173.5" height="52.974" fill="white"/>
<rect x="64.7" y="761" width="173.5" height="52.974" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_66">
<line x1="-452.3" y1="835.724" x2="251.2" y2="835.724" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_65">
<text transform="translate(-515.88 766)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="8739676e-19" y="15" xml:space="preserve">Deltas</tspan>
</text>
</g>
<g id="Graphic_64">
<text transform="translate(-523 820.86)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="3488765e-18" y="15" xml:space="preserve">Images </tspan>
<tspan font-family="Helvetica Neue" font-size="10" fill="black" x="4.01" y="28.447998" xml:space="preserve">at GC LSN</tspan>
</text>
</g>
<g id="Graphic_63">
<rect x="-394.93164" y="491.422" width="624.3633" height="17.5" fill="#c0ffc0"/>
<text transform="translate(-389.93164 492.49)" fill="black">
<tspan font-family="Helvetica Neue" font-size="13" fill="black" x="233.52364" y="12" xml:space="preserve">Deltas above GC Horizon</tspan>
</text>
</g>
<g id="Graphic_62">
<rect x="-394.93164" y="526.922" width="624.3633" height="17.5" fill="#c0ffff"/>
<text transform="translate(-389.93164 527.99)" fill="black">
<tspan font-family="Helvetica Neue" font-size="13" fill="black" x="233.89414" y="12" xml:space="preserve">Deltas below GC Horizon</tspan>
</text>
</g>
<g id="Graphic_61">
<rect x="-394.73164" y="778.737" width="624.3633" height="17.5" fill="#c0ffc0"/>
<text transform="translate(-389.73164 779.805)" fill="black">
<tspan font-family="Helvetica Neue" font-size="13" fill="black" x="233.52364" y="12" xml:space="preserve">Deltas above GC Horizon</tspan>
</text>
</g>
<g id="Graphic_60">
<rect x="-333.8" y="858.474" width="485.5" height="28.447998" fill="#c0ffff"/>
<text transform="translate(-328.8 865.016)" fill="black">
<tspan font-family="Helvetica Neue" font-size="13" fill="black" x="63.095" y="12" xml:space="preserve">Deltas and image below GC Horizon gets garbage-collected</tspan>
</text>
</g>
<g id="Graphic_86">
<text transform="translate(263 499.724)" fill="black">
<tspan font-family="Helvetica Neue" font-size="32" fill="black" x="0" y="30" xml:space="preserve">size=A</tspan>
</text>
</g>
<g id="Line_87">
<line x1="260.87012" y1="479.068" x2="360.71387" y2="479.068" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_88">
<line x1="260.87012" y1="561" x2="360.71387" y2="561" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_89">
<rect x="-403.8" y="569" width="161.8" height="28.447998" fill="white"/>
<rect x="-403.8" y="569" width="161.8" height="28.447998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_90">
<rect x="-229.5" y="569.018" width="277.2" height="28.447998" fill="white"/>
<rect x="-229.5" y="569.018" width="277.2" height="28.447998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_91">
<rect x="64.7" y="569.018" width="173.5" height="28.447998" fill="white"/>
<rect x="64.7" y="569.018" width="173.5" height="28.447998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_92">
<line x1="262" y1="602" x2="361.84375" y2="602" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_93">
<line x1="263" y1="625.562" x2="362.84375" y2="625.562" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_94">
<text transform="translate(264.53787 562.276)" fill="black">
<tspan font-family="Helvetica Neue" font-size="32" fill="black" x="14210855e-21" y="30" xml:space="preserve">size=B</tspan>
</text>
</g>
<g id="Graphic_95">
<text transform="translate(285.12 599.5)" fill="black">
<tspan font-family="Helvetica Neue" font-size="24" fill="black" x="0" y="23" xml:space="preserve">size=C</tspan>
</text>
</g>
<g id="Graphic_98">
<text transform="translate(264.53787 773.772)" fill="black">
<tspan font-family="Helvetica Neue" font-size="26" fill="black" x="8881784e-19" y="25" xml:space="preserve">A</tspan>
<tspan font-family="Lucida Grande" font-size="26" fill="black" y="25" xml:space="preserve"></tspan>
</text>
</g>
<g id="Graphic_97">
<text transform="translate(265.87013 815.5)" fill="black">
<tspan font-family="Helvetica Neue" font-size="26" fill="black" x="6536993e-19" y="25" xml:space="preserve">B</tspan>
<tspan font-family="Lucida Grande" font-size="26" fill="black" y="25" xml:space="preserve"></tspan>
</text>
</g>
</g>
</g>
</svg>

After

Width:  |  Height:  |  Size: 11 KiB

View File

@@ -0,0 +1,158 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xl="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.1" viewBox="-235 406 586 424" width="586" height="424">
<defs/>
<g id="08-optimization" stroke-opacity="1" stroke-dasharray="none" stroke="none" fill="none" fill-opacity="1">
<title>08-optimization</title>
<rect fill="white" x="-235" y="406" width="586" height="424"/>
<g id="08-optimization_Layer_1">
<title>Layer 1</title>
<g id="Graphic_22">
<rect x="-100.448" y="509.902" width="203.5" height="26.391998" fill="white"/>
<rect x="-100.448" y="509.902" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_21">
<rect x="118.552" y="509.902" width="203.5" height="26.391998" fill="white"/>
<rect x="118.552" y="509.902" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_20">
<line x1="-101.79572" y1="420.322" x2="349.5" y2="420.322" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_19">
<text transform="translate(-230 411.598)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">GC Horizon</tspan>
</text>
</g>
<g id="Graphic_17">
<rect x="-100.198" y="426.5" width="203.5" height="26.391998" fill="white"/>
<rect x="-100.198" y="426.5" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_16">
<rect x="118.802" y="426.5" width="203.5" height="26.391998" fill="white"/>
<rect x="118.802" y="426.5" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_14">
<rect x="-100.198" y="464.397" width="108.25" height="34" fill="white"/>
<rect x="-100.198" y="464.397" width="108.25" height="34" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_13">
<rect x="18.552" y="464.397" width="303.5" height="34" fill="white"/>
<rect x="18.552" y="464.397" width="303.5" height="34" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_11">
<rect x="-100.448" y="547.799" width="203.5" height="26.391998" fill="white"/>
<rect x="-100.448" y="547.799" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_10">
<rect x="118.552" y="547.799" width="203.5" height="26.391998" fill="white"/>
<rect x="118.552" y="547.799" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_24">
<line x1="-104" y1="542" x2="339.4011" y2="542" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="4.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_25">
<text transform="translate(-139.604 534.5)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x20</tspan>
</text>
</g>
<g id="Line_27">
<line x1="-101.79572" y1="459.098" x2="341.6054" y2="459.098" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="4.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_26">
<text transform="translate(-139.604 451.402)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x50</tspan>
</text>
</g>
<g id="Graphic_28">
<text transform="translate(-139.604 413.654)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x60</tspan>
</text>
</g>
<g id="Line_30">
<line x1="-101.79572" y1="481.145" x2="341.6054" y2="481.145" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="4.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_29">
<text transform="translate(-139.604 473.449)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x40</tspan>
</text>
</g>
<g id="Graphic_77">
<rect x="-100.448" y="765.19595" width="203.5" height="26.391998" fill="white"/>
<rect x="-100.448" y="765.19595" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_76">
<rect x="118.552" y="765.19595" width="203.5" height="26.391998" fill="white"/>
<rect x="118.552" y="765.19595" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_75">
<line x1="-101.79572" y1="637.317" x2="349.5" y2="637.317" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_74">
<text transform="translate(-230 628.593)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">GC Horizon</tspan>
</text>
</g>
<g id="Graphic_73">
<rect x="-100.198" y="681.794" width="203.5" height="26.391998" fill="white"/>
<rect x="-100.198" y="681.794" width="203.5" height="26.391998" stroke="#b1001c" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_72">
<rect x="118.802" y="681.794" width="203.5" height="26.391998" fill="white"/>
<rect x="118.802" y="681.794" width="203.5" height="26.391998" stroke="#b1001c" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_71">
<rect x="-100.198" y="719.69096" width="108.25" height="34" fill="white"/>
<rect x="-100.198" y="719.69096" width="108.25" height="34" stroke="#b1001c" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_70">
<rect x="18.552" y="719.69096" width="303.5" height="34" fill="white"/>
<rect x="18.552" y="719.69096" width="303.5" height="34" stroke="#b1001c" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_69">
<rect x="-100.448" y="803.09295" width="203.5" height="26.391998" fill="white"/>
<rect x="-100.448" y="803.09295" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_68">
<rect x="118.552" y="803.09295" width="203.5" height="26.391998" fill="white"/>
<rect x="118.552" y="803.09295" width="203.5" height="26.391998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_67">
<line x1="-104" y1="797.294" x2="339.4011" y2="797.294" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="4.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_66">
<text transform="translate(-139.604 789.794)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x20</tspan>
</text>
</g>
<g id="Graphic_63">
<text transform="translate(-139.604 630.649)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x70</tspan>
</text>
</g>
<g id="Line_62">
<line x1="-101.79572" y1="736.439" x2="341.6054" y2="736.439" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="4.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_61">
<text transform="translate(-139.604 728.743)" fill="black">
<tspan font-family="Helvetica Neue" font-size="14" fill="black" x="0" y="13" xml:space="preserve">0x40</tspan>
</text>
</g>
<g id="Graphic_79">
<rect x="-100.198" y="644.393" width="168.198" height="26.391998" fill="white"/>
<rect x="-100.198" y="644.393" width="168.198" height="26.391998" stroke="#b1001c" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_78">
<rect x="80" y="644.393" width="242.302" height="26.391998" fill="white"/>
<rect x="80" y="644.393" width="242.302" height="26.391998" stroke="#b1001c" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_81">
<line x1="-101.79572" y1="714.139" x2="341.6054" y2="714.139" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="1.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_80">
<text transform="translate(-139.604 706.443)" fill="#a5a5a5">
<tspan font-family="Helvetica Neue" font-size="14" fill="#a5a5a5" x="0" y="13" xml:space="preserve">0x50</tspan>
</text>
</g>
</g>
</g>
</svg>

After

Width:  |  Height:  |  Size: 9.4 KiB

View File

@@ -0,0 +1,184 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xl="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.1" viewBox="-562 479 876 429" width="876" height="429">
<defs/>
<g id="09-btmgc-analysis-2" stroke-opacity="1" stroke-dasharray="none" stroke="none" fill="none" fill-opacity="1">
<title>09-btmgc-analysis-2</title>
<rect fill="white" x="-562" y="479" width="876" height="429"/>
<g id="09-btmgc-analysis-2_Layer_1">
<title>Layer 1</title>
<g id="Graphic_85">
<rect x="-404" y="622.386" width="203.5" height="17.5" fill="white"/>
<rect x="-404" y="622.386" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-399 621.912)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="90.974" y="15" xml:space="preserve">C</tspan>
</text>
</g>
<g id="Graphic_84">
<rect x="-184.5" y="622.386" width="203.5" height="17.5" fill="white"/>
<rect x="-184.5" y="622.386" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-179.5 621.912)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="90.974" y="15" xml:space="preserve">C</tspan>
</text>
</g>
<g id="Graphic_83">
<rect x="34.5" y="622.386" width="203.5" height="17.5" fill="white"/>
<rect x="34.5" y="622.386" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(39.5 621.912)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="90.974" y="15" xml:space="preserve">C</tspan>
</text>
</g>
<g id="Graphic_82">
<rect x="-404" y="479.922" width="127" height="77.5" fill="white"/>
<rect x="-404" y="479.922" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-399 509.448)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="53.316" y="15" xml:space="preserve">A</tspan>
</text>
</g>
<g id="Graphic_81">
<rect x="-263" y="479.922" width="127" height="77.5" fill="white"/>
<rect x="-263" y="479.922" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-258 509.448)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="53.316" y="15" xml:space="preserve">A</tspan>
</text>
</g>
<g id="Graphic_80">
<rect x="-122" y="479.922" width="127" height="77.5" fill="white"/>
<rect x="-122" y="479.922" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-117 509.448)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="53.316" y="15" xml:space="preserve">A</tspan>
</text>
</g>
<g id="Graphic_79">
<rect x="19" y="479.922" width="127" height="77.5" fill="white"/>
<rect x="19" y="479.922" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(24 509.448)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="53.316" y="15" xml:space="preserve">A</tspan>
</text>
</g>
<g id="Graphic_78">
<rect x="160" y="479.922" width="78" height="77.5" fill="white"/>
<rect x="160" y="479.922" width="78" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(165 509.448)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="28.816" y="15" xml:space="preserve">A</tspan>
</text>
</g>
<g id="Line_77">
<line x1="-452.5" y1="518.172" x2="251" y2="518.172" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_76">
<text transform="translate(-551 509.448)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">GC Horizon</tspan>
</text>
</g>
<g id="Graphic_73">
<path d="M -120.675 651.5 L -84.425 651.5 L -84.425 710 L -66.3 710 L -102.55 740 L -138.8 710 L -120.675 710 Z" fill="white"/>
<path d="M -120.675 651.5 L -84.425 651.5 L -84.425 710 L -66.3 710 L -102.55 740 L -138.8 710 L -120.675 710 Z" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_89">
<rect x="-403.8" y="582.324" width="161.8" height="28.447998" fill="white"/>
<rect x="-403.8" y="582.324" width="161.8" height="28.447998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-398.8 587.324)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="70.42" y="15" xml:space="preserve">B</tspan>
</text>
</g>
<g id="Graphic_90">
<rect x="-229.5" y="582.342" width="277.2" height="28.447998" fill="white"/>
<rect x="-229.5" y="582.342" width="277.2" height="28.447998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-224.5 587.342)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="128.12" y="15" xml:space="preserve">B</tspan>
</text>
</g>
<g id="Graphic_91">
<rect x="64.7" y="582.342" width="173.5" height="28.447998" fill="white"/>
<rect x="64.7" y="582.342" width="173.5" height="28.447998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(69.7 587.342)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="76.27" y="15" xml:space="preserve">B</tspan>
</text>
</g>
<g id="Graphic_97">
<rect x="-403.8" y="564.842" width="490.8" height="12.157997" fill="white"/>
<rect x="-403.8" y="564.842" width="490.8" height="12.157997" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-398.8 561.697)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="234.624" y="15" xml:space="preserve">C</tspan>
</text>
</g>
<g id="Graphic_109">
<rect x="28.6" y="889.964" width="203.5" height="17.5" fill="white"/>
<rect x="28.6" y="889.964" width="203.5" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(33.6 889.49)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="90.974" y="15" xml:space="preserve">C</tspan>
</text>
</g>
<g id="Graphic_108">
<rect x="-409.9" y="747.5" width="127" height="77.5" fill="white"/>
<rect x="-409.9" y="747.5" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-404.9 777.026)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="53.316" y="15" xml:space="preserve">A</tspan>
</text>
</g>
<g id="Graphic_107">
<rect x="-268.9" y="747.5" width="127" height="77.5" fill="white"/>
<rect x="-268.9" y="747.5" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-263.9 777.026)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="53.316" y="15" xml:space="preserve">A</tspan>
</text>
</g>
<g id="Graphic_106">
<rect x="-127.9" y="747.5" width="127" height="77.5" fill="white"/>
<rect x="-127.9" y="747.5" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-122.9 777.026)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="53.316" y="15" xml:space="preserve">A</tspan>
</text>
</g>
<g id="Graphic_105">
<rect x="13.1" y="747.5" width="127" height="77.5" fill="white"/>
<rect x="13.1" y="747.5" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(18.1 777.026)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="53.316" y="15" xml:space="preserve">A</tspan>
</text>
</g>
<g id="Graphic_104">
<rect x="154.1" y="747.5" width="78" height="77.5" fill="white"/>
<rect x="154.1" y="747.5" width="78" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(159.1 777.026)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="28.816" y="15" xml:space="preserve">A</tspan>
</text>
</g>
<g id="Line_103">
<line x1="-458.4" y1="785.75" x2="245.1" y2="785.75" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_102">
<text transform="translate(-556.9 777.026)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">GC Horizon</tspan>
</text>
</g>
<g id="Graphic_99">
<rect x="58.8" y="849.92" width="173.5" height="28.447998" fill="white"/>
<rect x="58.8" y="849.92" width="173.5" height="28.447998" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(63.8 854.92)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="76.27" y="15" xml:space="preserve">B</tspan>
</text>
</g>
<g id="Graphic_98">
<rect x="-409.7" y="832.42" width="490.8" height="12.157997" fill="white"/>
<rect x="-409.7" y="832.42" width="490.8" height="12.157997" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(-404.7 829.275)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="234.624" y="15" xml:space="preserve">C</tspan>
</text>
</g>
<g id="Graphic_112">
<text transform="translate(273 797.5)" fill="black">
<tspan font-family="Helvetica Neue" font-size="26" fill="black" x="6536993e-19" y="25" xml:space="preserve">B</tspan>
<tspan font-family="Lucida Grande" font-size="26" fill="black" y="25" xml:space="preserve"></tspan>
</text>
</g>
<g id="Graphic_113">
<text transform="translate(273 833.974)" fill="black">
<tspan font-family="Helvetica Neue" font-size="26" fill="black" x="42277293e-20" y="25" xml:space="preserve">C</tspan>
<tspan font-family="Lucida Grande" font-size="26" fill="black" y="25" xml:space="preserve"></tspan>
</text>
</g>
</g>
</g>
</svg>

After

Width:  |  Height:  |  Size: 12 KiB

View File

@@ -0,0 +1,81 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xl="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.1" viewBox="-12 920 809 269" width="809" height="269">
<defs/>
<g id="10-btmgc-analysis-3" stroke-opacity="1" stroke-dasharray="none" stroke="none" fill="none" fill-opacity="1">
<title>10-btmgc-analysis-3</title>
<rect fill="white" x="-12" y="920" width="809" height="269"/>
<g id="10-btmgc-analysis-3_Layer_1">
<title>Layer 1</title>
<g id="Graphic_13">
<rect x="433.7" y="949" width="63.559346" height="77.5" fill="white"/>
<rect x="433.7" y="949" width="63.559346" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(438.7 978.526)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="8.107673" y="15" xml:space="preserve">1/5 X</tspan>
</text>
</g>
<g id="Graphic_12">
<rect x="503.7654" y="949" width="63.559346" height="77.5" fill="white"/>
<rect x="503.7654" y="949" width="63.559346" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(508.7654 978.526)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="8.107673" y="15" xml:space="preserve">1/5 X</tspan>
</text>
</g>
<g id="Graphic_11">
<rect x="574.8318" y="949" width="63.559346" height="77.5" fill="white"/>
<rect x="574.8318" y="949" width="63.559346" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(579.8318 978.526)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="8.107673" y="15" xml:space="preserve">1/5 X</tspan>
</text>
</g>
<g id="Graphic_10">
<rect x="645.3977" y="949" width="63.559346" height="77.5" fill="white"/>
<rect x="645.3977" y="949" width="63.559346" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(650.3977 978.526)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="8.107673" y="15" xml:space="preserve">1/5 X</tspan>
</text>
</g>
<g id="Line_8">
<line x1="92" y1="934.276" x2="795.5" y2="934.276" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_7">
<text transform="translate(-6.500003 925.552)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">GC Horizon</tspan>
</text>
</g>
<g id="Graphic_2">
<rect x="113.2" y="1033.92" width="321.3" height="12.157997" fill="white"/>
<rect x="113.2" y="1033.92" width="321.3" height="12.157997" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(118.2 1030.775)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="150.762" y="15" xml:space="preserve">X</tspan>
</text>
</g>
<g id="Graphic_17">
<path d="M 420.125 1062 L 456.375 1062 L 456.375 1120.5 L 474.5 1120.5 L 438.25 1150.5 L 402 1120.5 L 420.125 1120.5 Z" fill="white"/>
<path d="M 420.125 1062 L 456.375 1062 L 456.375 1120.5 L 474.5 1120.5 L 438.25 1150.5 L 402 1120.5 L 420.125 1120.5 Z" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_27">
<line x1="93" y1="1164.224" x2="796.5" y2="1164.224" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_26">
<text transform="translate(-5.5000034 1155.5)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">GC Horizon</tspan>
</text>
</g>
<g id="Graphic_25">
<rect x="114" y="1173.5" width="641.8" height="12.157997" fill="white"/>
<rect x="114" y="1173.5" width="641.8" height="12.157997" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(119 1170.355)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="306.564" y="15" xml:space="preserve">2X</tspan>
</text>
</g>
<g id="Graphic_33">
<rect x="715.96355" y="949" width="63.559346" height="77.5" fill="white"/>
<rect x="715.96355" y="949" width="63.559346" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(720.96355 978.526)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="8.107673" y="15" xml:space="preserve">1/5 X</tspan>
</text>
</g>
</g>
</g>
</svg>

After

Width:  |  Height:  |  Size: 5.1 KiB

View File

@@ -0,0 +1,81 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xl="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.1" viewBox="-12 920 809 269" width="809" height="269">
<defs/>
<g id="11-btmgc-analysis-4" stroke-opacity="1" stroke-dasharray="none" stroke="none" fill="none" fill-opacity="1">
<title>11-btmgc-analysis-4</title>
<rect fill="white" x="-12" y="920" width="809" height="269"/>
<g id="11-btmgc-analysis-4_Layer_1">
<title>Layer 1</title>
<g id="Graphic_13">
<rect x="113" y="949" width="127" height="77.5" fill="white"/>
<rect x="113" y="949" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(118 978.526)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="39.084" y="15" xml:space="preserve">1/5 D</tspan>
</text>
</g>
<g id="Graphic_12">
<rect x="253" y="949" width="127" height="77.5" fill="white"/>
<rect x="253" y="949" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(258 978.526)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="39.084" y="15" xml:space="preserve">1/5 D</tspan>
</text>
</g>
<g id="Graphic_11">
<rect x="395" y="949" width="127" height="77.5" fill="white"/>
<rect x="395" y="949" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(400 978.526)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="39.084" y="15" xml:space="preserve">1/5 D</tspan>
</text>
</g>
<g id="Graphic_10">
<rect x="536" y="949" width="127" height="77.5" fill="white"/>
<rect x="536" y="949" width="127" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(541 978.526)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="39.084" y="15" xml:space="preserve">1/5 D</tspan>
</text>
</g>
<g id="Graphic_9">
<rect x="677" y="949" width="78" height="77.5" fill="white"/>
<rect x="677" y="949" width="78" height="77.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(682 978.526)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="14.584" y="15" xml:space="preserve">1/5 D</tspan>
</text>
</g>
<g id="Line_8">
<line x1="92" y1="934.276" x2="795.5" y2="934.276" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_7">
<text transform="translate(-6.500003 925.552)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">GC Horizon</tspan>
</text>
</g>
<g id="Graphic_2">
<rect x="113.2" y="1033.92" width="641.8" height="12.157997" fill="white"/>
<rect x="113.2" y="1033.92" width="641.8" height="12.157997" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(118.2 1030.775)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="310.268" y="15" xml:space="preserve">D</tspan>
</text>
</g>
<g id="Graphic_17">
<path d="M 420.125 1062 L 456.375 1062 L 456.375 1120.5 L 474.5 1120.5 L 438.25 1150.5 L 402 1120.5 L 420.125 1120.5 Z" fill="white"/>
<path d="M 420.125 1062 L 456.375 1062 L 456.375 1120.5 L 474.5 1120.5 L 438.25 1150.5 L 402 1120.5 L 420.125 1120.5 Z" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_27">
<line x1="93" y1="1164.224" x2="796.5" y2="1164.224" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_26">
<text transform="translate(-5.5000034 1155.5)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">GC Horizon</tspan>
</text>
</g>
<g id="Graphic_25">
<rect x="114" y="1173.5" width="641.8" height="12.157997" fill="white"/>
<rect x="114" y="1173.5" width="641.8" height="12.157997" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(119 1170.355)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="310.268" y="15" xml:space="preserve">D</tspan>
</text>
</g>
</g>
</g>
</svg>

After

Width:  |  Height:  |  Size: 5.0 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 142 KiB

View File

@@ -0,0 +1,176 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://www.w3.org/2000/svg" xmlns:xl="http://www.w3.org/1999/xlink" viewBox="210 271 870 514" width="870" height="514">
<defs/>
<g id="gc-compaction-split" stroke-dasharray="none" fill-opacity="1" stroke="none" fill="none" stroke-opacity="1">
<title>gc-compaction-split</title>
<rect fill="white" x="210" y="271" width="870" height="514"/>
<g id="gc-compaction-split_Layer_1">
<title>Layer 1</title>
<g id="Graphic_12">
<rect x="241" y="272" width="213" height="50.5" fill="white"/>
<rect x="241" y="272" width="213" height="50.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_11">
<rect x="468.72266" y="272" width="213" height="50.5" fill="white"/>
<rect x="468.72266" y="272" width="213" height="50.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_10">
<rect x="695.72266" y="272" width="213" height="50.5" fill="white"/>
<rect x="695.72266" y="272" width="213" height="50.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_9">
<rect x="241" y="337.3711" width="303.5" height="50.5" fill="white"/>
<rect x="241" y="337.3711" width="303.5" height="50.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_8">
<rect x="556.2617" y="337.3711" width="352.46094" height="50.5" fill="white"/>
<rect x="556.2617" y="337.3711" width="352.46094" height="50.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_7">
<rect x="241" y="402.7422" width="667.72266" height="50.5" fill="white"/>
<rect x="241" y="402.7422" width="667.72266" height="50.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_6">
<line x1="211" y1="355.5" x2="947.4961" y2="355.5" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_5">
<text transform="translate(952.4961 346.776)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">branch point</tspan>
</text>
</g>
<g id="Line_4">
<line x1="212" y1="438.5182" x2="948.4961" y2="438.5182" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_3">
<text transform="translate(953.4961 429.7942)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">last branch point</tspan>
</text>
</g>
<g id="Graphic_13">
<rect x="241" y="272" width="127.99101" height="181.24219" fill="#3a8eed" fill-opacity=".5"/>
<text transform="translate(246 353.3971)" fill="white">
<tspan font-family="Helvetica Neue" font-size="16" fill="white" x="38.835502" y="15" xml:space="preserve">Job 1</tspan>
</text>
</g>
<g id="Graphic_57">
<rect x="359" y="647.96484" width="551.72266" height="50.5" fill="white"/>
<rect x="359" y="647.96484" width="551.72266" height="50.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_54">
<rect x="359" y="517.22266" width="96" height="50.5" fill="white"/>
<rect x="359" y="517.22266" width="96" height="50.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_53">
<rect x="469.72266" y="517.22266" width="213" height="50.5" fill="white"/>
<rect x="469.72266" y="517.22266" width="213" height="50.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_52">
<rect x="696.72266" y="517.22266" width="213" height="50.5" fill="white"/>
<rect x="696.72266" y="517.22266" width="213" height="50.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_51">
<rect x="359" y="582.59375" width="186.5" height="50.5" fill="white"/>
<rect x="359" y="582.59375" width="186.5" height="50.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_50">
<rect x="557.2617" y="582.59375" width="352.46094" height="50.5" fill="white"/>
<rect x="557.2617" y="582.59375" width="352.46094" height="50.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_49">
<line x1="212" y1="600.72266" x2="948.4961" y2="600.72266" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_48">
<text transform="translate(953.4961 591.99866)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">branch point</tspan>
</text>
</g>
<g id="Line_47">
<line x1="213" y1="683.74084" x2="949.4961" y2="683.74084" stroke="#7f8080" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_46">
<text transform="translate(954.4961 675.01685)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="0" y="15" xml:space="preserve">last branch point</tspan>
</text>
</g>
<g id="Graphic_63">
<rect x="376.72525" y="272" width="127.99101" height="181.24219" fill="#3a8eed" fill-opacity=".5"/>
<text transform="translate(381.72525 353.3971)" fill="white">
<tspan font-family="Helvetica Neue" font-size="16" fill="white" x="38.835502" y="15" xml:space="preserve">Job 2</tspan>
</text>
</g>
<g id="Graphic_64">
<rect x="511.39405" y="272" width="127.99101" height="181.24219" fill="#3a8eed" fill-opacity=".5"/>
<text transform="translate(516.39405 353.3971)" fill="white">
<tspan font-family="Helvetica Neue" font-size="16" fill="white" x="38.835502" y="15" xml:space="preserve">Job 3</tspan>
</text>
</g>
<g id="Graphic_65">
<rect x="646.06285" y="272" width="127.99101" height="181.24219" fill="#3a8eed" fill-opacity=".5"/>
<text transform="translate(651.06285 353.3971)" fill="white">
<tspan font-family="Helvetica Neue" font-size="16" fill="white" x="38.835502" y="15" xml:space="preserve">Job 4</tspan>
</text>
</g>
<g id="Graphic_66">
<rect x="780.73165" y="272" width="127.99101" height="181.24219" fill="#3a8eed" fill-opacity=".5"/>
<text transform="translate(785.73165 353.3971)" fill="white">
<tspan font-family="Helvetica Neue" font-size="16" fill="white" x="38.835502" y="15" xml:space="preserve">Job 5</tspan>
</text>
</g>
<g id="Graphic_56">
<rect x="243.5" y="517.22266" width="125.49101" height="181.24219" fill="#ccc"/>
<rect x="243.5" y="517.22266" width="125.49101" height="181.24219" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_55">
<rect x="243.5" y="673.46484" width="125.49101" height="17.5" fill="#6b7ca5"/>
<rect x="243.5" y="673.46484" width="125.49101" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_68">
<rect x="379.22525" y="517.22266" width="125.49101" height="181.24219" fill="#ccc"/>
<rect x="379.22525" y="517.22266" width="125.49101" height="181.24219" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_67">
<rect x="379.22525" y="673.46484" width="125.49101" height="17.5" fill="#6b7ca5"/>
<rect x="379.22525" y="673.46484" width="125.49101" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_70">
<rect x="514.22525" y="517.22266" width="125.49101" height="181.24219" fill="#ccc"/>
<rect x="514.22525" y="517.22266" width="125.49101" height="181.24219" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_69">
<rect x="514.22525" y="673.46484" width="125.49101" height="17.5" fill="#6b7ca5"/>
<rect x="514.22525" y="673.46484" width="125.49101" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_72">
<rect x="649.22525" y="517.22266" width="125.49101" height="181.24219" fill="#ccc"/>
<rect x="649.22525" y="517.22266" width="125.49101" height="181.24219" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_71">
<rect x="649.22525" y="673.46484" width="125.49101" height="17.5" fill="#6b7ca5"/>
<rect x="649.22525" y="673.46484" width="125.49101" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_74">
<rect x="785.23165" y="517.22266" width="125.49101" height="181.24219" fill="#ccc"/>
<rect x="785.23165" y="517.22266" width="125.49101" height="181.24219" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_73">
<rect x="785.23165" y="673.46484" width="125.49101" height="17.5" fill="#6b7ca5"/>
<rect x="785.23165" y="673.46484" width="125.49101" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_78">
<rect x="241" y="731.3359" width="125.49101" height="27.26953" fill="#ccc"/>
<rect x="241" y="731.3359" width="125.49101" height="27.26953" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(246 735.7467)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="17.297502" y="15" xml:space="preserve">Delta Layer</tspan>
</text>
</g>
<g id="Graphic_79">
<rect x="241" y="766.759" width="125.49101" height="17.5" fill="#6b7ca5"/>
<rect x="241" y="766.759" width="125.49101" height="17.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(246 766.285)" fill="white">
<tspan font-family="Helvetica Neue" font-size="16" fill="white" x="13.737502" y="15" xml:space="preserve">Image Layer</tspan>
</text>
</g>
</g>
</g>
</svg>

After

Width:  |  Height:  |  Size: 11 KiB

View File

@@ -0,0 +1,13 @@
[package]
name = "neon-shmem"
version = "0.1.0"
edition.workspace = true
license.workspace = true
[dependencies]
thiserror.workspace = true
nix.workspace=true
workspace_hack = { version = "0.1", path = "../../workspace_hack" }
[target.'cfg(target_os = "macos")'.dependencies]
tempfile = "3.14.0"

418
libs/neon-shmem/src/lib.rs Normal file
View File

@@ -0,0 +1,418 @@
//! Shared memory utilities for neon communicator
use std::num::NonZeroUsize;
use std::os::fd::{AsFd, BorrowedFd, OwnedFd};
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};
use nix::errno::Errno;
use nix::sys::mman::MapFlags;
use nix::sys::mman::ProtFlags;
use nix::sys::mman::mmap as nix_mmap;
use nix::sys::mman::munmap as nix_munmap;
use nix::unistd::ftruncate as nix_ftruncate;
/// ShmemHandle represents a shared memory area that can be shared by processes over fork().
/// Unlike shared memory allocated by Postgres, this area is resizable, up to 'max_size' that's
/// specified at creation.
///
/// The area is backed by an anonymous file created with memfd_create(). The full address space for
/// 'max_size' is reserved up-front with mmap(), but whenever you call [`ShmemHandle::set_size`],
/// the underlying file is resized. Do not access the area beyond the current size. Currently, that
/// will cause the file to be expanded, but we might use mprotect() etc. to enforce that in the
/// future.
pub struct ShmemHandle {
/// memfd file descriptor
fd: OwnedFd,
max_size: usize,
// Pointer to the beginning of the shared memory area. The header is stored there.
shared_ptr: NonNull<SharedStruct>,
// Pointer to the beginning of the user data
pub data_ptr: NonNull<u8>,
}
/// This is stored at the beginning in the shared memory area.
struct SharedStruct {
max_size: usize,
/// Current size of the backing file. The high-order bit is used for the RESIZE_IN_PROGRESS flag
current_size: AtomicUsize,
}
const RESIZE_IN_PROGRESS: usize = 1 << 63;
const HEADER_SIZE: usize = std::mem::size_of::<SharedStruct>();
/// Error type returned by the ShmemHandle functions.
#[derive(thiserror::Error, Debug)]
#[error("{msg}: {errno}")]
pub struct Error {
pub msg: String,
pub errno: Errno,
}
impl Error {
fn new(msg: &str, errno: Errno) -> Error {
Error {
msg: msg.to_string(),
errno,
}
}
}
impl ShmemHandle {
/// Create a new shared memory area. To communicate between processes, the processes need to be
/// fork()'d after calling this, so that the ShmemHandle is inherited by all processes.
///
/// If the ShmemHandle is dropped, the memory is unmapped from the current process. Other
/// processes can continue using it, however.
pub fn new(name: &str, initial_size: usize, max_size: usize) -> Result<ShmemHandle, Error> {
// create the backing anonymous file.
let fd = create_backing_file(name)?;
Self::new_with_fd(fd, initial_size, max_size)
}
fn new_with_fd(
fd: OwnedFd,
initial_size: usize,
max_size: usize,
) -> Result<ShmemHandle, Error> {
// We reserve the high-order bit for the RESIZE_IN_PROGRESS flag, and the actual size
// is a little larger than this because of the SharedStruct header. Make the upper limit
// somewhat smaller than that, because with anything close to that, you'll run out of
// memory anyway.
if max_size >= 1 << 48 {
panic!("max size {} too large", max_size);
}
if initial_size > max_size {
panic!("initial size {initial_size} larger than max size {max_size}");
}
// The actual initial / max size is the one given by the caller, plus the size of
// 'SharedStruct'.
let initial_size = HEADER_SIZE + initial_size;
let max_size = NonZeroUsize::new(HEADER_SIZE + max_size).unwrap();
// Reserve address space for it with mmap
//
// TODO: Use MAP_HUGETLB if possible
let start_ptr = unsafe {
nix_mmap(
None,
max_size,
ProtFlags::PROT_READ | ProtFlags::PROT_WRITE,
MapFlags::MAP_SHARED,
&fd,
0,
)
}
.map_err(|e| Error::new("mmap failed: {e}", e))?;
// Reserve space for the initial size
enlarge_file(fd.as_fd(), initial_size as u64)?;
// Initialize the header
let shared: NonNull<SharedStruct> = start_ptr.cast();
unsafe {
shared.write(SharedStruct {
max_size: max_size.into(),
current_size: AtomicUsize::new(initial_size),
})
};
// The user data begins after the header
let data_ptr = unsafe { start_ptr.cast().add(HEADER_SIZE) };
Ok(ShmemHandle {
fd,
max_size: max_size.into(),
shared_ptr: shared,
data_ptr,
})
}
// return reference to the header
fn shared(&self) -> &SharedStruct {
unsafe { self.shared_ptr.as_ref() }
}
/// Resize the shared memory area. 'new_size' must not be larger than the 'max_size' specified
/// when creating the area.
///
/// This may only be called from one process/thread concurrently. We detect that case
/// and return an Error.
pub fn set_size(&self, new_size: usize) -> Result<(), Error> {
let new_size = new_size + HEADER_SIZE;
let shared = self.shared();
if new_size > self.max_size {
panic!(
"new size ({} is greater than max size ({})",
new_size, self.max_size
);
}
assert_eq!(self.max_size, shared.max_size);
// Lock the area by setting the bit in 'current_size'
//
// Ordering::Relaxed would probably be sufficient here, as we don't access any other memory
// and the posix_fallocate/ftruncate call is surely a synchronization point anyway. But
// since this is not performance-critical, better safe than sorry .
let mut old_size = shared.current_size.load(Ordering::Acquire);
loop {
if (old_size & RESIZE_IN_PROGRESS) != 0 {
return Err(Error::new(
"concurrent resize detected",
Errno::UnknownErrno,
));
}
match shared.current_size.compare_exchange(
old_size,
new_size,
Ordering::Acquire,
Ordering::Relaxed,
) {
Ok(_) => break,
Err(x) => old_size = x,
}
}
// Ok, we got the lock.
//
// NB: If anything goes wrong, we *must* clear the bit!
let result = {
use std::cmp::Ordering::{Equal, Greater, Less};
match new_size.cmp(&old_size) {
Less => nix_ftruncate(&self.fd, new_size as i64).map_err(|e| {
Error::new("could not shrink shmem segment, ftruncate failed: {e}", e)
}),
Equal => Ok(()),
Greater => enlarge_file(self.fd.as_fd(), new_size as u64),
}
};
// Unlock
shared.current_size.store(
if result.is_ok() { new_size } else { old_size },
Ordering::Release,
);
result
}
/// Returns the current user-visible size of the shared memory segment.
///
/// NOTE: a concurrent set_size() call can change the size at any time. It is the caller's
/// responsibility not to access the area beyond the current size.
pub fn current_size(&self) -> usize {
let total_current_size =
self.shared().current_size.load(Ordering::Relaxed) & !RESIZE_IN_PROGRESS;
total_current_size - HEADER_SIZE
}
}
impl Drop for ShmemHandle {
fn drop(&mut self) {
// SAFETY: The pointer was obtained from mmap() with the given size.
// We unmap the entire region.
let _ = unsafe { nix_munmap(self.shared_ptr.cast(), self.max_size) };
// The fd is dropped automatically by OwnedFd.
}
}
/// Create a "backing file" for the shared memory area. On Linux, use memfd_create(), to create an
/// anonymous in-memory file. One macos, fall back to a regular file. That's good enough for
/// development and testing, but in production we want the file to stay in memory.
///
/// disable 'unused_variables' warnings, because in the macos path, 'name' is unused.
#[allow(unused_variables)]
fn create_backing_file(name: &str) -> Result<OwnedFd, Error> {
#[cfg(not(target_os = "macos"))]
{
nix::sys::memfd::memfd_create(name, nix::sys::memfd::MFdFlags::empty())
.map_err(|e| Error::new("memfd_create failed: {e}", e))
}
#[cfg(target_os = "macos")]
{
let file = tempfile::tempfile().map_err(|e| {
Error::new(
"could not create temporary file to back shmem area: {e}",
nix::errno::Errno::from_raw(e.raw_os_error().unwrap_or(0)),
)
})?;
Ok(OwnedFd::from(file))
}
}
fn enlarge_file(fd: BorrowedFd, size: u64) -> Result<(), Error> {
// Use posix_fallocate() to enlarge the file. It reserves the space correctly, so that
// we don't get a segfault later when trying to actually use it.
#[cfg(not(target_os = "macos"))]
{
nix::fcntl::posix_fallocate(fd, 0, size as i64).map_err(|e| {
Error::new(
"could not grow shmem segment, posix_fallocate failed: {e}",
e,
)
})
}
// As a fallback on macos, which doesn't have posix_fallocate, use plain 'fallocate'
#[cfg(target_os = "macos")]
{
nix::unistd::ftruncate(fd, size as i64)
.map_err(|e| Error::new("could not grow shmem segment, ftruncate failed: {e}", e))
}
}
#[cfg(test)]
mod tests {
use super::*;
use nix::unistd::ForkResult;
use std::ops::Range;
/// check that all bytes in given range have the expected value.
fn assert_range(ptr: *const u8, expected: u8, range: Range<usize>) {
for i in range {
let b = unsafe { *(ptr.add(i)) };
assert_eq!(expected, b, "unexpected byte at offset {}", i);
}
}
/// Write 'b' to all bytes in the given range
fn write_range(ptr: *mut u8, b: u8, range: Range<usize>) {
unsafe { std::ptr::write_bytes(ptr.add(range.start), b, range.end - range.start) };
}
// simple single-process test of growing and shrinking
#[test]
fn test_shmem_resize() -> Result<(), Error> {
let max_size = 1024 * 1024;
let init_struct = ShmemHandle::new("test_shmem_resize", 0, max_size)?;
assert_eq!(init_struct.current_size(), 0);
// Initial grow
let size1 = 10000;
init_struct.set_size(size1).unwrap();
assert_eq!(init_struct.current_size(), size1);
// Write some data
let data_ptr = init_struct.data_ptr.as_ptr();
write_range(data_ptr, 0xAA, 0..size1);
assert_range(data_ptr, 0xAA, 0..size1);
// Shrink
let size2 = 5000;
init_struct.set_size(size2).unwrap();
assert_eq!(init_struct.current_size(), size2);
// Grow again
let size3 = 20000;
init_struct.set_size(size3).unwrap();
assert_eq!(init_struct.current_size(), size3);
// Try to read it. The area that was shrunk and grown again should read as all zeros now
assert_range(data_ptr, 0xAA, 0..5000);
assert_range(data_ptr, 0, 5000..size1);
// Try to grow beyond max_size
//let size4 = max_size + 1;
//assert!(init_struct.set_size(size4).is_err());
// Dropping init_struct should unmap the memory
drop(init_struct);
Ok(())
}
/// This is used in tests to coordinate between test processes. It's like std::sync::Barrier,
/// but is stored in the shared memory area and works across processes. It's implemented by
/// polling, because e.g. standard rust mutexes are not guaranteed to work across processes.
struct SimpleBarrier {
num_procs: usize,
count: AtomicUsize,
}
impl SimpleBarrier {
unsafe fn init(ptr: *mut SimpleBarrier, num_procs: usize) {
unsafe {
*ptr = SimpleBarrier {
num_procs,
count: AtomicUsize::new(0),
}
}
}
pub fn wait(&self) {
let old = self.count.fetch_add(1, Ordering::Relaxed);
let generation = old / self.num_procs;
let mut current = old + 1;
while current < (generation + 1) * self.num_procs {
std::thread::sleep(std::time::Duration::from_millis(10));
current = self.count.load(Ordering::Relaxed);
}
}
}
#[test]
fn test_multi_process() {
// Initialize
let max_size = 1_000_000_000_000;
let init_struct = ShmemHandle::new("test_multi_process", 0, max_size).unwrap();
let ptr = init_struct.data_ptr.as_ptr();
// Store the SimpleBarrier in the first 1k of the area.
init_struct.set_size(10000).unwrap();
let barrier_ptr: *mut SimpleBarrier = unsafe {
ptr.add(ptr.align_offset(std::mem::align_of::<SimpleBarrier>()))
.cast()
};
unsafe { SimpleBarrier::init(barrier_ptr, 2) };
let barrier = unsafe { barrier_ptr.as_ref().unwrap() };
// Fork another test process. The code after this runs in both processes concurrently.
let fork_result = unsafe { nix::unistd::fork().unwrap() };
// In the parent, fill bytes between 1000..2000. In the child, between 2000..3000
if fork_result.is_parent() {
write_range(ptr, 0xAA, 1000..2000);
} else {
write_range(ptr, 0xBB, 2000..3000);
}
barrier.wait();
// Verify the contents. (in both processes)
assert_range(ptr, 0xAA, 1000..2000);
assert_range(ptr, 0xBB, 2000..3000);
// Grow, from the child this time
let size = 10_000_000;
if !fork_result.is_parent() {
init_struct.set_size(size).unwrap();
}
barrier.wait();
// make some writes at the end
if fork_result.is_parent() {
write_range(ptr, 0xAA, (size - 10)..size);
} else {
write_range(ptr, 0xBB, (size - 20)..(size - 10));
}
barrier.wait();
// Verify the contents. (This runs in both processes)
assert_range(ptr, 0, (size - 1000)..(size - 20));
assert_range(ptr, 0xBB, (size - 20)..(size - 10));
assert_range(ptr, 0xAA, (size - 10)..size);
if let ForkResult::Parent { child } = fork_result {
nix::sys::wait::waitpid(child, None).unwrap();
}
}
}

View File

@@ -305,6 +305,7 @@ impl From<OtelExporterProtocol> for tracing_utils::Protocol {
pub struct TimelineImportConfig {
pub import_job_concurrency: NonZeroUsize,
pub import_job_soft_size_limit: NonZeroUsize,
pub import_job_checkpoint_threshold: NonZeroUsize,
}
pub mod statvfs {
@@ -639,23 +640,15 @@ impl Default for ConfigToml {
tenant_config: TenantConfigToml::default(),
no_sync: None,
wal_receiver_protocol: DEFAULT_WAL_RECEIVER_PROTOCOL,
page_service_pipelining: if !cfg!(test) {
PageServicePipeliningConfig::Serial
} else {
// Do not turn this into the default until scattered reads have been
// validated and rolled-out fully.
PageServicePipeliningConfig::Pipelined(PageServicePipeliningConfigPipelined {
page_service_pipelining: PageServicePipeliningConfig::Pipelined(
PageServicePipeliningConfigPipelined {
max_batch_size: NonZeroUsize::new(32).unwrap(),
execution: PageServiceProtocolPipelinedExecutionStrategy::ConcurrentFutures,
batching: PageServiceProtocolPipelinedBatchingStrategy::ScatteredLsn,
})
},
get_vectored_concurrent_io: if !cfg!(test) {
GetVectoredConcurrentIo::Sequential
} else {
GetVectoredConcurrentIo::SidecarTask
},
enable_read_path_debugging: if cfg!(test) || cfg!(feature = "testing") {
},
),
get_vectored_concurrent_io: GetVectoredConcurrentIo::SidecarTask,
enable_read_path_debugging: if cfg!(feature = "testing") {
Some(true)
} else {
None
@@ -669,6 +662,7 @@ impl Default for ConfigToml {
timeline_import_config: TimelineImportConfig {
import_job_concurrency: NonZeroUsize::new(128).unwrap(),
import_job_soft_size_limit: NonZeroUsize::new(1024 * 1024 * 1024).unwrap(),
import_job_checkpoint_threshold: NonZeroUsize::new(128).unwrap(),
},
}
}

View File

@@ -910,6 +910,11 @@ impl Key {
self.field1 == 0x00 && self.field4 != 0 && self.field6 != 0xffffffff
}
#[inline(always)]
pub fn is_rel_block_of_rel(&self, rel: Oid) -> bool {
self.is_rel_block_key() && self.field4 == rel
}
#[inline(always)]
pub fn is_rel_dir_key(&self) -> bool {
self.field1 == 0x00

View File

@@ -336,14 +336,30 @@ impl TimelineCreateRequest {
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
pub enum ShardImportStatus {
InProgress,
InProgress(Option<ShardImportProgress>),
Done,
Error(String),
}
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
pub enum ShardImportProgress {
V1(ShardImportProgressV1),
}
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
pub struct ShardImportProgressV1 {
/// Total number of jobs in the import plan
pub jobs: usize,
/// Number of jobs completed
pub completed: usize,
/// Hash of the plan
pub import_plan_hash: u64,
}
impl ShardImportStatus {
pub fn is_terminal(&self) -> bool {
match self {
ShardImportStatus::InProgress => false,
ShardImportStatus::InProgress(_) => false,
ShardImportStatus::Done | ShardImportStatus::Error(_) => true,
}
}
@@ -1803,7 +1819,6 @@ pub struct TopTenantShardsResponse {
}
pub mod virtual_file {
use std::sync::LazyLock;
#[derive(
Copy,
@@ -1851,15 +1866,7 @@ pub mod virtual_file {
impl IoMode {
pub fn preferred() -> Self {
// The default behavior when running Rust unit tests without any further
// flags is to use the newest behavior (DirectRw).
// The CI uses the environment variable to unit tests for all different modes.
// NB: the Python regression & perf tests have their own defaults management
// that writes pageserver.toml; they do not use this variable.
static ENV_OVERRIDE: LazyLock<Option<IoMode>> = LazyLock::new(|| {
utils::env::var_serde_json_string("NEON_PAGESERVER_UNIT_TEST_VIRTUAL_FILE_IO_MODE")
});
ENV_OVERRIDE.unwrap_or(IoMode::DirectRw)
IoMode::DirectRw
}
}

View File

@@ -4,6 +4,7 @@
//! See docs/rfcs/025-generation-numbers.md
use serde::{Deserialize, Serialize};
use utils::generation::Generation;
use utils::id::{NodeId, TimelineId};
use crate::controller_api::NodeRegisterRequest;
@@ -63,9 +64,17 @@ pub struct ValidateResponseTenant {
pub valid: bool,
}
#[derive(Serialize, Deserialize)]
pub struct TimelineImportStatusRequest {
pub tenant_shard_id: TenantShardId,
pub timeline_id: TimelineId,
pub generation: Generation,
}
#[derive(Serialize, Deserialize)]
pub struct PutTimelineImportStatusRequest {
pub tenant_shard_id: TenantShardId,
pub timeline_id: TimelineId,
pub status: ShardImportStatus,
pub generation: Generation,
}

View File

@@ -36,6 +36,24 @@ impl Value {
Value::WalRecord(rec) => rec.will_init(),
}
}
#[inline(always)]
pub fn estimated_size(&self) -> usize {
match self {
Value::Image(image) => image.len(),
Value::WalRecord(NeonWalRecord::AuxFile {
content: Some(content),
..
}) => content.len(),
Value::WalRecord(NeonWalRecord::Postgres { rec, .. }) => rec.len(),
Value::WalRecord(NeonWalRecord::ClogSetAborted { xids }) => xids.len() * 4,
Value::WalRecord(NeonWalRecord::ClogSetCommitted { xids, .. }) => xids.len() * 4,
Value::WalRecord(NeonWalRecord::MultixactMembersCreate { members, .. }) => {
members.len() * 8
}
_ => 8192, /* use image size as the estimation */
}
}
}
#[derive(Debug, PartialEq)]

View File

@@ -0,0 +1,14 @@
[package]
name = "posthog_client_lite"
version = "0.1.0"
edition = "2024"
license.workspace = true
[dependencies]
anyhow.workspace = true
reqwest.workspace = true
serde.workspace = true
serde_json.workspace = true
sha2.workspace = true
workspace_hack.workspace = true
thiserror.workspace = true

View File

@@ -0,0 +1,634 @@
//! A lite version of the PostHog client that only supports local evaluation of feature flags.
use std::collections::HashMap;
use serde::{Deserialize, Serialize};
use serde_json::json;
use sha2::Digest;
#[derive(Debug, thiserror::Error)]
pub enum PostHogEvaluationError {
/// The feature flag is not available, for example, because the local evaluation data is not populated yet.
#[error("Feature flag not available: {0}")]
NotAvailable(String),
#[error("No condition group is matched")]
NoConditionGroupMatched,
/// Real errors, e.g., the rollout percentage does not add up to 100.
#[error("Failed to evaluate feature flag: {0}")]
Internal(String),
}
#[derive(Deserialize)]
pub struct LocalEvaluationResponse {
#[allow(dead_code)]
flags: Vec<LocalEvaluationFlag>,
}
#[derive(Deserialize)]
pub struct LocalEvaluationFlag {
key: String,
filters: LocalEvaluationFlagFilters,
active: bool,
}
#[derive(Deserialize)]
pub struct LocalEvaluationFlagFilters {
groups: Vec<LocalEvaluationFlagFilterGroup>,
multivariate: LocalEvaluationFlagMultivariate,
}
#[derive(Deserialize)]
pub struct LocalEvaluationFlagFilterGroup {
variant: Option<String>,
properties: Option<Vec<LocalEvaluationFlagFilterProperty>>,
rollout_percentage: i64,
}
#[derive(Deserialize)]
pub struct LocalEvaluationFlagFilterProperty {
key: String,
value: PostHogFlagFilterPropertyValue,
operator: String,
}
#[derive(Debug, Serialize, Deserialize)]
#[serde(untagged)]
pub enum PostHogFlagFilterPropertyValue {
String(String),
Number(f64),
Boolean(bool),
List(Vec<String>),
}
#[derive(Deserialize)]
pub struct LocalEvaluationFlagMultivariate {
variants: Vec<LocalEvaluationFlagMultivariateVariant>,
}
#[derive(Deserialize)]
pub struct LocalEvaluationFlagMultivariateVariant {
key: String,
rollout_percentage: i64,
}
pub struct FeatureStore {
flags: HashMap<String, LocalEvaluationFlag>,
}
impl Default for FeatureStore {
fn default() -> Self {
Self::new()
}
}
enum GroupEvaluationResult {
MatchedAndOverride(String),
MatchedAndEvaluate,
Unmatched,
}
impl FeatureStore {
pub fn new() -> Self {
Self {
flags: HashMap::new(),
}
}
pub fn set_flags(&mut self, flags: Vec<LocalEvaluationFlag>) {
self.flags.clear();
for flag in flags {
self.flags.insert(flag.key.clone(), flag);
}
}
/// Generate a consistent hash for a user ID (e.g., tenant ID).
///
/// The implementation is different from PostHog SDK. In PostHog SDK, it is sha1 of `user_id.distinct_id.salt`.
/// However, as we do not upload all of our tenant IDs to PostHog, we do not have the PostHog distinct_id for a
/// tenant. Therefore, the way we compute it is sha256 of `user_id.feature_id.salt`.
fn consistent_hash(user_id: &str, flag_key: &str, salt: &str) -> f64 {
let mut hasher = sha2::Sha256::new();
hasher.update(user_id);
hasher.update(".");
hasher.update(flag_key);
hasher.update(".");
hasher.update(salt);
let hash = hasher.finalize();
let hash_int = u64::from_le_bytes(hash[..8].try_into().unwrap());
hash_int as f64 / u64::MAX as f64
}
/// Evaluate a condition. Returns an error if the condition cannot be evaluated due to parsing error or missing
/// property.
fn evaluate_condition(
&self,
operator: &str,
provided: &PostHogFlagFilterPropertyValue,
requested: &PostHogFlagFilterPropertyValue,
) -> Result<bool, PostHogEvaluationError> {
match operator {
"exact" => {
let PostHogFlagFilterPropertyValue::String(provided) = provided else {
// Left should be a string
return Err(PostHogEvaluationError::Internal(format!(
"The left side of the condition is not a string: {:?}",
provided
)));
};
let PostHogFlagFilterPropertyValue::List(requested) = requested else {
// Right should be a list of string
return Err(PostHogEvaluationError::Internal(format!(
"The right side of the condition is not a list: {:?}",
requested
)));
};
Ok(requested.contains(provided))
}
"lt" | "gt" => {
let PostHogFlagFilterPropertyValue::String(requested) = requested else {
// Right should be a string
return Err(PostHogEvaluationError::Internal(format!(
"The right side of the condition is not a string: {:?}",
requested
)));
};
let Ok(requested) = requested.parse::<f64>() else {
return Err(PostHogEvaluationError::Internal(format!(
"Can not parse the right side of the condition as a number: {:?}",
requested
)));
};
// Left can either be a number or a string
let provided = match provided {
PostHogFlagFilterPropertyValue::Number(provided) => *provided,
PostHogFlagFilterPropertyValue::String(provided) => {
let Ok(provided) = provided.parse::<f64>() else {
return Err(PostHogEvaluationError::Internal(format!(
"Can not parse the left side of the condition as a number: {:?}",
provided
)));
};
provided
}
_ => {
return Err(PostHogEvaluationError::Internal(format!(
"The left side of the condition is not a number or a string: {:?}",
provided
)));
}
};
match operator {
"lt" => Ok(provided < requested),
"gt" => Ok(provided > requested),
op => Err(PostHogEvaluationError::Internal(format!(
"Unsupported operator: {}",
op
))),
}
}
_ => Err(PostHogEvaluationError::Internal(format!(
"Unsupported operator: {}",
operator
))),
}
}
/// Evaluate a percentage.
fn evaluate_percentage(&self, mapped_user_id: f64, percentage: i64) -> bool {
mapped_user_id <= percentage as f64 / 100.0
}
/// Evaluate a filter group for a feature flag. Returns an error if there are errors during the evaluation.
///
/// Return values:
/// Ok(GroupEvaluationResult::MatchedAndOverride(variant)): matched and evaluated to this value
/// Ok(GroupEvaluationResult::MatchedAndEvaluate): condition matched but no variant override, use the global rollout percentage
/// Ok(GroupEvaluationResult::Unmatched): condition unmatched
fn evaluate_group(
&self,
group: &LocalEvaluationFlagFilterGroup,
hash_on_group_rollout_percentage: f64,
provided_properties: &HashMap<String, PostHogFlagFilterPropertyValue>,
) -> Result<GroupEvaluationResult, PostHogEvaluationError> {
if let Some(ref properties) = group.properties {
for property in properties {
if let Some(value) = provided_properties.get(&property.key) {
// The user provided the property value
if !self.evaluate_condition(
property.operator.as_ref(),
value,
&property.value,
)? {
return Ok(GroupEvaluationResult::Unmatched);
}
} else {
// We cannot evaluate, the property is not available
return Err(PostHogEvaluationError::NotAvailable(format!(
"The required property in the condition is not available: {}",
property.key
)));
}
}
}
// The group has no condition matchers or we matched the properties
if self.evaluate_percentage(hash_on_group_rollout_percentage, group.rollout_percentage) {
if let Some(ref variant_override) = group.variant {
Ok(GroupEvaluationResult::MatchedAndOverride(
variant_override.clone(),
))
} else {
Ok(GroupEvaluationResult::MatchedAndEvaluate)
}
} else {
Ok(GroupEvaluationResult::Unmatched)
}
}
/// Evaluate a multivariate feature flag. Returns `None` if the flag is not available or if there are errors
/// during the evaluation.
///
/// The parsing logic is as follows:
///
/// * Match each filter group.
/// - If a group is matched, it will first determine whether the user is in the range of the group's rollout
/// percentage. We will generate a consistent hash for the user ID on the group rollout percentage. This hash
/// is shared across all groups.
/// - If the hash falls within the group's rollout percentage, return the variant if it's overridden, or
/// - Evaluate the variant using the global config and the global rollout percentage.
/// * Otherwise, continue with the next group until all groups are evaluated and no group is within the
/// rollout percentage.
/// * If there are no matching groups, return an error.
///
/// Example: we have a multivariate flag with 3 groups of the configured global rollout percentage: A (10%), B (20%), C (70%).
/// There is a single group with a condition that has a rollout percentage of 10% and it does not have a variant override.
/// Then, we will have 1% of the users evaluated to A, 2% to B, and 7% to C.
pub fn evaluate_multivariate(
&self,
flag_key: &str,
user_id: &str,
) -> Result<String, PostHogEvaluationError> {
let hash_on_global_rollout_percentage =
Self::consistent_hash(user_id, flag_key, "multivariate");
let hash_on_group_rollout_percentage =
Self::consistent_hash(user_id, flag_key, "within_group");
self.evaluate_multivariate_inner(
flag_key,
hash_on_global_rollout_percentage,
hash_on_group_rollout_percentage,
&HashMap::new(),
)
}
/// Evaluate a multivariate feature flag. Note that we directly take the mapped user ID
/// (a consistent hash ranging from 0 to 1) so that it is easier to use it in the tests
/// and avoid duplicate computations.
///
/// Use a different consistent hash for evaluating the group rollout percentage.
/// The behavior: if the condition is set to rolling out to 10% of the users, and
/// we set the variant A to 20% in the global config, then 2% of the total users will
/// be evaluated to variant A.
///
/// Note that the hash to determine group rollout percentage is shared across all groups. So if we have two
/// exactly-the-same conditions with 10% and 20% rollout percentage respectively, a total of 20% of the users
/// will be evaluated (versus 30% if group evaluation is done independently).
pub(crate) fn evaluate_multivariate_inner(
&self,
flag_key: &str,
hash_on_global_rollout_percentage: f64,
hash_on_group_rollout_percentage: f64,
properties: &HashMap<String, PostHogFlagFilterPropertyValue>,
) -> Result<String, PostHogEvaluationError> {
if let Some(flag_config) = self.flags.get(flag_key) {
if !flag_config.active {
return Err(PostHogEvaluationError::NotAvailable(format!(
"The feature flag is not active: {}",
flag_key
)));
}
// TODO: sort the groups so that variant overrides always get evaluated first and it follows the PostHog
// Python SDK behavior; for now we do not configure conditions without variant overrides in Neon so it
// does not matter.
for group in &flag_config.filters.groups {
match self.evaluate_group(group, hash_on_group_rollout_percentage, properties)? {
GroupEvaluationResult::MatchedAndOverride(variant) => return Ok(variant),
GroupEvaluationResult::MatchedAndEvaluate => {
let mut percentage = 0;
for variant in &flag_config.filters.multivariate.variants {
percentage += variant.rollout_percentage;
if self
.evaluate_percentage(hash_on_global_rollout_percentage, percentage)
{
return Ok(variant.key.clone());
}
}
// This should not happen because the rollout percentage always adds up to 100, but just in case that PostHog
// returned invalid spec, we return an error.
return Err(PostHogEvaluationError::Internal(format!(
"Rollout percentage does not add up to 100: {}",
flag_key
)));
}
GroupEvaluationResult::Unmatched => continue,
}
}
// If no group is matched, the feature is not available, and up to the caller to decide what to do.
Err(PostHogEvaluationError::NoConditionGroupMatched)
} else {
// The feature flag is not available yet
Err(PostHogEvaluationError::NotAvailable(format!(
"Not found in the local evaluation spec: {}",
flag_key
)))
}
}
}
/// A lite PostHog client.
///
/// At the point of writing this code, PostHog does not have a functional Rust client with feature flag support.
/// This is a lite version that only supports local evaluation of feature flags and only supports those JSON specs
/// that will be used within Neon.
///
/// PostHog is designed as a browser-server system: the browser (client) side uses the client key and is exposed
/// to the end users; the server side uses a server key and is not exposed to the end users. The client and the
/// server has different API keys and provide a different set of APIs. In Neon, we only have the server (that is
/// pageserver), and it will use both the client API and the server API. So we need to store two API keys within
/// our PostHog client.
///
/// The server API is used to fetch the feature flag specs. The client API is used to capture events in case we
/// want to report the feature flag usage back to PostHog. The current plan is to use PostHog only as an UI to
/// configure feature flags so it is very likely that the client API will not be used.
pub struct PostHogClient {
/// The server API key.
server_api_key: String,
/// The client API key.
client_api_key: String,
/// The project ID.
project_id: String,
/// The private API URL.
private_api_url: String,
/// The public API URL.
public_api_url: String,
/// The HTTP client.
client: reqwest::Client,
}
impl PostHogClient {
pub fn new(
server_api_key: String,
client_api_key: String,
project_id: String,
private_api_url: String,
public_api_url: String,
) -> Self {
let client = reqwest::Client::new();
Self {
server_api_key,
client_api_key,
project_id,
private_api_url,
public_api_url,
client,
}
}
pub fn new_with_us_region(
server_api_key: String,
client_api_key: String,
project_id: String,
) -> Self {
Self::new(
server_api_key,
client_api_key,
project_id,
"https://us.posthog.com".to_string(),
"https://us.i.posthog.com".to_string(),
)
}
/// Fetch the feature flag specs from the server.
///
/// This is unfortunately an undocumented API at:
/// - <https://posthog.com/docs/api/feature-flags#get-api-projects-project_id-feature_flags-local_evaluation>
/// - <https://posthog.com/docs/feature-flags/local-evaluation>
///
/// The handling logic in [`FeatureStore`] mostly follows the Python API implementation.
/// See `_compute_flag_locally` in <https://github.com/PostHog/posthog-python/blob/master/posthog/client.py>
pub async fn get_feature_flags_local_evaluation(
&self,
) -> anyhow::Result<LocalEvaluationResponse> {
// BASE_URL/api/projects/:project_id/feature_flags/local_evaluation
// with bearer token of self.server_api_key
let url = format!(
"{}/api/projects/{}/feature_flags/local_evaluation",
self.private_api_url, self.project_id
);
let response = self
.client
.get(url)
.bearer_auth(&self.server_api_key)
.send()
.await?;
let body = response.text().await?;
Ok(serde_json::from_str(&body)?)
}
/// Capture an event. This will only be used to report the feature flag usage back to PostHog, though
/// it also support a lot of other functionalities.
///
/// <https://posthog.com/docs/api/capture>
pub async fn capture_event(
&self,
event: &str,
distinct_id: &str,
properties: &HashMap<String, PostHogFlagFilterPropertyValue>,
) -> anyhow::Result<()> {
// PUBLIC_URL/capture/
// with bearer token of self.client_api_key
let url = format!("{}/capture/", self.public_api_url);
self.client
.post(url)
.body(serde_json::to_string(&json!({
"api_key": self.client_api_key,
"distinct_id": distinct_id,
"event": event,
"properties": properties,
}))?)
.send()
.await?;
Ok(())
}
}
#[cfg(test)]
mod tests {
use super::*;
fn data() -> &'static str {
r#"{
"flags": [
{
"id": 132794,
"team_id": 152860,
"name": "",
"key": "gc-compaction",
"filters": {
"groups": [
{
"variant": "enabled-stage-2",
"properties": [
{
"key": "plan_type",
"type": "person",
"value": [
"free"
],
"operator": "exact"
},
{
"key": "pageserver_remote_size",
"type": "person",
"value": "10000000",
"operator": "lt"
}
],
"rollout_percentage": 50
},
{
"properties": [
{
"key": "plan_type",
"type": "person",
"value": [
"free"
],
"operator": "exact"
},
{
"key": "pageserver_remote_size",
"type": "person",
"value": "10000000",
"operator": "lt"
}
],
"rollout_percentage": 80
}
],
"payloads": {},
"multivariate": {
"variants": [
{
"key": "disabled",
"name": "",
"rollout_percentage": 90
},
{
"key": "enabled-stage-1",
"name": "",
"rollout_percentage": 10
},
{
"key": "enabled-stage-2",
"name": "",
"rollout_percentage": 0
},
{
"key": "enabled-stage-3",
"name": "",
"rollout_percentage": 0
},
{
"key": "enabled",
"name": "",
"rollout_percentage": 0
}
]
}
},
"deleted": false,
"active": true,
"ensure_experience_continuity": false,
"has_encrypted_payloads": false,
"version": 6
}
],
"group_type_mapping": {},
"cohorts": {}
}"#
}
#[test]
fn parse_local_evaluation() {
let data = data();
let _: LocalEvaluationResponse = serde_json::from_str(data).unwrap();
}
#[test]
fn evaluate_multivariate() {
let mut store = FeatureStore::new();
let response: LocalEvaluationResponse = serde_json::from_str(data()).unwrap();
store.set_flags(response.flags);
// This lacks the required properties and cannot be evaluated.
let variant =
store.evaluate_multivariate_inner("gc-compaction", 1.00, 0.40, &HashMap::new());
assert!(matches!(
variant,
Err(PostHogEvaluationError::NotAvailable(_))
),);
let properties_unmatched = HashMap::from([
(
"plan_type".to_string(),
PostHogFlagFilterPropertyValue::String("paid".to_string()),
),
(
"pageserver_remote_size".to_string(),
PostHogFlagFilterPropertyValue::Number(1000.0),
),
]);
// This does not match any group so there will be an error.
let variant =
store.evaluate_multivariate_inner("gc-compaction", 1.00, 0.40, &properties_unmatched);
assert!(matches!(
variant,
Err(PostHogEvaluationError::NoConditionGroupMatched)
),);
let variant =
store.evaluate_multivariate_inner("gc-compaction", 0.80, 0.80, &properties_unmatched);
assert!(matches!(
variant,
Err(PostHogEvaluationError::NoConditionGroupMatched)
),);
let properties = HashMap::from([
(
"plan_type".to_string(),
PostHogFlagFilterPropertyValue::String("free".to_string()),
),
(
"pageserver_remote_size".to_string(),
PostHogFlagFilterPropertyValue::Number(1000.0),
),
]);
// It matches the first group as 0.10 <= 0.50 and the properties are matched. Then it gets evaluated to the variant override.
let variant = store.evaluate_multivariate_inner("gc-compaction", 0.10, 0.10, &properties);
assert_eq!(variant.unwrap(), "enabled-stage-2".to_string());
// It matches the second group as 0.50 <= 0.60 <= 0.80 and the properties are matched. Then it gets evaluated using the global percentage.
let variant = store.evaluate_multivariate_inner("gc-compaction", 0.99, 0.60, &properties);
assert_eq!(variant.unwrap(), "enabled-stage-1".to_string());
let variant = store.evaluate_multivariate_inner("gc-compaction", 0.80, 0.60, &properties);
assert_eq!(variant.unwrap(), "disabled".to_string());
// It matches the group conditions but not the group rollout percentage.
let variant = store.evaluate_multivariate_inner("gc-compaction", 1.00, 0.90, &properties);
assert!(matches!(
variant,
Err(PostHogEvaluationError::NoConditionGroupMatched)
),);
}
}

View File

@@ -330,11 +330,18 @@ impl AzureBlobStorage {
if let Err(DownloadError::Timeout) = &next_item {
timeout_try_cnt += 1;
if timeout_try_cnt <= 5 {
continue;
continue 'outer;
}
}
let next_item = next_item?;
let next_item = match next_item {
Ok(next_item) => next_item,
Err(e) => {
// The error is potentially retryable, so we must rewind the loop after yielding.
yield Err(e);
continue 'outer;
},
};
// Log a warning if we saw two timeouts in a row before a successful request
if timeout_try_cnt > 2 {

View File

@@ -657,7 +657,14 @@ impl RemoteStorage for S3Bucket {
res = request => Ok(res),
_ = tokio::time::sleep(self.timeout) => Err(DownloadError::Timeout),
_ = cancel.cancelled() => Err(DownloadError::Cancelled),
}?;
};
if let Err(DownloadError::Timeout) = &response {
yield Err(DownloadError::Timeout);
continue 'outer;
}
let response = response?; // always yield cancellation errors and stop the stream
let response = response
.context("Failed to list S3 prefixes")

View File

@@ -1,7 +1,7 @@
use std::borrow::Cow;
use std::fs::{self, File};
use std::io::{self, Write};
use std::os::fd::AsRawFd;
use std::os::fd::AsFd;
use camino::{Utf8Path, Utf8PathBuf};
@@ -210,13 +210,13 @@ pub fn overwrite(
/// Syncs the filesystem for the given file descriptor.
#[cfg_attr(target_os = "macos", allow(unused_variables))]
pub fn syncfs(fd: impl AsRawFd) -> anyhow::Result<()> {
pub fn syncfs(fd: impl AsFd) -> anyhow::Result<()> {
// Linux guarantees durability for syncfs.
// POSIX doesn't have syncfs, and further does not actually guarantee durability of sync().
#[cfg(target_os = "linux")]
{
use anyhow::Context;
nix::unistd::syncfs(fd.as_raw_fd()).context("syncfs")?;
nix::unistd::syncfs(fd).context("syncfs")?;
}
#[cfg(target_os = "macos")]
{

View File

@@ -11,9 +11,9 @@ pub fn rename_noreplace<P1: ?Sized + NixPath, P2: ?Sized + NixPath>(
#[cfg(all(target_os = "linux", target_env = "gnu"))]
{
nix::fcntl::renameat2(
None,
nix::fcntl::AT_FDCWD,
src,
None,
nix::fcntl::AT_FDCWD,
dst,
nix::fcntl::RenameFlags::RENAME_NOREPLACE,
)

View File

@@ -1,6 +1,6 @@
//! A module to create and read lock files.
//!
//! File locking is done using [`fcntl::flock`] exclusive locks.
//! File locking is done using [`nix::fcntl::Flock`] exclusive locks.
//! The only consumer of this module is currently
//! [`pid_file`](crate::pid_file). See the module-level comment
//! there for potential pitfalls with lock files that are used
@@ -9,26 +9,25 @@
use std::fs;
use std::io::{Read, Write};
use std::ops::Deref;
use std::os::unix::prelude::AsRawFd;
use anyhow::Context;
use camino::{Utf8Path, Utf8PathBuf};
use nix::errno::Errno::EAGAIN;
use nix::fcntl;
use nix::fcntl::{Flock, FlockArg};
use crate::crashsafe;
/// A handle to an open and unlocked, but not-yet-written lock file.
/// A handle to an open and flocked, but not-yet-written lock file.
/// Returned by [`create_exclusive`].
#[must_use]
pub struct UnwrittenLockFile {
path: Utf8PathBuf,
file: fs::File,
file: Flock<fs::File>,
}
/// Returned by [`UnwrittenLockFile::write_content`].
#[must_use]
pub struct LockFileGuard(fs::File);
pub struct LockFileGuard(Flock<fs::File>);
impl Deref for LockFileGuard {
type Target = fs::File;
@@ -67,17 +66,14 @@ pub fn create_exclusive(lock_file_path: &Utf8Path) -> anyhow::Result<UnwrittenLo
.open(lock_file_path)
.context("open lock file")?;
let res = fcntl::flock(
lock_file.as_raw_fd(),
fcntl::FlockArg::LockExclusiveNonblock,
);
let res = Flock::lock(lock_file, FlockArg::LockExclusiveNonblock);
match res {
Ok(()) => Ok(UnwrittenLockFile {
Ok(lock_file) => Ok(UnwrittenLockFile {
path: lock_file_path.to_owned(),
file: lock_file,
}),
Err(EAGAIN) => anyhow::bail!("file is already locked"),
Err(e) => Err(e).context("flock error"),
Err((_, EAGAIN)) => anyhow::bail!("file is already locked"),
Err((_, e)) => Err(e).context("flock error"),
}
}
@@ -105,32 +101,37 @@ pub enum LockFileRead {
/// Check the [`LockFileRead`] variants for details.
pub fn read_and_hold_lock_file(path: &Utf8Path) -> anyhow::Result<LockFileRead> {
let res = fs::OpenOptions::new().read(true).open(path);
let mut lock_file = match res {
let lock_file = match res {
Ok(f) => f,
Err(e) => match e.kind() {
std::io::ErrorKind::NotFound => return Ok(LockFileRead::NotExist),
_ => return Err(e).context("open lock file"),
},
};
let res = fcntl::flock(
lock_file.as_raw_fd(),
fcntl::FlockArg::LockExclusiveNonblock,
);
let res = Flock::lock(lock_file, FlockArg::LockExclusiveNonblock);
// We need the content regardless of lock success / failure.
// But, read it after flock so that, if it succeeded, the content is consistent.
let mut content = String::new();
lock_file
.read_to_string(&mut content)
.context("read lock file")?;
match res {
Ok(()) => Ok(LockFileRead::NotHeldByAnyProcess(
LockFileGuard(lock_file),
content,
)),
Err(EAGAIN) => Ok(LockFileRead::LockedByOtherProcess {
not_locked_file: lock_file,
content,
}),
Err(e) => Err(e).context("flock error"),
Ok(mut locked_file) => {
let mut content = String::new();
locked_file
.read_to_string(&mut content)
.context("read lock file")?;
Ok(LockFileRead::NotHeldByAnyProcess(
LockFileGuard(locked_file),
content,
))
}
Err((mut not_locked_file, EAGAIN)) => {
let mut content = String::new();
not_locked_file
.read_to_string(&mut content)
.context("read lock file")?;
Ok(LockFileRead::LockedByOtherProcess {
not_locked_file,
content,
})
}
Err((_, e)) => Err(e).context("flock error"),
}
}

View File

@@ -127,12 +127,12 @@ macro_rules! __check_fields_present {
match check_fields_present0($extractors) {
Ok(FoundEverything) => Ok(()),
Ok(Unconfigured) if cfg!(test) => {
Ok(Unconfigured) if cfg!(feature = "testing") => {
// allow unconfigured in tests
Ok(())
},
Ok(Unconfigured) => {
panic!("utils::tracing_span_assert: outside of #[cfg(test)] expected tracing to be configured with tracing_error::ErrorLayer")
panic!(r#"utils::tracing_span_assert: outside of #[cfg(feature = "testing")] expected tracing to be configured with tracing_error::ErrorLayer"#)
},
Err(missing) => Err(missing)
}

View File

@@ -96,6 +96,7 @@ strum.workspace = true
strum_macros.workspace = true
wal_decoder.workspace = true
smallvec.workspace = true
twox-hash.workspace = true
[target.'cfg(target_os = "linux")'.dependencies]
procfs.workspace = true

View File

@@ -1,5 +1,6 @@
use std::collections::HashMap;
use std::error::Error as _;
use std::time::Duration;
use bytes::Bytes;
use detach_ancestor::AncestorDetached;
@@ -819,4 +820,25 @@ impl Client {
.await
.map(|resp| resp.status())
}
pub async fn activate_post_import(
&self,
tenant_shard_id: TenantShardId,
timeline_id: TimelineId,
activate_timeline_timeout: Duration,
) -> Result<TimelineInfo> {
let uri = format!(
"{}/v1/tenant/{}/timeline/{}/activate_post_import?timeline_activate_timeout_ms={}",
self.mgmt_api_endpoint,
tenant_shard_id,
timeline_id,
activate_timeline_timeout.as_millis()
);
self.request(Method::PUT, uri, ())
.await?
.json()
.await
.map_err(Error::ReceiveBody)
}
}

View File

@@ -65,6 +65,9 @@ pub(crate) struct Args {
#[clap(long, default_value = "1")]
queue_depth: NonZeroUsize,
#[clap(long)]
only_relnode: Option<u32>,
targets: Option<Vec<TenantTimelineId>>,
}
@@ -206,7 +209,12 @@ async fn main_impl(
for r in partitioning.keys.ranges.iter() {
let mut i = r.start;
while i != r.end {
if i.is_rel_block_key() {
let mut include = true;
include &= i.is_rel_block_key();
if let Some(only_relnode) = args.only_relnode {
include &= i.is_rel_block_of_rel(only_relnode);
}
if include {
filtered.add_key(i);
}
i = i.next();

View File

@@ -7,7 +7,7 @@ use pageserver_api::models::ShardImportStatus;
use pageserver_api::shard::TenantShardId;
use pageserver_api::upcall_api::{
PutTimelineImportStatusRequest, ReAttachRequest, ReAttachResponse, ReAttachResponseTenant,
ValidateRequest, ValidateRequestTenant, ValidateResponse,
TimelineImportStatusRequest, ValidateRequest, ValidateRequestTenant, ValidateResponse,
};
use reqwest::Certificate;
use serde::Serialize;
@@ -51,8 +51,15 @@ pub trait StorageControllerUpcallApi {
&self,
tenant_shard_id: TenantShardId,
timeline_id: TimelineId,
generation: Generation,
status: ShardImportStatus,
) -> impl Future<Output = Result<(), RetryForeverError>> + Send;
fn get_timeline_import_status(
&self,
tenant_shard_id: TenantShardId,
timeline_id: TimelineId,
generation: Generation,
) -> impl Future<Output = Result<ShardImportStatus, RetryForeverError>> + Send;
}
impl StorageControllerUpcallClient {
@@ -97,6 +104,7 @@ impl StorageControllerUpcallClient {
&self,
url: &url::Url,
request: R,
method: reqwest::Method,
) -> Result<T, RetryForeverError>
where
R: Serialize,
@@ -106,7 +114,7 @@ impl StorageControllerUpcallClient {
|| async {
let response = self
.http_client
.post(url.clone())
.request(method.clone(), url.clone())
.json(&request)
.send()
.await?;
@@ -215,7 +223,9 @@ impl StorageControllerUpcallApi for StorageControllerUpcallClient {
register: register.clone(),
};
let response: ReAttachResponse = self.retry_http_forever(&url, request).await?;
let response: ReAttachResponse = self
.retry_http_forever(&url, request, reqwest::Method::POST)
.await?;
tracing::info!(
"Received re-attach response with {} tenants (node {}, register: {:?})",
response.tenants.len(),
@@ -268,7 +278,9 @@ impl StorageControllerUpcallApi for StorageControllerUpcallClient {
return Err(RetryForeverError::ShuttingDown);
}
let response: ValidateResponse = self.retry_http_forever(&url, request).await?;
let response: ValidateResponse = self
.retry_http_forever(&url, request, reqwest::Method::POST)
.await?;
for rt in response.tenants {
result.insert(rt.id, rt.valid);
}
@@ -287,6 +299,7 @@ impl StorageControllerUpcallApi for StorageControllerUpcallClient {
&self,
tenant_shard_id: TenantShardId,
timeline_id: TimelineId,
generation: Generation,
status: ShardImportStatus,
) -> Result<(), RetryForeverError> {
let url = self
@@ -297,9 +310,35 @@ impl StorageControllerUpcallApi for StorageControllerUpcallClient {
let request = PutTimelineImportStatusRequest {
tenant_shard_id,
timeline_id,
generation,
status,
};
self.retry_http_forever(&url, request).await
self.retry_http_forever(&url, request, reqwest::Method::POST)
.await
}
#[tracing::instrument(skip_all)] // so that warning logs from retry_http_forever have context
async fn get_timeline_import_status(
&self,
tenant_shard_id: TenantShardId,
timeline_id: TimelineId,
generation: Generation,
) -> Result<ShardImportStatus, RetryForeverError> {
let url = self
.base_url
.join("timeline_import_status")
.expect("Failed to build path");
let request = TimelineImportStatusRequest {
tenant_shard_id,
timeline_id,
generation,
};
let response: ShardImportStatus = self
.retry_http_forever(&url, request, reqwest::Method::GET)
.await?;
Ok(response)
}
}

View File

@@ -663,6 +663,7 @@ mod test {
use camino::Utf8Path;
use hex_literal::hex;
use pageserver_api::key::Key;
use pageserver_api::models::ShardImportStatus;
use pageserver_api::shard::ShardIndex;
use pageserver_api::upcall_api::ReAttachResponseTenant;
use remote_storage::{RemoteStorageConfig, RemoteStorageKind};
@@ -792,10 +793,20 @@ mod test {
&self,
_tenant_shard_id: TenantShardId,
_timeline_id: TimelineId,
_generation: Generation,
_status: pageserver_api::models::ShardImportStatus,
) -> Result<(), RetryForeverError> {
unimplemented!()
}
async fn get_timeline_import_status(
&self,
_tenant_shard_id: TenantShardId,
_timeline_id: TimelineId,
_generation: Generation,
) -> Result<ShardImportStatus, RetryForeverError> {
unimplemented!()
}
}
async fn setup(test_name: &str) -> anyhow::Result<TestSetup> {

View File

@@ -3500,6 +3500,107 @@ async fn put_tenant_timeline_import_wal(
}.instrument(span).await
}
/// Activate a timeline after its import has completed
///
/// The endpoint is idempotent and callers are expected to retry all
/// errors until a successful response.
async fn activate_post_import_handler(
request: Request<Body>,
_cancel: CancellationToken,
) -> Result<Response<Body>, ApiError> {
let tenant_shard_id: TenantShardId = parse_request_param(&request, "tenant_shard_id")?;
check_permission(&request, Some(tenant_shard_id.tenant_id))?;
let timeline_id: TimelineId = parse_request_param(&request, "timeline_id")?;
const DEFAULT_ACTIVATE_TIMEOUT: Duration = Duration::from_secs(1);
let activate_timeout = parse_query_param(&request, "timeline_activate_timeout_ms")?
.map(Duration::from_millis)
.unwrap_or(DEFAULT_ACTIVATE_TIMEOUT);
let span = info_span!(
"activate_post_import_handler",
tenant_id=%tenant_shard_id.tenant_id,
timeline_id=%timeline_id,
shard_id=%tenant_shard_id.shard_slug()
);
async move {
let state = get_state(&request);
let tenant = state
.tenant_manager
.get_attached_tenant_shard(tenant_shard_id)?;
tenant.wait_to_become_active(ACTIVE_TENANT_TIMEOUT).await?;
tenant
.finalize_importing_timeline(timeline_id)
.await
.map_err(ApiError::InternalServerError)?;
match tenant.get_timeline(timeline_id, false) {
Ok(_timeline) => {
// Timeline is already visible. Reset not required: fall through.
}
Err(GetTimelineError::NotFound { .. }) => {
// This is crude: we reset the whole tenant such that the new timeline is detected
// and activated. We can come up with something more granular in the future.
//
// Note that we only reset the tenant if required: when the timeline is
// not present in [`Tenant::timelines`].
let ctx = RequestContext::new(TaskKind::MgmtRequest, DownloadBehavior::Warn);
state
.tenant_manager
.reset_tenant(tenant_shard_id, false, &ctx)
.await
.map_err(ApiError::InternalServerError)?;
}
Err(GetTimelineError::ShuttingDown) => {
return Err(ApiError::ShuttingDown);
}
Err(GetTimelineError::NotActive { .. }) => {
unreachable!("Called get_timeline with active_only=false");
}
}
let timeline = tenant.get_timeline(timeline_id, false)?;
let ctx = RequestContext::new(TaskKind::MgmtRequest, DownloadBehavior::Warn)
.with_scope_timeline(&timeline);
let result =
tokio::time::timeout(activate_timeout, timeline.wait_to_become_active(&ctx)).await;
match result {
Ok(Ok(())) => {
// fallthrough
}
// Timeline reached some other state that's not active
// TODO(vlad): if the tenant is broken, return a permananet error
Ok(Err(_timeline_state)) => {
return Err(ApiError::InternalServerError(anyhow::anyhow!(
"Timeline activation failed"
)));
}
// Activation timed out
Err(_) => {
return Err(ApiError::Timeout("Timeline activation timed out".into()));
}
}
let timeline_info = build_timeline_info(
&timeline, false, // include_non_incremental_logical_size,
false, // force_await_initial_logical_size
&ctx,
)
.await
.context("get local timeline info")
.map_err(ApiError::InternalServerError)?;
json_response(StatusCode::OK, timeline_info)
}
.instrument(span)
.await
}
/// Read the end of a tar archive.
///
/// A tar archive normally ends with two consecutive blocks of zeros, 512 bytes each.
@@ -3924,5 +4025,9 @@ pub fn make_router(
"/v1/tenant/:tenant_id/timeline/:timeline_id/import_wal",
|r| api_handler(r, put_tenant_timeline_import_wal),
)
.put(
"/v1/tenant/:tenant_shard_id/timeline/:timeline_id/activate_post_import",
|r| api_handler(r, activate_post_import_handler),
)
.any(handler_404))
}

View File

@@ -1278,7 +1278,7 @@ impl PageServerHandler {
}
#[instrument(level = tracing::Level::DEBUG, skip_all)]
async fn pagesteam_handle_batched_message<IO>(
async fn pagestream_handle_batched_message<IO>(
&mut self,
pgb_writer: &mut PostgresBackend<IO>,
batch: BatchedFeMessage,
@@ -1733,7 +1733,7 @@ impl PageServerHandler {
};
let result = self
.pagesteam_handle_batched_message(
.pagestream_handle_batched_message(
pgb_writer,
msg,
io_concurrency.clone(),
@@ -1909,7 +1909,7 @@ impl PageServerHandler {
return Err(e);
}
};
self.pagesteam_handle_batched_message(
self.pagestream_handle_batched_message(
pgb_writer,
batch,
io_concurrency.clone(),

View File

@@ -50,6 +50,7 @@ use remote_timeline_client::{
use secondary::heatmap::{HeatMapTenant, HeatMapTimeline};
use storage_broker::BrokerClientChannel;
use timeline::compaction::{CompactionOutcome, GcCompactionQueue};
use timeline::import_pgdata::ImportingTimeline;
use timeline::offload::{OffloadError, offload_timeline};
use timeline::{
CompactFlags, CompactOptions, CompactionError, PreviousHeatmap, ShutdownMode, import_pgdata,
@@ -284,6 +285,19 @@ pub struct TenantShard {
/// **Lock order**: if acquiring all (or a subset), acquire them in order `timelines`, `timelines_offloaded`, `timelines_creating`
timelines_offloaded: Mutex<HashMap<TimelineId, Arc<OffloadedTimeline>>>,
/// Tracks the timelines that are currently importing into this tenant shard.
///
/// Note that importing timelines are also present in [`Self::timelines_creating`].
/// Keep this in mind when ordering lock acquisition.
///
/// Lifetime:
/// * An imported timeline is created while scanning the bucket on tenant attach
/// if the index part contains an `import_pgdata` entry and said field marks the import
/// as in progress.
/// * Imported timelines are removed when the storage controller calls the post timeline
/// import activation endpoint.
timelines_importing: std::sync::Mutex<HashMap<TimelineId, ImportingTimeline>>,
/// The last tenant manifest known to be in remote storage. None if the manifest has not yet
/// been either downloaded or uploaded. Always Some after tenant attach.
///
@@ -923,19 +937,10 @@ enum StartCreatingTimelineResult {
#[allow(clippy::large_enum_variant, reason = "TODO")]
enum TimelineInitAndSyncResult {
ReadyToActivate(Arc<Timeline>),
ReadyToActivate,
NeedsSpawnImportPgdata(TimelineInitAndSyncNeedsSpawnImportPgdata),
}
impl TimelineInitAndSyncResult {
fn ready_to_activate(self) -> Option<Arc<Timeline>> {
match self {
Self::ReadyToActivate(timeline) => Some(timeline),
_ => None,
}
}
}
#[must_use]
struct TimelineInitAndSyncNeedsSpawnImportPgdata {
timeline: Arc<Timeline>,
@@ -1012,10 +1017,6 @@ enum CreateTimelineCause {
enum LoadTimelineCause {
Attach,
Unoffload,
ImportPgdata {
create_guard: TimelineCreateGuard,
activate: ActivateTimelineArgs,
},
}
#[derive(thiserror::Error, Debug)]
@@ -1097,7 +1098,7 @@ impl TenantShard {
self: &Arc<Self>,
timeline_id: TimelineId,
resources: TimelineResources,
mut index_part: IndexPart,
index_part: IndexPart,
metadata: TimelineMetadata,
previous_heatmap: Option<PreviousHeatmap>,
ancestor: Option<Arc<Timeline>>,
@@ -1106,7 +1107,7 @@ impl TenantShard {
) -> anyhow::Result<TimelineInitAndSyncResult> {
let tenant_id = self.tenant_shard_id;
let import_pgdata = index_part.import_pgdata.take();
let import_pgdata = index_part.import_pgdata.clone();
let idempotency = match &import_pgdata {
Some(import_pgdata) => {
CreateTimelineIdempotency::ImportPgdata(CreatingTimelineIdempotencyImportPgdata {
@@ -1127,7 +1128,7 @@ impl TenantShard {
}
};
let (timeline, timeline_ctx) = self.create_timeline_struct(
let (timeline, _timeline_ctx) = self.create_timeline_struct(
timeline_id,
&metadata,
previous_heatmap,
@@ -1197,14 +1198,6 @@ impl TenantShard {
match import_pgdata {
Some(import_pgdata) if !import_pgdata.is_done() => {
match cause {
LoadTimelineCause::Attach | LoadTimelineCause::Unoffload => (),
LoadTimelineCause::ImportPgdata { .. } => {
unreachable!(
"ImportPgdata should not be reloading timeline import is done and persisted as such in s3"
)
}
}
let mut guard = self.timelines_creating.lock().unwrap();
if !guard.insert(timeline_id) {
// We should never try and load the same timeline twice during startup
@@ -1260,26 +1253,7 @@ impl TenantShard {
"Timeline has no ancestor and no layer files"
);
match cause {
LoadTimelineCause::Attach | LoadTimelineCause::Unoffload => (),
LoadTimelineCause::ImportPgdata {
create_guard,
activate,
} => {
// TODO: see the comment in the task code above how I'm not so certain
// it is safe to activate here because of concurrent shutdowns.
match activate {
ActivateTimelineArgs::Yes { broker_client } => {
info!("activating timeline after reload from pgdata import task");
timeline.activate(self.clone(), broker_client, None, &timeline_ctx);
}
ActivateTimelineArgs::No => (),
}
drop(create_guard);
}
}
Ok(TimelineInitAndSyncResult::ReadyToActivate(timeline))
Ok(TimelineInitAndSyncResult::ReadyToActivate)
}
}
}
@@ -1768,7 +1742,7 @@ impl TenantShard {
})?;
match effect {
TimelineInitAndSyncResult::ReadyToActivate(_) => {
TimelineInitAndSyncResult::ReadyToActivate => {
// activation happens later, on Tenant::activate
}
TimelineInitAndSyncResult::NeedsSpawnImportPgdata(
@@ -1778,13 +1752,24 @@ impl TenantShard {
guard,
},
) => {
tokio::task::spawn(self.clone().create_timeline_import_pgdata_task(
timeline,
import_pgdata,
ActivateTimelineArgs::No,
guard,
ctx.detached_child(TaskKind::ImportPgdata, DownloadBehavior::Warn),
));
let timeline_id = timeline.timeline_id;
let import_task_handle =
tokio::task::spawn(self.clone().create_timeline_import_pgdata_task(
timeline.clone(),
import_pgdata,
guard,
ctx.detached_child(TaskKind::ImportPgdata, DownloadBehavior::Warn),
));
let prev = self.timelines_importing.lock().unwrap().insert(
timeline_id,
ImportingTimeline {
timeline: timeline.clone(),
import_task_handle,
},
);
assert!(prev.is_none());
}
}
}
@@ -2678,14 +2663,7 @@ impl TenantShard {
.await?
}
CreateTimelineParams::ImportPgdata(params) => {
self.create_timeline_import_pgdata(
params,
ActivateTimelineArgs::Yes {
broker_client: broker_client.clone(),
},
ctx,
)
.await?
self.create_timeline_import_pgdata(params, ctx).await?
}
};
@@ -2759,7 +2737,6 @@ impl TenantShard {
async fn create_timeline_import_pgdata(
self: &Arc<Self>,
params: CreateTimelineParamsImportPgdata,
activate: ActivateTimelineArgs,
ctx: &RequestContext,
) -> Result<CreateTimelineResult, CreateTimelineError> {
let CreateTimelineParamsImportPgdata {
@@ -2840,24 +2817,71 @@ impl TenantShard {
let (timeline, timeline_create_guard) = uninit_timeline.finish_creation_myself();
tokio::spawn(self.clone().create_timeline_import_pgdata_task(
let import_task_handle = tokio::spawn(self.clone().create_timeline_import_pgdata_task(
timeline.clone(),
index_part,
activate,
timeline_create_guard,
timeline_ctx.detached_child(TaskKind::ImportPgdata, DownloadBehavior::Warn),
));
let prev = self.timelines_importing.lock().unwrap().insert(
timeline.timeline_id,
ImportingTimeline {
timeline: timeline.clone(),
import_task_handle,
},
);
// Idempotency is enforced higher up the stack
assert!(prev.is_none());
// NB: the timeline doesn't exist in self.timelines at this point
Ok(CreateTimelineResult::ImportSpawned(timeline))
}
/// Finalize the import of a timeline on this shard by marking it complete in
/// the index part. If the import task hasn't finished yet, returns an error.
///
/// This method is idempotent. If the import was finalized once, the next call
/// will be a no-op.
pub(crate) async fn finalize_importing_timeline(
&self,
timeline_id: TimelineId,
) -> anyhow::Result<()> {
let timeline = {
let locked = self.timelines_importing.lock().unwrap();
match locked.get(&timeline_id) {
Some(importing_timeline) => {
if !importing_timeline.import_task_handle.is_finished() {
return Err(anyhow::anyhow!("Import task not done yet"));
}
importing_timeline.timeline.clone()
}
None => {
return Ok(());
}
}
};
timeline
.remote_client
.schedule_index_upload_for_import_pgdata_finalize()?;
timeline.remote_client.wait_completion().await?;
self.timelines_importing
.lock()
.unwrap()
.remove(&timeline_id);
Ok(())
}
#[instrument(skip_all, fields(tenant_id=%self.tenant_shard_id.tenant_id, shard_id=%self.tenant_shard_id.shard_slug(), timeline_id=%timeline.timeline_id))]
async fn create_timeline_import_pgdata_task(
self: Arc<TenantShard>,
timeline: Arc<Timeline>,
index_part: import_pgdata::index_part_format::Root,
activate: ActivateTimelineArgs,
timeline_create_guard: TimelineCreateGuard,
ctx: RequestContext,
) {
@@ -2869,7 +2893,6 @@ impl TenantShard {
.create_timeline_import_pgdata_task_impl(
timeline,
index_part,
activate,
timeline_create_guard,
ctx,
)
@@ -2885,60 +2908,15 @@ impl TenantShard {
self: Arc<TenantShard>,
timeline: Arc<Timeline>,
index_part: import_pgdata::index_part_format::Root,
activate: ActivateTimelineArgs,
timeline_create_guard: TimelineCreateGuard,
_timeline_create_guard: TimelineCreateGuard,
ctx: RequestContext,
) -> Result<(), anyhow::Error> {
info!("importing pgdata");
let ctx = ctx.with_scope_timeline(&timeline);
import_pgdata::doit(&timeline, index_part, &ctx, self.cancel.clone())
.await
.context("import")?;
info!("import done");
//
// Reload timeline from remote.
// This proves that the remote state is attachable, and it reuses the code.
//
// TODO: think about whether this is safe to do with concurrent TenantShard::shutdown.
// timeline_create_guard hols the tenant gate open, so, shutdown cannot _complete_ until we exit.
// But our activate() call might launch new background tasks after TenantShard::shutdown
// already went past shutting down the TenantShard::timelines, which this timeline here is no part of.
// I think the same problem exists with the bootstrap & branch mgmt API tasks (tenant shutting
// down while bootstrapping/branching + activating), but, the race condition is much more likely
// to manifest because of the long runtime of this import task.
// in theory this shouldn't even .await anything except for coop yield
info!("shutting down timeline");
timeline.shutdown(ShutdownMode::Hard).await;
info!("timeline shut down, reloading from remote");
// TODO: we can't do the following check because create_timeline_import_pgdata must return an Arc<Timeline>
// let Some(timeline) = Arc::into_inner(timeline) else {
// anyhow::bail!("implementation error: timeline that we shut down was still referenced from somewhere");
// };
let timeline_id = timeline.timeline_id;
// load from object storage like TenantShard::attach does
let resources = self.build_timeline_resources(timeline_id);
let index_part = resources
.remote_client
.download_index_file(&self.cancel)
.await?;
let index_part = match index_part {
MaybeDeletedIndexPart::Deleted(_) => {
// likely concurrent delete call, cplane should prevent this
anyhow::bail!(
"index part says deleted but we are not done creating yet, this should not happen but"
)
}
MaybeDeletedIndexPart::IndexPart(p) => p,
};
let metadata = index_part.metadata.clone();
self
.load_remote_timeline(timeline_id, index_part, metadata, None, resources, LoadTimelineCause::ImportPgdata{
create_guard: timeline_create_guard, activate, }, &ctx)
.await?
.ready_to_activate()
.context("implementation error: reloaded timeline still needs import after import reported success")?;
info!("import done - waiting for activation");
anyhow::Ok(())
}
@@ -3475,6 +3453,14 @@ impl TenantShard {
timeline.defuse_for_tenant_drop();
});
}
{
let mut timelines_importing = self.timelines_importing.lock().unwrap();
timelines_importing
.drain()
.for_each(|(_timeline_id, importing_timeline)| {
importing_timeline.shutdown();
});
}
// test_long_timeline_create_then_tenant_delete is leaning on this message
tracing::info!("Waiting for timelines...");
while let Some(res) = js.join_next().await {
@@ -3949,13 +3935,6 @@ where
Ok(result)
}
enum ActivateTimelineArgs {
Yes {
broker_client: storage_broker::BrokerClientChannel,
},
No,
}
impl TenantShard {
pub fn tenant_specific_overrides(&self) -> pageserver_api::models::TenantConfig {
self.tenant_conf.load().tenant_conf.clone()
@@ -4322,6 +4301,7 @@ impl TenantShard {
timelines: Mutex::new(HashMap::new()),
timelines_creating: Mutex::new(HashSet::new()),
timelines_offloaded: Mutex::new(HashMap::new()),
timelines_importing: Mutex::new(HashMap::new()),
remote_tenant_manifest: Default::default(),
gc_cs: tokio::sync::Mutex::new(()),
walredo_mgr,

View File

@@ -949,6 +949,35 @@ impl RemoteTimelineClient {
Ok(())
}
/// If the `import_pgdata` field marks the timeline as having an import in progress,
/// launch an index-file upload operation that transitions it to done in the background
pub(crate) fn schedule_index_upload_for_import_pgdata_finalize(
self: &Arc<Self>,
) -> anyhow::Result<()> {
use import_pgdata::index_part_format;
let mut guard = self.upload_queue.lock().unwrap();
let upload_queue = guard.initialized_mut()?;
let to_update = match &upload_queue.dirty.import_pgdata {
Some(import) if !import.is_done() => Some(import),
Some(_) | None => None,
};
if let Some(old) = to_update {
let new =
index_part_format::Root::V1(index_part_format::V1::Done(index_part_format::Done {
idempotency_key: old.idempotency_key().clone(),
started_at: *old.started_at(),
finished_at: chrono::Utc::now().naive_utc(),
}));
upload_queue.dirty.import_pgdata = Some(new);
self.schedule_index_upload(upload_queue);
}
Ok(())
}
/// Launch an index-file upload operation in the background, setting `gc_compaction_state` field.
pub(crate) fn schedule_index_upload_for_gc_compaction_state_update(
self: &Arc<Self>,

View File

@@ -668,7 +668,9 @@ impl From<DownloadError> for UpdateError {
impl From<std::io::Error> for UpdateError {
fn from(value: std::io::Error) -> Self {
if let Some(nix::errno::Errno::ENOSPC) = value.raw_os_error().map(nix::errno::from_i32) {
if let Some(nix::errno::Errno::ENOSPC) =
value.raw_os_error().map(nix::errno::Errno::from_raw)
{
UpdateError::NoSpace
} else if value
.get_ref()

View File

@@ -3435,6 +3435,7 @@ impl Timeline {
// Step 2: Produce images+deltas.
let mut accumulated_values = Vec::new();
let mut accumulated_values_estimated_size = 0;
let mut last_key: Option<Key> = None;
// Only create image layers when there is no ancestor branches. TODO: create covering image layer
@@ -3611,12 +3612,16 @@ impl Timeline {
if last_key.is_none() {
last_key = Some(key);
}
accumulated_values_estimated_size += val.estimated_size();
accumulated_values.push((key, lsn, val));
if accumulated_values.len() >= 65536 {
// Assume all of them are images, that would be 512MB of data in memory for a single key.
// Accumulated values should never exceed 512MB.
if accumulated_values_estimated_size >= 1024 * 1024 * 512 {
return Err(CompactionError::Other(anyhow!(
"too many values for a single key, giving up gc-compaction"
"too many values for a single key: {} for key {}, {} items",
accumulated_values_estimated_size,
key,
accumulated_values.len()
)));
}
} else {
@@ -3651,6 +3656,7 @@ impl Timeline {
.map_err(CompactionError::Other)?;
accumulated_values.clear();
*last_key = key;
accumulated_values_estimated_size = val.estimated_size();
accumulated_values.push((key, lsn, val));
}
}

View File

@@ -1,8 +1,10 @@
use std::sync::Arc;
use anyhow::{Context, bail};
use importbucket_client::{ControlFile, RemoteStorageWrapper};
use pageserver_api::models::ShardImportStatus;
use remote_storage::RemotePath;
use tokio::task::JoinHandle;
use tokio_util::sync::CancellationToken;
use tracing::info;
use utils::lsn::Lsn;
@@ -17,6 +19,17 @@ mod importbucket_client;
mod importbucket_format;
pub(crate) mod index_part_format;
pub(crate) struct ImportingTimeline {
pub import_task_handle: JoinHandle<()>,
pub timeline: Arc<Timeline>,
}
impl ImportingTimeline {
pub(crate) fn shutdown(self) {
self.import_task_handle.abort();
}
}
pub async fn doit(
timeline: &Arc<Timeline>,
index_part: index_part_format::Root,
@@ -26,173 +39,225 @@ pub async fn doit(
let index_part_format::Root::V1(v1) = index_part;
let index_part_format::InProgress {
location,
idempotency_key,
started_at,
idempotency_key: _,
started_at: _,
} = match v1 {
index_part_format::V1::Done(_) => return Ok(()),
index_part_format::V1::InProgress(in_progress) => in_progress,
};
let storage = importbucket_client::new(timeline.conf, &location, cancel.clone()).await?;
let storcon_client = StorageControllerUpcallClient::new(timeline.conf, &cancel);
let status_prefix = RemotePath::from_string("status").unwrap();
let shard_status = storcon_client
.get_timeline_import_status(
timeline.tenant_shard_id,
timeline.timeline_id,
timeline.generation,
)
.await
.map_err(|_err| anyhow::anyhow!("Shut down while getting timeline import status"))?;
//
// See if shard is done.
// TODO: incorporate generations into status key for split brain safety. Figure out together with checkpointing.
//
let shard_status_key =
status_prefix.join(format!("shard-{}", timeline.tenant_shard_id.shard_slug()));
let shard_status: Option<importbucket_format::ShardStatus> =
storage.get_json(&shard_status_key).await?;
info!(?shard_status, "peeking shard status");
if shard_status.map(|st| st.done).unwrap_or(false) {
info!("shard status indicates that the shard is done, skipping import");
} else {
// TODO: checkpoint the progress into the IndexPart instead of restarting
// from the beginning.
match shard_status {
ShardImportStatus::InProgress(maybe_progress) => {
let storage =
importbucket_client::new(timeline.conf, &location, cancel.clone()).await?;
//
// Wipe the slate clean - the flow does not allow resuming.
// We can implement resuming in the future by checkpointing the progress into the IndexPart.
//
info!("wipe the slate clean");
{
// TODO: do we need to hold GC lock for this?
let mut guard = timeline.layers.write().await;
assert!(
guard.layer_map()?.open_layer.is_none(),
"while importing, there should be no in-memory layer" // this just seems like a good place to assert it
);
let all_layers_keys = guard.all_persistent_layers();
let all_layers: Vec<_> = all_layers_keys
.iter()
.map(|key| guard.get_from_key(key))
.collect();
let open = guard.open_mut().context("open_mut")?;
let control_file_res = if maybe_progress.is_none() {
// Only prepare the import once when there's no progress.
prepare_import(timeline, storage.clone(), &cancel).await
} else {
storage.get_control_file().await
};
timeline.remote_client.schedule_gc_update(&all_layers)?;
open.finish_gc_timeline(&all_layers);
}
//
// Wait for pgdata to finish uploading
//
info!("wait for pgdata to reach status 'done'");
let pgdata_status_key = status_prefix.join("pgdata");
loop {
let res = async {
let pgdata_status: Option<importbucket_format::PgdataStatus> = storage
.get_json(&pgdata_status_key)
.await
.context("get pgdata status")?;
info!(?pgdata_status, "peeking pgdata status");
if pgdata_status.map(|st| st.done).unwrap_or(false) {
Ok(())
} else {
Err(anyhow::anyhow!("pgdata not done yet"))
}
}
.await;
match res {
Ok(_) => break,
let control_file = match control_file_res {
Ok(cf) => cf,
Err(err) => {
info!(?err, "indefinitely waiting for pgdata to finish");
if tokio::time::timeout(std::time::Duration::from_secs(10), cancel.cancelled())
.await
.is_ok()
{
bail!("cancelled while waiting for pgdata");
}
return Err(
terminate_flow_with_error(timeline, err, &storcon_client, &cancel).await,
);
}
}
}
};
//
// Do the import
//
info!("do the import");
let control_file = storage.get_control_file().await?;
let base_lsn = control_file.base_lsn();
info!("update TimelineMetadata based on LSNs from control file");
{
let pg_version = control_file.pg_version();
let _ctx: &RequestContext = ctx;
async move {
// FIXME: The 'disk_consistent_lsn' should be the LSN at the *end* of the
// checkpoint record, and prev_record_lsn should point to its beginning.
// We should read the real end of the record from the WAL, but here we
// just fake it.
let disk_consistent_lsn = Lsn(base_lsn.0 + 8);
let prev_record_lsn = base_lsn;
let metadata = TimelineMetadata::new(
disk_consistent_lsn,
Some(prev_record_lsn),
None, // no ancestor
Lsn(0), // no ancestor lsn
base_lsn, // latest_gc_cutoff_lsn
base_lsn, // initdb_lsn
pg_version,
let res = flow::run(
timeline.clone(),
control_file,
storage.clone(),
maybe_progress,
ctx,
)
.await;
if let Err(err) = res {
return Err(
terminate_flow_with_error(timeline, err, &storcon_client, &cancel).await,
);
let _start_lsn = disk_consistent_lsn + 1;
timeline
.remote_client
.schedule_index_upload_for_full_metadata_update(&metadata)?;
timeline.remote_client.wait_completion().await?;
anyhow::Ok(())
}
// Communicate that shard is done.
// Ensure at-least-once delivery of the upcall to storage controller
// before we mark the task as done and never come here again.
//
// Note that we do not mark the import complete in the index part now.
// This happens in [`Tenant::finalize_importing_timeline`] in response
// to the storage controller calling
// `/v1/tenant/:tenant_id/timeline/:timeline_id/activate_post_import`.
storcon_client
.put_timeline_import_status(
timeline.tenant_shard_id,
timeline.timeline_id,
timeline.generation,
ShardImportStatus::Done,
)
.await
.map_err(|_err| {
anyhow::anyhow!("Shut down while putting timeline import status")
})?;
}
ShardImportStatus::Error(err) => {
info!(
"shard status indicates that the shard is done (error), skipping import {}",
err
);
}
ShardImportStatus::Done => {
info!("shard status indicates that the shard is done (success), skipping import");
}
.await?;
flow::run(timeline.clone(), control_file, storage.clone(), ctx).await?;
//
// Communicate that shard is done.
// Ensure at-least-once delivery of the upcall to storage controller
// before we mark the task as done and never come here again.
//
let storcon_client = StorageControllerUpcallClient::new(timeline.conf, &cancel);
storcon_client
.put_timeline_import_status(
timeline.tenant_shard_id,
timeline.timeline_id,
// TODO(vlad): What about import errors?
ShardImportStatus::Done,
)
.await
.map_err(|_err| anyhow::anyhow!("Shut down while putting timeline import status"))?;
storage
.put_json(
&shard_status_key,
&importbucket_format::ShardStatus { done: true },
)
.await
.context("put shard status")?;
}
//
// Mark as done in index_part.
// This makes subsequent timeline loads enter the normal load code path
// instead of spawning the import task and calling this here function.
//
info!("mark import as complete in index part");
timeline
.remote_client
.schedule_index_upload_for_import_pgdata_state_update(Some(index_part_format::Root::V1(
index_part_format::V1::Done(index_part_format::Done {
idempotency_key,
started_at,
finished_at: chrono::Utc::now().naive_utc(),
}),
)))?;
timeline.remote_client.wait_completion().await?;
Ok(())
}
async fn prepare_import(
timeline: &Arc<Timeline>,
storage: RemoteStorageWrapper,
cancel: &CancellationToken,
) -> anyhow::Result<ControlFile> {
// Wipe the slate clean before starting the import as a precaution.
// This method is only called when there's no recorded checkpoint for the import
// in the storage controller.
//
// Note that this is split-brain safe (two imports for same timeline shards running in
// different generations) because we go through the usual deletion path, including deletion queue.
info!("wipe the slate clean");
{
// TODO: do we need to hold GC lock for this?
let mut guard = timeline.layers.write().await;
assert!(
guard.layer_map()?.open_layer.is_none(),
"while importing, there should be no in-memory layer" // this just seems like a good place to assert it
);
let all_layers_keys = guard.all_persistent_layers();
let all_layers: Vec<_> = all_layers_keys
.iter()
.map(|key| guard.get_from_key(key))
.collect();
let open = guard.open_mut().context("open_mut")?;
timeline.remote_client.schedule_gc_update(&all_layers)?;
open.finish_gc_timeline(&all_layers);
}
//
// Wait for pgdata to finish uploading
//
info!("wait for pgdata to reach status 'done'");
let status_prefix = RemotePath::from_string("status").unwrap();
let pgdata_status_key = status_prefix.join("pgdata");
loop {
let res = async {
let pgdata_status: Option<importbucket_format::PgdataStatus> = storage
.get_json(&pgdata_status_key)
.await
.context("get pgdata status")?;
info!(?pgdata_status, "peeking pgdata status");
if pgdata_status.map(|st| st.done).unwrap_or(false) {
Ok(())
} else {
Err(anyhow::anyhow!("pgdata not done yet"))
}
}
.await;
match res {
Ok(_) => break,
Err(err) => {
info!(?err, "indefinitely waiting for pgdata to finish");
if tokio::time::timeout(std::time::Duration::from_secs(10), cancel.cancelled())
.await
.is_ok()
{
bail!("cancelled while waiting for pgdata");
}
}
}
}
let control_file = storage.get_control_file().await?;
let base_lsn = control_file.base_lsn();
info!("update TimelineMetadata based on LSNs from control file");
{
let pg_version = control_file.pg_version();
async move {
// FIXME: The 'disk_consistent_lsn' should be the LSN at the *end* of the
// checkpoint record, and prev_record_lsn should point to its beginning.
// We should read the real end of the record from the WAL, but here we
// just fake it.
let disk_consistent_lsn = Lsn(base_lsn.0 + 8);
let prev_record_lsn = base_lsn;
let metadata = TimelineMetadata::new(
disk_consistent_lsn,
Some(prev_record_lsn),
None, // no ancestor
Lsn(0), // no ancestor lsn
base_lsn, // latest_gc_cutoff_lsn
base_lsn, // initdb_lsn
pg_version,
);
let _start_lsn = disk_consistent_lsn + 1;
timeline
.remote_client
.schedule_index_upload_for_full_metadata_update(&metadata)?;
timeline.remote_client.wait_completion().await?;
anyhow::Ok(())
}
}
.await?;
Ok(control_file)
}
async fn terminate_flow_with_error(
timeline: &Arc<Timeline>,
error: anyhow::Error,
storcon_client: &StorageControllerUpcallClient,
cancel: &CancellationToken,
) -> anyhow::Error {
// The import task is a aborted on tenant shutdown, so in principle, it should
// never be cancelled. To be on the safe side, check the cancellation tokens
// before marking the import as failed.
if !(cancel.is_cancelled() || timeline.cancel.is_cancelled()) {
let notify_res = storcon_client
.put_timeline_import_status(
timeline.tenant_shard_id,
timeline.timeline_id,
timeline.generation,
ShardImportStatus::Error(format!("{error:#}")),
)
.await;
if let Err(_notify_error) = notify_res {
// The [`StorageControllerUpcallClient::put_timeline_import_status`] retries
// forever internally, so errors returned by it can only be due to cancellation.
info!("failed to notify storcon about permanent import error");
}
// Will be logged by [`Tenant::create_timeline_import_pgdata_task`]
error
} else {
anyhow::anyhow!("Import task cancelled")
}
}

View File

@@ -29,10 +29,11 @@
//! - version-specific CheckPointData (=> pgv abstraction, already exists for regular walingest)
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::ops::Range;
use std::sync::Arc;
use anyhow::{bail, ensure};
use anyhow::ensure;
use bytes::Bytes;
use futures::stream::FuturesOrdered;
use itertools::Itertools;
@@ -43,6 +44,7 @@ use pageserver_api::key::{
slru_segment_size_to_key,
};
use pageserver_api::keyspace::{contiguous_range_len, is_contiguous_range, singleton_range};
use pageserver_api::models::{ShardImportProgress, ShardImportProgressV1, ShardImportStatus};
use pageserver_api::reltag::{RelTag, SlruKind};
use pageserver_api::shard::ShardIdentity;
use postgres_ffi::relfile_utils::parse_relfilename;
@@ -53,21 +55,42 @@ use tokio_stream::StreamExt;
use tracing::{debug, instrument};
use utils::bin_ser::BeSer;
use utils::lsn::Lsn;
use utils::pausable_failpoint;
use super::Timeline;
use super::importbucket_client::{ControlFile, RemoteStorageWrapper};
use crate::assert_u64_eq_usize::UsizeIsU64;
use crate::context::{DownloadBehavior, RequestContext};
use crate::controller_upcall_client::{StorageControllerUpcallApi, StorageControllerUpcallClient};
use crate::pgdatadir_mapping::{
DbDirectory, RelDirectory, SlruSegmentDirectory, TwoPhaseDirectory,
};
use crate::task_mgr::TaskKind;
use crate::tenant::storage_layer::{ImageLayerWriter, Layer};
use crate::tenant::storage_layer::{AsLayerDesc, ImageLayerWriter, Layer};
pub async fn run(
timeline: Arc<Timeline>,
control_file: ControlFile,
storage: RemoteStorageWrapper,
import_progress: Option<ShardImportProgress>,
ctx: &RequestContext,
) -> anyhow::Result<()> {
// Match how we run the import based on the progress version.
// If there's no import progress, it means that this is a new import
// and we can use whichever version we want.
match import_progress {
Some(ShardImportProgress::V1(progress)) => {
run_v1(timeline, control_file, storage, Some(progress), ctx).await
}
None => run_v1(timeline, control_file, storage, None, ctx).await,
}
}
async fn run_v1(
timeline: Arc<Timeline>,
control_file: ControlFile,
storage: RemoteStorageWrapper,
import_progress: Option<ShardImportProgressV1>,
ctx: &RequestContext,
) -> anyhow::Result<()> {
let planner = Planner {
@@ -79,7 +102,32 @@ pub async fn run(
let import_config = &timeline.conf.timeline_import_config;
let plan = planner.plan(import_config).await?;
plan.execute(timeline, import_config, ctx).await
// Hash the plan and compare with the hash of the plan we got back from the storage controller.
// If the two match, it means that the planning stage had the same output.
//
// This is not intended to be a cryptographically secure hash.
const SEED: u64 = 42;
let mut hasher = twox_hash::XxHash64::with_seed(SEED);
plan.hash(&mut hasher);
let plan_hash = hasher.finish();
if let Some(progress) = &import_progress {
if plan_hash != progress.import_plan_hash {
anyhow::bail!("Import plan does not match storcon metadata");
}
// Handle collisions on jobs of unequal length
if progress.jobs != plan.jobs.len() {
anyhow::bail!("Import plan job length does not match storcon metadata")
}
}
pausable_failpoint!("import-timeline-pre-execute-pausable");
let start_from_job_idx = import_progress.map(|progress| progress.completed);
plan.execute(timeline, start_from_job_idx, plan_hash, import_config, ctx)
.await
}
struct Planner {
@@ -89,8 +137,11 @@ struct Planner {
tasks: Vec<AnyImportTask>,
}
#[derive(Hash)]
struct Plan {
jobs: Vec<ChunkProcessingJob>,
// Included here such that it ends up in the hash for the plan
shard: ShardIdentity,
}
impl Planner {
@@ -194,7 +245,10 @@ impl Planner {
pgdata_lsn,
));
Ok(Plan { jobs })
Ok(Plan {
jobs,
shard: self.shard,
})
}
#[instrument(level = tracing::Level::DEBUG, skip_all, fields(dboid=%db.dboid, tablespace=%db.spcnode, path=%db.path))]
@@ -323,25 +377,45 @@ impl Plan {
async fn execute(
self,
timeline: Arc<Timeline>,
start_after_job_idx: Option<usize>,
import_plan_hash: u64,
import_config: &TimelineImportConfig,
ctx: &RequestContext,
) -> anyhow::Result<()> {
let storcon_client = StorageControllerUpcallClient::new(timeline.conf, &timeline.cancel);
let mut work = FuturesOrdered::new();
let semaphore = Arc::new(Semaphore::new(import_config.import_job_concurrency.into()));
let jobs_in_plan = self.jobs.len();
let mut jobs = self.jobs.into_iter().enumerate().peekable();
let mut results = Vec::new();
let mut jobs = self
.jobs
.into_iter()
.enumerate()
.map(|(idx, job)| (idx + 1, job))
.filter(|(idx, _job)| {
// Filter out any jobs that have been done already
if let Some(start_after) = start_after_job_idx {
*idx > start_after
} else {
true
}
})
.peekable();
let mut last_completed_job_idx = start_after_job_idx.unwrap_or(0);
let checkpoint_every: usize = import_config.import_job_checkpoint_threshold.into();
// Run import jobs concurrently up to the limit specified by the pageserver configuration.
// Note that we process completed futures in the oreder of insertion. This will be the
// building block for resuming imports across pageserver restarts or tenant migrations.
while results.len() < jobs_in_plan {
while last_completed_job_idx < jobs_in_plan {
tokio::select! {
permit = semaphore.clone().acquire_owned(), if jobs.peek().is_some() => {
let permit = permit.expect("never closed");
let (job_idx, job) = jobs.next().expect("we peeked");
let job_timeline = timeline.clone();
let ctx = ctx.detached_child(TaskKind::ImportPgdata, DownloadBehavior::Error);
@@ -353,13 +427,35 @@ impl Plan {
},
maybe_complete_job_idx = work.next() => {
match maybe_complete_job_idx {
Some(Ok((_job_idx, res))) => {
results.push(res);
Some(Ok((job_idx, res))) => {
assert!(last_completed_job_idx.checked_add(1).unwrap() == job_idx);
res?;
last_completed_job_idx = job_idx;
if last_completed_job_idx % checkpoint_every == 0 {
let progress = ShardImportProgressV1 {
jobs: jobs_in_plan,
completed: last_completed_job_idx,
import_plan_hash,
};
storcon_client.put_timeline_import_status(
timeline.tenant_shard_id,
timeline.timeline_id,
timeline.generation,
ShardImportStatus::InProgress(Some(ShardImportProgress::V1(progress)))
)
.await
.map_err(|_err| {
anyhow::anyhow!("Shut down while putting timeline import status")
})?;
}
},
Some(Err(_)) => {
results.push(Err(anyhow::anyhow!(
"parallel job panicked or cancelled, check pageserver logs"
)));
anyhow::bail!(
"import job panicked or cancelled"
);
}
None => {}
}
@@ -367,17 +463,7 @@ impl Plan {
}
}
if results.iter().all(|r| r.is_ok()) {
Ok(())
} else {
let mut msg = String::new();
for result in results {
if let Err(err) = result {
msg.push_str(&format!("{err:?}\n\n"));
}
}
bail!("Some parallel jobs failed:\n\n{msg}");
}
Ok(())
}
}
@@ -549,6 +635,15 @@ struct ImportSingleKeyTask {
buf: Bytes,
}
impl Hash for ImportSingleKeyTask {
fn hash<H: Hasher>(&self, state: &mut H) {
let ImportSingleKeyTask { key, buf } = self;
key.hash(state);
buf.hash(state);
}
}
impl ImportSingleKeyTask {
fn new(key: Key, buf: Bytes) -> Self {
ImportSingleKeyTask { key, buf }
@@ -577,6 +672,20 @@ struct ImportRelBlocksTask {
storage: RemoteStorageWrapper,
}
impl Hash for ImportRelBlocksTask {
fn hash<H: Hasher>(&self, state: &mut H) {
let ImportRelBlocksTask {
shard_identity: _,
key_range,
path,
storage: _,
} = self;
key_range.hash(state);
path.hash(state);
}
}
impl ImportRelBlocksTask {
fn new(
shard_identity: ShardIdentity,
@@ -661,6 +770,19 @@ struct ImportSlruBlocksTask {
storage: RemoteStorageWrapper,
}
impl Hash for ImportSlruBlocksTask {
fn hash<H: Hasher>(&self, state: &mut H) {
let ImportSlruBlocksTask {
key_range,
path,
storage: _,
} = self;
key_range.hash(state);
path.hash(state);
}
}
impl ImportSlruBlocksTask {
fn new(key_range: Range<Key>, path: &RemotePath, storage: RemoteStorageWrapper) -> Self {
ImportSlruBlocksTask {
@@ -703,6 +825,7 @@ impl ImportTask for ImportSlruBlocksTask {
}
}
#[derive(Hash)]
enum AnyImportTask {
SingleKey(ImportSingleKeyTask),
RelBlocks(ImportRelBlocksTask),
@@ -749,6 +872,7 @@ impl From<ImportSlruBlocksTask> for AnyImportTask {
}
}
#[derive(Hash)]
struct ChunkProcessingJob {
range: Range<Key>,
tasks: Vec<AnyImportTask>,
@@ -786,17 +910,51 @@ impl ChunkProcessingJob {
let resident_layer = if nimages > 0 {
let (desc, path) = writer.finish(ctx).await?;
{
let guard = timeline.layers.read().await;
let existing_layer = guard.try_get_from_key(&desc.key());
if let Some(layer) = existing_layer {
if layer.metadata().generation != timeline.generation {
return Err(anyhow::anyhow!(
"Import attempted to rewrite layer file in the same generation: {}",
layer.local_path()
));
}
}
}
Layer::finish_creating(timeline.conf, &timeline, desc, &path)?
} else {
// dropping the writer cleans up
return Ok(());
};
// this is sharing the same code as create_image_layers
// The same import job might run multiple times since not each job is checkpointed.
// Hence, we must support the cases where the layer already exists. We cannot be
// certain that the existing layer is identical to the new one, so in that case
// we replace the old layer with the one we just generated.
let mut guard = timeline.layers.write().await;
guard
.open_mut()?
.track_new_image_layers(&[resident_layer.clone()], &timeline.metrics);
let existing_layer = guard
.try_get_from_key(&resident_layer.layer_desc().key())
.cloned();
match existing_layer {
Some(existing) => {
guard.open_mut()?.rewrite_layers(
&[(existing.clone(), resident_layer.clone())],
&[],
&timeline.metrics,
);
}
None => {
guard
.open_mut()?
.track_new_image_layers(&[resident_layer.clone()], &timeline.metrics);
}
}
crate::tenant::timeline::drop_wlock(guard);
timeline

View File

@@ -190,31 +190,6 @@ impl RemoteStorageWrapper {
Ok(Some(res))
}
#[instrument(level = tracing::Level::DEBUG, skip_all, fields(%path))]
pub async fn put_json<T>(&self, path: &RemotePath, value: &T) -> anyhow::Result<()>
where
T: serde::Serialize,
{
let buf = serde_json::to_vec(value)?;
let bytes = Bytes::from(buf);
utils::backoff::retry(
|| async {
let size = bytes.len();
let bytes = futures::stream::once(futures::future::ready(Ok(bytes.clone())));
self.storage
.upload_storage_object(bytes, size, path, &self.cancel)
.await
},
remote_storage::TimeoutOrCancel::caused_by_cancel,
1,
u32::MAX,
&format!("put json {path}"),
&self.cancel,
)
.await
.expect("practically infinite retries")
}
#[instrument(level = tracing::Level::DEBUG, skip_all, fields(%path))]
pub async fn get_range(
&self,

View File

@@ -5,9 +5,3 @@ pub struct PgdataStatus {
pub done: bool,
// TODO: remaining fields
}
#[derive(Deserialize, Serialize, Debug, Clone, PartialEq, Eq)]
pub struct ShardStatus {
pub done: bool,
// TODO: remaining fields
}

View File

@@ -64,4 +64,12 @@ impl Root {
},
}
}
pub fn started_at(&self) -> &chrono::NaiveDateTime {
match self {
Root::V1(v1) => match v1 {
V1::InProgress(in_progress) => &in_progress.started_at,
V1::Done(done) => &done.started_at,
},
}
}
}

View File

@@ -408,7 +408,7 @@ impl OpenFiles {
/// error types may be elegible for retry.
pub(crate) fn is_fatal_io_error(e: &std::io::Error) -> bool {
use nix::errno::Errno::*;
match e.raw_os_error().map(nix::errno::from_i32) {
match e.raw_os_error().map(nix::errno::Errno::from_raw) {
Some(EIO) => {
// Terminate on EIO because we no longer trust the device to store
// data safely, or to uphold persistence guarantees on fsync.

View File

@@ -124,9 +124,7 @@ pub(super) fn epoll_uring_error_to_std(
) -> std::io::Error {
match e {
tokio_epoll_uring::Error::Op(e) => e,
tokio_epoll_uring::Error::System(system) => {
std::io::Error::new(std::io::ErrorKind::Other, system)
}
tokio_epoll_uring::Error::System(system) => std::io::Error::other(system),
}
}

View File

@@ -936,6 +936,44 @@ lfc_prewarm_main(Datum main_arg)
lfc_ctl->prewarm_workers[worker_id].completed = GetCurrentTimestamp();
}
void
lfc_invalidate(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber nblocks)
{
BufferTag tag;
FileCacheEntry *entry;
uint32 hash;
if (lfc_maybe_disabled()) /* fast exit if file cache is disabled */
return;
CopyNRelFileInfoToBufTag(tag, rinfo);
tag.forkNum = forkNum;
CriticalAssert(BufTagGetRelNumber(&tag) != InvalidRelFileNumber);
LWLockAcquire(lfc_lock, LW_EXCLUSIVE);
if (LFC_ENABLED())
{
for (BlockNumber blkno = 0; blkno < nblocks; blkno += lfc_blocks_per_chunk)
{
tag.blockNum = blkno;
hash = get_hash_value(lfc_hash, &tag);
entry = hash_search_with_hash_value(lfc_hash, &tag, hash, HASH_FIND, NULL);
if (entry != NULL)
{
for (int i = 0; i < lfc_blocks_per_chunk; i++)
{
if (GET_STATE(entry, i) == AVAILABLE)
{
lfc_ctl->used_pages -= 1;
SET_STATE(entry, i, UNAVAILABLE);
}
}
}
}
}
LWLockRelease(lfc_lock);
}
/*
* Check if page is present in the cache.

View File

@@ -28,6 +28,7 @@ typedef struct FileCacheState
extern bool lfc_store_prefetch_result;
/* functions for local file cache */
extern void lfc_invalidate(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber nblocks);
extern void lfc_writev(NRelFileInfo rinfo, ForkNumber forkNum,
BlockNumber blkno, const void *const *buffers,
BlockNumber nblocks);

View File

@@ -86,7 +86,7 @@ InitBufferTag(BufferTag *tag, const RelFileNode *rnode,
#define InvalidRelFileNumber InvalidOid
#define SMgrRelGetRelInfo(reln) \
#define SMgrRelGetRelInfo(reln) \
(reln->smgr_rnode.node)
#define DropRelationAllLocalBuffers DropRelFileNodeAllLocalBuffers
@@ -148,6 +148,12 @@ InitBufferTag(BufferTag *tag, const RelFileNode *rnode,
#define DropRelationAllLocalBuffers DropRelationAllLocalBuffers
#endif
#define NRelFileInfoInvalidate(rinfo) do { \
NInfoGetSpcOid(rinfo) = InvalidOid; \
NInfoGetDbOid(rinfo) = InvalidOid; \
NInfoGetRelNumber(rinfo) = InvalidRelFileNumber; \
} while (0)
#if PG_MAJORVERSION_NUM < 17
#define ProcNumber BackendId
#define INVALID_PROC_NUMBER InvalidBackendId

View File

@@ -108,7 +108,7 @@ typedef enum
UNLOGGED_BUILD_NOT_PERMANENT
} UnloggedBuildPhase;
static SMgrRelation unlogged_build_rel = NULL;
static NRelFileInfo unlogged_build_rel_info;
static UnloggedBuildPhase unlogged_build_phase = UNLOGGED_BUILD_NOT_IN_PROGRESS;
static bool neon_redo_read_buffer_filter(XLogReaderState *record, uint8 block_id);
@@ -912,16 +912,19 @@ neon_extend(SMgrRelation reln, ForkNumber forkNum, BlockNumber blkno,
{
case 0:
neon_log(ERROR, "cannot call smgrextend() on rel with unknown persistence");
break;
case RELPERSISTENCE_PERMANENT:
if (RelFileInfoEquals(unlogged_build_rel_info, InfoFromSMgrRel(reln)))
{
mdextend(reln, forkNum, blkno, buffer, skipFsync);
return;
}
break;
case RELPERSISTENCE_TEMP:
case RELPERSISTENCE_UNLOGGED:
mdextend(reln, forkNum, blkno, buffer, skipFsync);
/* Update LFC in case of unlogged index build */
if (reln == unlogged_build_rel && unlogged_build_phase == UNLOGGED_BUILD_PHASE_2)
lfc_write(InfoFromSMgrRel(reln), forkNum, blkno, buffer);
return;
default:
@@ -1003,21 +1006,19 @@ neon_zeroextend(SMgrRelation reln, ForkNumber forkNum, BlockNumber blocknum,
{
case 0:
neon_log(ERROR, "cannot call smgrextend() on rel with unknown persistence");
break;
case RELPERSISTENCE_PERMANENT:
if (RelFileInfoEquals(unlogged_build_rel_info, InfoFromSMgrRel(reln)))
{
mdzeroextend(reln, forkNum, blocknum, nblocks, skipFsync);
return;
}
break;
case RELPERSISTENCE_TEMP:
case RELPERSISTENCE_UNLOGGED:
mdzeroextend(reln, forkNum, blocknum, nblocks, skipFsync);
/* Update LFC in case of unlogged index build */
if (reln == unlogged_build_rel && unlogged_build_phase == UNLOGGED_BUILD_PHASE_2)
{
for (int i = 0; i < nblocks; i++)
{
lfc_write(InfoFromSMgrRel(reln), forkNum, blocknum + i, buffer.data);
}
}
return;
default:
@@ -1281,75 +1282,24 @@ neon_read_at_lsn(NRelFileInfo rinfo, ForkNumber forkNum, BlockNumber blkno,
communicator_read_at_lsnv(rinfo, forkNum, blkno, &request_lsns, &buffer, 1, NULL);
}
#if PG_MAJORVERSION_NUM < 17
/*
* neon_read() -- Read the specified block from a relation.
*/
#if PG_MAJORVERSION_NUM < 16
static void
neon_read(SMgrRelation reln, ForkNumber forkNum, BlockNumber blkno, char *buffer)
#else
static void
neon_read(SMgrRelation reln, ForkNumber forkNum, BlockNumber blkno, void *buffer)
#endif
{
neon_request_lsns request_lsns;
bits8 present;
void *bufferp;
switch (reln->smgr_relpersistence)
{
case 0:
neon_log(ERROR, "cannot call smgrread() on rel with unknown persistence");
case RELPERSISTENCE_PERMANENT:
break;
case RELPERSISTENCE_TEMP:
case RELPERSISTENCE_UNLOGGED:
mdread(reln, forkNum, blkno, buffer);
return;
default:
neon_log(ERROR, "unknown relpersistence '%c'", reln->smgr_relpersistence);
}
/* Try to read PS results if they are available */
communicator_prefetch_pump_state();
neon_get_request_lsns(InfoFromSMgrRel(reln), forkNum, blkno, &request_lsns, 1);
present = 0;
bufferp = buffer;
if (communicator_prefetch_lookupv(InfoFromSMgrRel(reln), forkNum, blkno, &request_lsns, 1, &bufferp, &present))
{
/* Prefetch hit */
return;
}
/* Try to read from local file cache */
if (lfc_read(InfoFromSMgrRel(reln), forkNum, blkno, buffer))
{
MyNeonCounters->file_cache_hits_total++;
return;
}
neon_read_at_lsn(InfoFromSMgrRel(reln), forkNum, blkno, request_lsns, buffer);
/*
* Try to receive prefetch results once again just to make sure we don't leave the smgr code while the OS might still have buffered bytes.
*/
communicator_prefetch_pump_state();
#ifdef DEBUG_COMPARE_LOCAL
static void
compare_with_local(SMgrRelation reln, ForkNumber forkNum, BlockNumber blkno, void* buffer, XLogRecPtr request_lsn)
{
if (forkNum == MAIN_FORKNUM && IS_LOCAL_REL(reln))
{
char pageserver_masked[BLCKSZ];
PGIOAlignedBlock mdbuf;
PGIOAlignedBlock mdbuf_masked;
XLogRecPtr request_lsn = request_lsns.request_lsn;
#if PG_MAJORVERSION_NUM >= 17
{
void* mdbuffers[1] = { mdbuf.data };
mdreadv(reln, forkNum, blkno, mdbuffers, 1);
}
#else
mdread(reln, forkNum, blkno, mdbuf.data);
#endif
memcpy(pageserver_masked, buffer, BLCKSZ);
memcpy(mdbuf_masked.data, mdbuf.data, BLCKSZ);
@@ -1413,11 +1363,111 @@ neon_read(SMgrRelation reln, ForkNumber forkNum, BlockNumber blkno, void *buffer
}
}
}
}
#endif
#if PG_MAJORVERSION_NUM < 17
/*
* neon_read() -- Read the specified block from a relation.
*/
#if PG_MAJORVERSION_NUM < 16
static void
neon_read(SMgrRelation reln, ForkNumber forkNum, BlockNumber blkno, char *buffer)
#else
static void
neon_read(SMgrRelation reln, ForkNumber forkNum, BlockNumber blkno, void *buffer)
#endif
{
neon_request_lsns request_lsns;
bits8 present;
void *bufferp;
switch (reln->smgr_relpersistence)
{
case 0:
neon_log(ERROR, "cannot call smgrread() on rel with unknown persistence");
break;
case RELPERSISTENCE_PERMANENT:
if (RelFileInfoEquals(unlogged_build_rel_info, InfoFromSMgrRel(reln)))
{
mdread(reln, forkNum, blkno, buffer);
return;
}
break;
case RELPERSISTENCE_TEMP:
case RELPERSISTENCE_UNLOGGED:
mdread(reln, forkNum, blkno, buffer);
return;
default:
neon_log(ERROR, "unknown relpersistence '%c'", reln->smgr_relpersistence);
}
/* Try to read PS results if they are available */
communicator_prefetch_pump_state();
neon_get_request_lsns(InfoFromSMgrRel(reln), forkNum, blkno, &request_lsns, 1);
present = 0;
bufferp = buffer;
if (communicator_prefetch_lookupv(InfoFromSMgrRel(reln), forkNum, blkno, &request_lsns, 1, &bufferp, &present))
{
/* Prefetch hit */
#ifdef DEBUG_COMPARE_LOCAL
compare_with_local(reln, forkNum, blkno, buffer, request_lsns.request_lsn);
#else
return;
#endif
}
/* Try to read from local file cache */
if (lfc_read(InfoFromSMgrRel(reln), forkNum, blkno, buffer))
{
MyNeonCounters->file_cache_hits_total++;
#ifdef DEBUG_COMPARE_LOCAL
compare_with_local(reln, forkNum, blkno, buffer, request_lsns.request_lsn);
#else
return;
#endif
}
neon_read_at_lsn(InfoFromSMgrRel(reln), forkNum, blkno, request_lsns, buffer);
/*
* Try to receive prefetch results once again just to make sure we don't leave the smgr code while the OS might still have buffered bytes.
*/
communicator_prefetch_pump_state();
#ifdef DEBUG_COMPARE_LOCAL
compare_with_local(reln, forkNum, blkno, buffer, request_lsns.request_lsn);
#endif
}
#endif /* PG_MAJORVERSION_NUM <= 16 */
#if PG_MAJORVERSION_NUM >= 17
#ifdef DEBUG_COMPARE_LOCAL
static void
compare_with_localv(SMgrRelation reln, ForkNumber forkNum, BlockNumber blkno, void** buffers, BlockNumber nblocks, neon_request_lsns* request_lsns, bits8* read_pages)
{
if (forkNum == MAIN_FORKNUM && IS_LOCAL_REL(reln))
{
for (BlockNumber i = 0; i < nblocks; i++)
{
if (BITMAP_ISSET(read_pages, i))
{
compare_with_local(reln, forkNum, blkno + i, buffers[i], request_lsns[i].request_lsn);
}
}
}
}
#endif
static void
neon_readv(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
void **buffers, BlockNumber nblocks)
@@ -1431,8 +1481,14 @@ neon_readv(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
{
case 0:
neon_log(ERROR, "cannot call smgrread() on rel with unknown persistence");
break;
case RELPERSISTENCE_PERMANENT:
if (RelFileInfoEquals(unlogged_build_rel_info, InfoFromSMgrRel(reln)))
{
mdreadv(reln, forknum, blocknum, buffers, nblocks);
return;
}
break;
case RELPERSISTENCE_TEMP:
@@ -1460,8 +1516,13 @@ neon_readv(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
blocknum, request_lsns, nblocks,
buffers, read_pages);
#ifdef DEBUG_COMPARE_LOCAL
compare_with_localv(reln, forknum, blocknum, buffers, nblocks, request_lsns, read_pages);
memset(read_pages, 0, sizeof(read_pages));
#else
if (prefetch_result == nblocks)
return;
#endif
/* Try to read from local file cache */
lfc_result = lfc_readv_select(InfoFromSMgrRel(reln), forknum, blocknum, buffers,
@@ -1470,9 +1531,14 @@ neon_readv(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
if (lfc_result > 0)
MyNeonCounters->file_cache_hits_total += lfc_result;
#ifdef DEBUG_COMPARE_LOCAL
compare_with_localv(reln, forknum, blocknum, buffers, nblocks, request_lsns, read_pages);
memset(read_pages, 0, sizeof(read_pages));
#else
/* Read all blocks from LFC, so we're done */
if (prefetch_result + lfc_result == nblocks)
return;
#endif
communicator_read_at_lsnv(InfoFromSMgrRel(reln), forknum, blocknum, request_lsns,
buffers, nblocks, read_pages);
@@ -1483,91 +1549,8 @@ neon_readv(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
communicator_prefetch_pump_state();
#ifdef DEBUG_COMPARE_LOCAL
if (forknum == MAIN_FORKNUM && IS_LOCAL_REL(reln))
{
char pageserver_masked[BLCKSZ];
PGIOAlignedBlock mdbuf;
PGIOAlignedBlock mdbuf_masked;
XLogRecPtr request_lsn = request_lsns->request_lsn;
for (int i = 0; i < nblocks; i++)
{
BlockNumber blkno = blocknum + i;
if (!BITMAP_ISSET(read_pages, i))
continue;
#if PG_MAJORVERSION_NUM >= 17
{
void* mdbuffers[1] = { mdbuf.data };
mdreadv(reln, forknum, blkno, mdbuffers, 1);
}
#else
mdread(reln, forknum, blkno, mdbuf.data);
#endif
memcpy(pageserver_masked, buffers[i], BLCKSZ);
memcpy(mdbuf_masked.data, mdbuf.data, BLCKSZ);
if (PageIsNew((Page) mdbuf.data))
{
if (!PageIsNew((Page) pageserver_masked))
{
neon_log(PANIC, "page is new in MD but not in Page Server at blk %u in rel %u/%u/%u fork %u (request LSN %X/%08X):\n%s\n",
blkno,
RelFileInfoFmt(InfoFromSMgrRel(reln)),
forknum,
(uint32) (request_lsn >> 32), (uint32) request_lsn,
hexdump_page(buffers[i]));
}
}
else if (PageIsNew((Page) buffers[i]))
{
neon_log(PANIC, "page is new in Page Server but not in MD at blk %u in rel %u/%u/%u fork %u (request LSN %X/%08X):\n%s\n",
blkno,
RelFileInfoFmt(InfoFromSMgrRel(reln)),
forknum,
(uint32) (request_lsn >> 32), (uint32) request_lsn,
hexdump_page(mdbuf.data));
}
else if (PageGetSpecialSize(mdbuf.data) == 0)
{
/* assume heap */
RmgrTable[RM_HEAP_ID].rm_mask(mdbuf_masked.data, blkno);
RmgrTable[RM_HEAP_ID].rm_mask(pageserver_masked, blkno);
if (memcmp(mdbuf_masked.data, pageserver_masked, BLCKSZ) != 0)
{
neon_log(PANIC, "heap buffers differ at blk %u in rel %u/%u/%u fork %u (request LSN %X/%08X):\n------ MD ------\n%s\n------ Page Server ------\n%s\n",
blkno,
RelFileInfoFmt(InfoFromSMgrRel(reln)),
forknum,
(uint32) (request_lsn >> 32), (uint32) request_lsn,
hexdump_page(mdbuf_masked.data),
hexdump_page(pageserver_masked));
}
}
else if (PageGetSpecialSize(mdbuf.data) == MAXALIGN(sizeof(BTPageOpaqueData)))
{
if (((BTPageOpaqueData *) PageGetSpecialPointer(mdbuf.data))->btpo_cycleid < MAX_BT_CYCLE_ID)
{
/* assume btree */
RmgrTable[RM_BTREE_ID].rm_mask(mdbuf_masked.data, blkno);
RmgrTable[RM_BTREE_ID].rm_mask(pageserver_masked, blkno);
if (memcmp(mdbuf_masked.data, pageserver_masked, BLCKSZ) != 0)
{
neon_log(PANIC, "btree buffers differ at blk %u in rel %u/%u/%u fork %u (request LSN %X/%08X):\n------ MD ------\n%s\n------ Page Server ------\n%s\n",
blkno,
RelFileInfoFmt(InfoFromSMgrRel(reln)),
forknum,
(uint32) (request_lsn >> 32), (uint32) request_lsn,
hexdump_page(mdbuf_masked.data),
hexdump_page(pageserver_masked));
}
}
}
}
}
memset(read_pages, 0xFF, sizeof(read_pages));
compare_with_localv(reln, forknum, blocknum, buffers, nblocks, request_lsns, read_pages);
#endif
}
#endif
@@ -1638,6 +1621,15 @@ neon_write(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum, const vo
break;
case RELPERSISTENCE_PERMANENT:
if (RelFileInfoEquals(unlogged_build_rel_info, InfoFromSMgrRel(reln)))
{
#if PG_MAJORVERSION_NUM >= 17
mdwritev(reln, forknum, blocknum, &buffer, 1, skipFsync);
#else
mdwrite(reln, forknum, blocknum, buffer, skipFsync);
#endif
return;
}
break;
case RELPERSISTENCE_TEMP:
@@ -1647,9 +1639,6 @@ neon_write(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum, const vo
#else
mdwrite(reln, forknum, blocknum, buffer, skipFsync);
#endif
/* Update LFC in case of unlogged index build */
if (reln == unlogged_build_rel && unlogged_build_phase == UNLOGGED_BUILD_PHASE_2)
lfc_write(InfoFromSMgrRel(reln), forknum, blocknum, buffer);
return;
default:
neon_log(ERROR, "unknown relpersistence '%c'", reln->smgr_relpersistence);
@@ -1710,14 +1699,16 @@ neon_writev(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
break;
case RELPERSISTENCE_PERMANENT:
if (RelFileInfoEquals(unlogged_build_rel_info, InfoFromSMgrRel(reln)))
{
mdwritev(reln, forknum, blkno, buffers, nblocks, skipFsync);
return;
}
break;
case RELPERSISTENCE_TEMP:
case RELPERSISTENCE_UNLOGGED:
mdwritev(reln, forknum, blkno, buffers, nblocks, skipFsync);
/* Update LFC in case of unlogged index build */
if (reln == unlogged_build_rel && unlogged_build_phase == UNLOGGED_BUILD_PHASE_2)
lfc_writev(InfoFromSMgrRel(reln), forknum, blkno, buffers, nblocks);
return;
default:
neon_log(ERROR, "unknown relpersistence '%c'", reln->smgr_relpersistence);
@@ -1753,6 +1744,10 @@ neon_nblocks(SMgrRelation reln, ForkNumber forknum)
break;
case RELPERSISTENCE_PERMANENT:
if (RelFileInfoEquals(unlogged_build_rel_info, InfoFromSMgrRel(reln)))
{
return mdnblocks(reln, forknum);
}
break;
case RELPERSISTENCE_TEMP:
@@ -1822,6 +1817,11 @@ neon_truncate(SMgrRelation reln, ForkNumber forknum, BlockNumber old_blocks, Blo
break;
case RELPERSISTENCE_PERMANENT:
if (RelFileInfoEquals(unlogged_build_rel_info, InfoFromSMgrRel(reln)))
{
mdtruncate(reln, forknum, old_blocks, nblocks);
return;
}
break;
case RELPERSISTENCE_TEMP:
@@ -1960,7 +1960,6 @@ neon_start_unlogged_build(SMgrRelation reln)
*/
if (unlogged_build_phase != UNLOGGED_BUILD_NOT_IN_PROGRESS)
neon_log(ERROR, "unlogged relation build is already in progress");
Assert(unlogged_build_rel == NULL);
ereport(SmgrTrace,
(errmsg(NEON_TAG "starting unlogged build of relation %u/%u/%u",
@@ -1977,7 +1976,7 @@ neon_start_unlogged_build(SMgrRelation reln)
case RELPERSISTENCE_TEMP:
case RELPERSISTENCE_UNLOGGED:
unlogged_build_rel = reln;
unlogged_build_rel_info = InfoFromSMgrRel(reln);
unlogged_build_phase = UNLOGGED_BUILD_NOT_PERMANENT;
#ifdef DEBUG_COMPARE_LOCAL
if (!IsParallelWorker())
@@ -1998,12 +1997,9 @@ neon_start_unlogged_build(SMgrRelation reln)
neon_log(ERROR, "cannot perform unlogged index build, index is not empty ");
#endif
unlogged_build_rel = reln;
unlogged_build_rel_info = InfoFromSMgrRel(reln);
unlogged_build_phase = UNLOGGED_BUILD_PHASE_1;
/* Make the relation look like it's unlogged */
reln->smgr_relpersistence = RELPERSISTENCE_UNLOGGED;
/*
* Create the local file. In a parallel build, the leader is expected to
* call this first and do it.
@@ -2030,17 +2026,16 @@ neon_start_unlogged_build(SMgrRelation reln)
static void
neon_finish_unlogged_build_phase_1(SMgrRelation reln)
{
Assert(unlogged_build_rel == reln);
Assert(RelFileInfoEquals(unlogged_build_rel_info, InfoFromSMgrRel(reln)));
ereport(SmgrTrace,
(errmsg(NEON_TAG "finishing phase 1 of unlogged build of relation %u/%u/%u",
RelFileInfoFmt(InfoFromSMgrRel(reln)))));
RelFileInfoFmt((unlogged_build_rel_info)))));
if (unlogged_build_phase == UNLOGGED_BUILD_NOT_PERMANENT)
return;
Assert(unlogged_build_phase == UNLOGGED_BUILD_PHASE_1);
Assert(reln->smgr_relpersistence == RELPERSISTENCE_UNLOGGED);
/*
* In a parallel build, (only) the leader process performs the 2nd
@@ -2048,7 +2043,7 @@ neon_finish_unlogged_build_phase_1(SMgrRelation reln)
*/
if (IsParallelWorker())
{
unlogged_build_rel = NULL;
NRelFileInfoInvalidate(unlogged_build_rel_info);
unlogged_build_phase = UNLOGGED_BUILD_NOT_IN_PROGRESS;
}
else
@@ -2069,11 +2064,11 @@ neon_end_unlogged_build(SMgrRelation reln)
{
NRelFileInfoBackend rinfob = InfoBFromSMgrRel(reln);
Assert(unlogged_build_rel == reln);
Assert(RelFileInfoEquals(unlogged_build_rel_info, InfoFromSMgrRel(reln)));
ereport(SmgrTrace,
(errmsg(NEON_TAG "ending unlogged build of relation %u/%u/%u",
RelFileInfoFmt(InfoFromNInfoB(rinfob)))));
RelFileInfoFmt(unlogged_build_rel_info))));
if (unlogged_build_phase != UNLOGGED_BUILD_NOT_PERMANENT)
{
@@ -2081,7 +2076,6 @@ neon_end_unlogged_build(SMgrRelation reln)
BlockNumber nblocks;
Assert(unlogged_build_phase == UNLOGGED_BUILD_PHASE_2);
Assert(reln->smgr_relpersistence == RELPERSISTENCE_UNLOGGED);
/*
* Update the last-written LSN cache.
@@ -2102,9 +2096,6 @@ neon_end_unlogged_build(SMgrRelation reln)
InfoFromNInfoB(rinfob),
MAIN_FORKNUM);
/* Make the relation look permanent again */
reln->smgr_relpersistence = RELPERSISTENCE_PERMANENT;
/* Remove local copy */
for (int forknum = 0; forknum <= MAX_FORKNUM; forknum++)
{
@@ -2113,6 +2104,8 @@ neon_end_unlogged_build(SMgrRelation reln)
forknum);
forget_cached_relsize(InfoFromNInfoB(rinfob), forknum);
lfc_invalidate(InfoFromNInfoB(rinfob), forknum, nblocks);
mdclose(reln, forknum);
#ifndef DEBUG_COMPARE_LOCAL
/* use isRedo == true, so that we drop it immediately */
@@ -2123,7 +2116,7 @@ neon_end_unlogged_build(SMgrRelation reln)
mdunlink(rinfob, INIT_FORKNUM, true);
#endif
}
unlogged_build_rel = NULL;
NRelFileInfoInvalidate(unlogged_build_rel_info);
unlogged_build_phase = UNLOGGED_BUILD_NOT_IN_PROGRESS;
}
@@ -2196,7 +2189,7 @@ AtEOXact_neon(XactEvent event, void *arg)
* Forget about any build we might have had in progress. The local
* file will be unlinked by smgrDoPendingDeletes()
*/
unlogged_build_rel = NULL;
NRelFileInfoInvalidate(unlogged_build_rel_info);
unlogged_build_phase = UNLOGGED_BUILD_NOT_IN_PROGRESS;
break;
@@ -2208,7 +2201,7 @@ AtEOXact_neon(XactEvent event, void *arg)
case XACT_EVENT_PRE_PREPARE:
if (unlogged_build_phase != UNLOGGED_BUILD_NOT_IN_PROGRESS)
{
unlogged_build_rel = NULL;
NRelFileInfoInvalidate(unlogged_build_rel_info);
unlogged_build_phase = UNLOGGED_BUILD_NOT_IN_PROGRESS;
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),

15
poetry.lock generated
View File

@@ -1,4 +1,4 @@
# This file is automatically @generated by Poetry 2.1.2 and should not be changed by hand.
# This file is automatically @generated by Poetry 2.1.1 and should not be changed by hand.
[[package]]
name = "aiohappyeyeballs"
@@ -1145,18 +1145,19 @@ dotenv = ["python-dotenv"]
[[package]]
name = "flask-cors"
version = "5.0.0"
description = "A Flask extension adding a decorator for CORS support"
version = "6.0.0"
description = "A Flask extension simplifying CORS support"
optional = false
python-versions = "*"
python-versions = "<4.0,>=3.9"
groups = ["main"]
files = [
{file = "Flask_Cors-5.0.0-py2.py3-none-any.whl", hash = "sha256:b9e307d082a9261c100d8fb0ba909eec6a228ed1b60a8315fd85f783d61910bc"},
{file = "flask_cors-5.0.0.tar.gz", hash = "sha256:5aadb4b950c4e93745034594d9f3ea6591f734bb3662e16e255ffbf5e89c88ef"},
{file = "flask_cors-6.0.0-py3-none-any.whl", hash = "sha256:6332073356452343a8ccddbfec7befdc3fdd040141fe776ec9b94c262f058657"},
{file = "flask_cors-6.0.0.tar.gz", hash = "sha256:4592c1570246bf7beee96b74bc0adbbfcb1b0318f6ba05c412e8909eceec3393"},
]
[package.dependencies]
Flask = ">=0.9"
flask = ">=0.9"
Werkzeug = ">=0.7"
[[package]]
name = "frozenlist"

View File

@@ -1,6 +1,10 @@
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
#[allow(non_upper_case_globals)]
#[unsafe(export_name = "malloc_conf")]
pub static malloc_conf: &[u8] = b"prof:true,prof_active:true,lg_prof_sample:21\0";
#[tokio::main]
async fn main() -> anyhow::Result<()> {
proxy::binary::proxy::run().await

View File

@@ -394,6 +394,7 @@ async fn handle_client(
}
}
#[allow(clippy::large_enum_variant)]
enum Connection {
Raw(tokio::net::TcpStream),
Tls(tokio_rustls::client::TlsStream<tokio::net::TcpStream>),

View File

@@ -43,11 +43,12 @@ project_build_tag!(BUILD_TAG);
use clap::{Parser, ValueEnum};
#[derive(Clone, Debug, ValueEnum)]
#[clap(rename_all = "kebab-case")]
enum AuthBackendType {
#[value(name("cplane-v1"), alias("control-plane"))]
ControlPlaneV1,
#[clap(alias("cplane-v1"))]
ControlPlane,
#[value(name("link"), alias("control-redirect"))]
#[clap(alias("link"))]
ConsoleRedirect,
#[cfg(any(test, feature = "testing"))]
@@ -707,7 +708,7 @@ fn build_auth_backend(
args: &ProxyCliArgs,
) -> anyhow::Result<Either<&'static auth::Backend<'static, ()>, &'static ConsoleRedirectBackend>> {
match &args.auth_backend {
AuthBackendType::ControlPlaneV1 => {
AuthBackendType::ControlPlane => {
let wake_compute_cache_config: CacheOptions = args.wake_compute_cache.parse()?;
let project_info_cache_config: ProjectInfoCacheOptions =
args.project_info_cache.parse()?;
@@ -862,7 +863,7 @@ async fn configure_redis(
("irsa", _) => match (&args.redis_host, args.redis_port) {
(Some(host), Some(port)) => Some(
ConnectionWithCredentialsProvider::new_with_credentials_provider(
host.to_string(),
host.clone(),
port,
elasticache::CredentialsProvider::new(
args.aws_region.clone(),

View File

@@ -6,12 +6,12 @@ use ipnet::{IpNet, Ipv4Net, Ipv6Net};
use postgres_client::CancelToken;
use postgres_client::tls::MakeTlsConnect;
use pq_proto::CancelKeyData;
use redis::{FromRedisValue, Pipeline, Value, pipe};
use redis::{Cmd, FromRedisValue, Value};
use serde::{Deserialize, Serialize};
use thiserror::Error;
use tokio::net::TcpStream;
use tokio::sync::{mpsc, oneshot};
use tracing::{debug, info, warn};
use tracing::{debug, error, info, warn};
use crate::auth::backend::ComputeUserInfo;
use crate::auth::{AuthError, check_peer_addr_is_in_list};
@@ -56,8 +56,70 @@ pub enum CancelKeyOp {
},
}
pub struct Pipeline {
inner: redis::Pipeline,
replies: Vec<CancelReplyOp>,
}
impl Pipeline {
fn with_capacity(n: usize) -> Self {
Self {
inner: redis::Pipeline::with_capacity(n),
replies: Vec::with_capacity(n),
}
}
async fn execute(&mut self, client: &mut RedisKVClient) {
let responses = self.replies.len();
let batch_size = self.inner.len();
match client.query(&self.inner).await {
// for each reply, we expect that many values.
Ok(Value::Array(values)) if values.len() == responses => {
debug!(
batch_size,
responses, "successfully completed cancellation jobs",
);
for (value, reply) in std::iter::zip(values, self.replies.drain(..)) {
reply.send_value(value);
}
}
Ok(value) => {
error!(batch_size, ?value, "unexpected redis return value");
for reply in self.replies.drain(..) {
reply.send_err(anyhow!("incorrect response type from redis"));
}
}
Err(err) => {
for reply in self.replies.drain(..) {
reply.send_err(anyhow!("could not send cmd to redis: {err}"));
}
}
}
self.inner.clear();
self.replies.clear();
}
fn add_command_with_reply(&mut self, cmd: Cmd, reply: CancelReplyOp) {
self.inner.add_command(cmd);
self.replies.push(reply);
}
fn add_command_no_reply(&mut self, cmd: Cmd) {
self.inner.add_command(cmd).ignore();
}
fn add_command(&mut self, cmd: Cmd, reply: Option<CancelReplyOp>) {
match reply {
Some(reply) => self.add_command_with_reply(cmd, reply),
None => self.add_command_no_reply(cmd),
}
}
}
impl CancelKeyOp {
fn register(self, pipe: &mut Pipeline) -> Option<CancelReplyOp> {
fn register(self, pipe: &mut Pipeline) {
#[allow(clippy::used_underscore_binding)]
match self {
CancelKeyOp::StoreCancelKey {
@@ -68,18 +130,18 @@ impl CancelKeyOp {
_guard,
expire,
} => {
pipe.hset(&key, field, value);
pipe.expire(key, expire);
let resp_tx = resp_tx?;
Some(CancelReplyOp::StoreCancelKey { resp_tx, _guard })
let reply =
resp_tx.map(|resp_tx| CancelReplyOp::StoreCancelKey { resp_tx, _guard });
pipe.add_command(Cmd::hset(&key, field, value), reply);
pipe.add_command_no_reply(Cmd::expire(key, expire));
}
CancelKeyOp::GetCancelData {
key,
resp_tx,
_guard,
} => {
pipe.hgetall(key);
Some(CancelReplyOp::GetCancelData { resp_tx, _guard })
let reply = CancelReplyOp::GetCancelData { resp_tx, _guard };
pipe.add_command_with_reply(Cmd::hgetall(key), reply);
}
CancelKeyOp::RemoveCancelKey {
key,
@@ -87,9 +149,9 @@ impl CancelKeyOp {
resp_tx,
_guard,
} => {
pipe.hdel(key, field);
let resp_tx = resp_tx?;
Some(CancelReplyOp::RemoveCancelKey { resp_tx, _guard })
let reply =
resp_tx.map(|resp_tx| CancelReplyOp::RemoveCancelKey { resp_tx, _guard });
pipe.add_command(Cmd::hdel(key, field), reply);
}
}
}
@@ -170,8 +232,8 @@ pub async fn handle_cancel_messages(
client: &mut RedisKVClient,
mut rx: mpsc::Receiver<CancelKeyOp>,
) -> anyhow::Result<()> {
let mut batch = Vec::new();
let mut replies = vec![];
let mut batch = Vec::with_capacity(BATCH_SIZE);
let mut pipeline = Pipeline::with_capacity(BATCH_SIZE);
loop {
if rx.recv_many(&mut batch, BATCH_SIZE).await == 0 {
@@ -182,42 +244,11 @@ pub async fn handle_cancel_messages(
let batch_size = batch.len();
debug!(batch_size, "running cancellation jobs");
let mut pipe = pipe();
for msg in batch.drain(..) {
if let Some(reply) = msg.register(&mut pipe) {
replies.push(reply);
} else {
pipe.ignore();
}
msg.register(&mut pipeline);
}
let responses = replies.len();
match client.query(pipe).await {
// for each reply, we expect that many values.
Ok(Value::Array(values)) if values.len() == responses => {
debug!(
batch_size,
responses, "successfully completed cancellation jobs",
);
for (value, reply) in std::iter::zip(values, replies.drain(..)) {
reply.send_value(value);
}
}
Ok(value) => {
debug!(?value, "unexpected redis return value");
for reply in replies.drain(..) {
reply.send_err(anyhow!("incorrect response type from redis"));
}
}
Err(err) => {
for reply in replies.drain(..) {
reply.send_err(anyhow!("could not send cmd to redis: {err}"));
}
}
}
replies.clear();
pipeline.execute(client).await;
}
}

View File

@@ -78,7 +78,7 @@ struct RequestContextInner {
#[derive(Clone, Debug)]
pub(crate) enum AuthMethod {
// aka passwordless, fka link
// aka link
ConsoleRedirect,
ScramSha256,
ScramSha256Plus,

View File

@@ -3,7 +3,7 @@ use std::net::TcpListener;
use std::sync::{Arc, Mutex};
use anyhow::{anyhow, bail};
use http_utils::endpoint::{self, request_span};
use http_utils::endpoint::{self, profile_cpu_handler, profile_heap_handler, request_span};
use http_utils::error::ApiError;
use http_utils::json::json_response;
use http_utils::{RouterBuilder, RouterService};
@@ -33,6 +33,12 @@ fn make_router(metrics: AppMetrics) -> RouterBuilder<hyper0::Body, ApiError> {
request_span(r, move |b| prometheus_metrics_handler(b, state))
})
.get("/v1/status", status_handler)
.get("/profile/cpu", move |r| {
request_span(r, profile_cpu_handler)
})
.get("/profile/heap", move |r| {
request_span(r, profile_heap_handler)
})
}
pub async fn task_main(

View File

@@ -47,7 +47,7 @@ impl RedisKVClient {
pub(crate) async fn query<T: FromRedisValue>(
&mut self,
q: impl Queryable,
q: &impl Queryable,
) -> anyhow::Result<T> {
if !self.limiter.check() {
tracing::info!("Rate limit exceeded. Skipping query");

View File

@@ -1,5 +1,5 @@
[toolchain]
channel = "1.86.0"
channel = "1.87.0"
profile = "default"
# The default profile includes rustc, rust-std, cargo, rust-docs, rustfmt and clippy.
# https://rust-lang.github.io/rustup/concepts/profiles.html

View File

@@ -22,9 +22,10 @@ use safekeeper::defaults::{
DEFAULT_PARTIAL_BACKUP_TIMEOUT, DEFAULT_PG_LISTEN_ADDR, DEFAULT_SSL_CERT_FILE,
DEFAULT_SSL_CERT_RELOAD_PERIOD, DEFAULT_SSL_KEY_FILE,
};
use safekeeper::wal_backup::WalBackup;
use safekeeper::{
BACKGROUND_RUNTIME, BROKER_RUNTIME, GlobalTimelines, HTTP_RUNTIME, SafeKeeperConf,
WAL_SERVICE_RUNTIME, broker, control_file, http, wal_backup, wal_service,
WAL_SERVICE_RUNTIME, broker, control_file, http, wal_service,
};
use sd_notify::NotifyState;
use storage_broker::{DEFAULT_ENDPOINT, Uri};
@@ -484,15 +485,15 @@ async fn start_safekeeper(conf: Arc<SafeKeeperConf>) -> Result<()> {
None => None,
};
let global_timelines = Arc::new(GlobalTimelines::new(conf.clone()));
let wal_backup = Arc::new(WalBackup::new(&conf).await?);
let global_timelines = Arc::new(GlobalTimelines::new(conf.clone(), wal_backup.clone()));
// Register metrics collector for active timelines. It's important to do this
// after daemonizing, otherwise process collector will be upset.
let timeline_collector = safekeeper::metrics::TimelineCollector::new(global_timelines.clone());
metrics::register_internal(Box::new(timeline_collector))?;
wal_backup::init_remote_storage(&conf).await;
// Keep handles to main tasks to die if any of them disappears.
let mut tasks_handles: FuturesUnordered<BoxFuture<(String, JoinTaskRes)>> =
FuturesUnordered::new();

View File

@@ -3,6 +3,7 @@ use std::sync::Arc;
use anyhow::{Result, bail};
use camino::Utf8PathBuf;
use postgres_ffi::{MAX_SEND_SIZE, WAL_SEGMENT_SIZE};
use remote_storage::GenericRemoteStorage;
use safekeeper_api::membership::Configuration;
use tokio::fs::OpenOptions;
use tokio::io::{AsyncSeekExt, AsyncWriteExt};
@@ -30,6 +31,7 @@ pub struct Request {
pub async fn handle_request(
request: Request,
global_timelines: Arc<GlobalTimelines>,
storage: Arc<GenericRemoteStorage>,
) -> Result<()> {
// TODO: request.until_lsn MUST be a valid LSN, and we cannot check it :(
// if LSN will point to the middle of a WAL record, timeline will be in "broken" state
@@ -127,6 +129,7 @@ pub async fn handle_request(
assert!(first_ondisk_segment >= first_segment);
copy_s3_segments(
&storage,
wal_seg_size,
&request.source_ttid,
&request.destination_ttid,

View File

@@ -258,6 +258,7 @@ async fn timeline_snapshot_handler(request: Request<Body>) -> Result<Response<Bo
let global_timelines = get_global_timelines(&request);
let tli = global_timelines.get(ttid).map_err(ApiError::from)?;
let storage = global_timelines.get_wal_backup().get_storage();
// To stream the body use wrap_stream which wants Stream of Result<Bytes>,
// so create the chan and write to it in another task.
@@ -269,6 +270,7 @@ async fn timeline_snapshot_handler(request: Request<Body>) -> Result<Response<Bo
conf.my_id,
destination,
tx,
storage,
));
let rx_stream = ReceiverStream::new(rx);
@@ -390,12 +392,18 @@ async fn timeline_copy_handler(mut request: Request<Body>) -> Result<Response<Bo
);
let global_timelines = get_global_timelines(&request);
let wal_backup = global_timelines.get_wal_backup();
let storage = wal_backup
.get_storage()
.ok_or(ApiError::BadRequest(anyhow::anyhow!(
"Remote Storage is not configured"
)))?;
copy_timeline::handle_request(copy_timeline::Request{
source_ttid,
until_lsn: request_data.until_lsn,
destination_ttid: TenantTimelineId::new(source_ttid.tenant_id, request_data.target_timeline_id),
}, global_timelines)
}, global_timelines, storage)
.instrument(info_span!("copy_timeline", from=%source_ttid, to=%request_data.target_timeline_id, until_lsn=%request_data.until_lsn))
.await
.map_err(ApiError::InternalServerError)?;

View File

@@ -125,12 +125,6 @@ pub struct SafeKeeperConf {
pub enable_tls_wal_service_api: bool,
}
impl SafeKeeperConf {
pub fn is_wal_backup_enabled(&self) -> bool {
self.remote_storage.is_some() && self.wal_backup_enabled
}
}
impl SafeKeeperConf {
pub fn dummy() -> Self {
SafeKeeperConf {

View File

@@ -9,6 +9,7 @@ use chrono::{DateTime, Utc};
use futures::{SinkExt, StreamExt, TryStreamExt};
use http_utils::error::ApiError;
use postgres_ffi::{PG_TLI, XLogFileName, XLogSegNo};
use remote_storage::GenericRemoteStorage;
use reqwest::Certificate;
use safekeeper_api::Term;
use safekeeper_api::models::{PullTimelineRequest, PullTimelineResponse, TimelineStatus};
@@ -43,6 +44,7 @@ pub async fn stream_snapshot(
source: NodeId,
destination: NodeId,
tx: mpsc::Sender<Result<Bytes>>,
storage: Option<Arc<GenericRemoteStorage>>,
) {
match tli.try_wal_residence_guard().await {
Err(e) => {
@@ -53,10 +55,32 @@ pub async fn stream_snapshot(
Ok(maybe_resident_tli) => {
if let Err(e) = match maybe_resident_tli {
Some(resident_tli) => {
stream_snapshot_resident_guts(resident_tli, source, destination, tx.clone())
.await
stream_snapshot_resident_guts(
resident_tli,
source,
destination,
tx.clone(),
storage,
)
.await
}
None => {
if let Some(storage) = storage {
stream_snapshot_offloaded_guts(
tli,
source,
destination,
tx.clone(),
&storage,
)
.await
} else {
tx.send(Err(anyhow!("remote storage not configured")))
.await
.ok();
return;
}
}
None => stream_snapshot_offloaded_guts(tli, source, destination, tx.clone()).await,
} {
// Error type/contents don't matter as they won't can't reach the client
// (hyper likely doesn't do anything with it), but http stream will be
@@ -123,10 +147,12 @@ pub(crate) async fn stream_snapshot_offloaded_guts(
source: NodeId,
destination: NodeId,
tx: mpsc::Sender<Result<Bytes>>,
storage: &GenericRemoteStorage,
) -> Result<()> {
let mut ar = prepare_tar_stream(tx);
tli.snapshot_offloaded(&mut ar, source, destination).await?;
tli.snapshot_offloaded(&mut ar, source, destination, storage)
.await?;
ar.finish().await?;
@@ -139,10 +165,13 @@ pub async fn stream_snapshot_resident_guts(
source: NodeId,
destination: NodeId,
tx: mpsc::Sender<Result<Bytes>>,
storage: Option<Arc<GenericRemoteStorage>>,
) -> Result<()> {
let mut ar = prepare_tar_stream(tx);
let bctx = tli.start_snapshot(&mut ar, source, destination).await?;
let bctx = tli
.start_snapshot(&mut ar, source, destination, storage)
.await?;
pausable_failpoint!("sk-snapshot-after-list-pausable");
let tli_dir = tli.get_timeline_dir();
@@ -182,6 +211,7 @@ impl Timeline {
ar: &mut tokio_tar::Builder<W>,
source: NodeId,
destination: NodeId,
storage: &GenericRemoteStorage,
) -> Result<()> {
// Take initial copy of control file, then release state lock
let mut control_file = {
@@ -216,6 +246,7 @@ impl Timeline {
// can fail if the timeline was un-evicted and modified in the background.
let remote_timeline_path = &self.remote_path;
wal_backup::copy_partial_segment(
storage,
&replace.previous.remote_path(remote_timeline_path),
&replace.current.remote_path(remote_timeline_path),
)
@@ -262,6 +293,7 @@ impl WalResidentTimeline {
ar: &mut tokio_tar::Builder<W>,
source: NodeId,
destination: NodeId,
storage: Option<Arc<GenericRemoteStorage>>,
) -> Result<SnapshotContext> {
let mut shared_state = self.write_shared_state().await;
let wal_seg_size = shared_state.get_wal_seg_size();
@@ -283,6 +315,7 @@ impl WalResidentTimeline {
let remote_timeline_path = &self.tli.remote_path;
wal_backup::copy_partial_segment(
&*storage.context("remote storage not configured")?,
&replace.previous.remote_path(remote_timeline_path),
&replace.current.remote_path(remote_timeline_path),
)

View File

@@ -513,7 +513,7 @@ impl SafekeeperPostgresHandler {
let end_pos = end_watch.get();
if end_pos < start_pos {
warn!(
info!(
"requested start_pos {} is ahead of available WAL end_pos {}",
start_pos, end_pos
);

View File

@@ -18,7 +18,7 @@ use crate::send_wal::EndWatch;
use crate::state::{TimelinePersistentState, TimelineState};
use crate::timeline::{SharedState, StateSK, Timeline, get_timeline_dir};
use crate::timelines_set::TimelinesSet;
use crate::wal_backup::remote_timeline_path;
use crate::wal_backup::{WalBackup, remote_timeline_path};
use crate::{SafeKeeperConf, control_file, receive_wal, wal_storage};
/// A Safekeeper testing or benchmarking environment. Uses a tempdir for storage, removed on drop.
@@ -101,18 +101,22 @@ impl Env {
let safekeeper = self.make_safekeeper(node_id, ttid, start_lsn).await?;
let shared_state = SharedState::new(StateSK::Loaded(safekeeper));
let wal_backup = Arc::new(WalBackup::new(&conf).await?);
let timeline = Timeline::new(
ttid,
&timeline_dir,
&remote_path,
shared_state,
conf.clone(),
wal_backup.clone(),
);
timeline.bootstrap(
&mut timeline.write_shared_state().await,
&conf,
Arc::new(TimelinesSet::default()), // ignored for now
RateLimiter::new(0, 0),
wal_backup,
);
Ok(timeline)
}

View File

@@ -35,7 +35,8 @@ use crate::state::{EvictionState, TimelineMemState, TimelinePersistentState, Tim
use crate::timeline_guard::ResidenceGuard;
use crate::timeline_manager::{AtomicStatus, ManagerCtl};
use crate::timelines_set::TimelinesSet;
use crate::wal_backup::{self, remote_timeline_path};
use crate::wal_backup;
use crate::wal_backup::{WalBackup, remote_timeline_path};
use crate::wal_backup_partial::PartialRemoteSegment;
use crate::wal_storage::{Storage as wal_storage_iface, WalReader};
use crate::{SafeKeeperConf, control_file, debug_dump, timeline_manager, wal_storage};
@@ -452,6 +453,8 @@ pub struct Timeline {
manager_ctl: ManagerCtl,
conf: Arc<SafeKeeperConf>,
pub(crate) wal_backup: Arc<WalBackup>,
remote_deletion: std::sync::Mutex<Option<RemoteDeletionReceiver>>,
/// Hold this gate from code that depends on the Timeline's non-shut-down state. While holding
@@ -476,6 +479,7 @@ impl Timeline {
remote_path: &RemotePath,
shared_state: SharedState,
conf: Arc<SafeKeeperConf>,
wal_backup: Arc<WalBackup>,
) -> Arc<Self> {
let (commit_lsn_watch_tx, commit_lsn_watch_rx) =
watch::channel(shared_state.sk.state().commit_lsn);
@@ -509,6 +513,7 @@ impl Timeline {
wal_backup_active: AtomicBool::new(false),
last_removed_segno: AtomicU64::new(0),
mgr_status: AtomicStatus::new(),
wal_backup,
})
}
@@ -516,6 +521,7 @@ impl Timeline {
pub fn load_timeline(
conf: Arc<SafeKeeperConf>,
ttid: TenantTimelineId,
wal_backup: Arc<WalBackup>,
) -> Result<Arc<Timeline>> {
let _enter = info_span!("load_timeline", timeline = %ttid.timeline_id).entered();
@@ -529,6 +535,7 @@ impl Timeline {
&remote_path,
shared_state,
conf,
wal_backup,
))
}
@@ -539,6 +546,7 @@ impl Timeline {
conf: &SafeKeeperConf,
broker_active_set: Arc<TimelinesSet>,
partial_backup_rate_limiter: RateLimiter,
wal_backup: Arc<WalBackup>,
) {
let (tx, rx) = self.manager_ctl.bootstrap_manager();
@@ -561,6 +569,7 @@ impl Timeline {
tx,
rx,
partial_backup_rate_limiter,
wal_backup,
)
.await
}
@@ -606,9 +615,10 @@ impl Timeline {
// it is cancelled, so WAL storage won't be opened again.
shared_state.sk.close_wal_store();
if !only_local && self.conf.is_wal_backup_enabled() {
if !only_local {
self.remote_delete().await?;
}
let dir_existed = delete_dir(&self.timeline_dir).await?;
Ok(dir_existed)
}
@@ -675,11 +685,20 @@ impl Timeline {
guard: &mut std::sync::MutexGuard<Option<RemoteDeletionReceiver>>,
) -> RemoteDeletionReceiver {
tracing::info!("starting remote deletion");
let storage = self.wal_backup.get_storage().clone();
let (result_tx, result_rx) = tokio::sync::watch::channel(None);
let ttid = self.ttid;
tokio::task::spawn(
async move {
let r = wal_backup::delete_timeline(&ttid).await;
let r = if let Some(storage) = storage {
wal_backup::delete_timeline(&storage, &ttid).await
} else {
tracing::info!(
"skipping remote deletion because no remote storage is configured; this effectively leaks the objects in remote storage"
);
Ok(())
};
if let Err(e) = &r {
// Log error here in case nobody ever listens for our result (e.g. dropped API request)
tracing::error!("remote deletion failed: {e}");
@@ -1046,14 +1065,13 @@ impl WalResidentTimeline {
pub async fn get_walreader(&self, start_lsn: Lsn) -> Result<WalReader> {
let (_, persisted_state) = self.get_state().await;
let enable_remote_read = self.conf.is_wal_backup_enabled();
WalReader::new(
&self.ttid,
self.timeline_dir.clone(),
&persisted_state,
start_lsn,
enable_remote_read,
self.wal_backup.clone(),
)
}

View File

@@ -6,7 +6,7 @@
use anyhow::Context;
use camino::Utf8PathBuf;
use remote_storage::RemotePath;
use remote_storage::{GenericRemoteStorage, RemotePath};
use tokio::fs::File;
use tokio::io::{AsyncRead, AsyncWriteExt};
use tracing::{debug, info, instrument, warn};
@@ -68,6 +68,10 @@ impl Manager {
#[instrument(name = "evict_timeline", skip_all)]
pub(crate) async fn evict_timeline(&mut self) -> bool {
assert!(!self.is_offloaded);
let Some(storage) = self.wal_backup.get_storage() else {
warn!("no remote storage configured, skipping uneviction");
return false;
};
let partial_backup_uploaded = match &self.partial_backup_uploaded {
Some(p) => p.clone(),
None => {
@@ -87,7 +91,7 @@ impl Manager {
.inc();
});
if let Err(e) = do_eviction(self, &partial_backup_uploaded).await {
if let Err(e) = do_eviction(self, &partial_backup_uploaded, &storage).await {
warn!("failed to evict timeline: {:?}", e);
return false;
}
@@ -102,6 +106,10 @@ impl Manager {
#[instrument(name = "unevict_timeline", skip_all)]
pub(crate) async fn unevict_timeline(&mut self) {
assert!(self.is_offloaded);
let Some(storage) = self.wal_backup.get_storage() else {
warn!("no remote storage configured, skipping uneviction");
return;
};
let partial_backup_uploaded = match &self.partial_backup_uploaded {
Some(p) => p.clone(),
None => {
@@ -121,7 +129,7 @@ impl Manager {
.inc();
});
if let Err(e) = do_uneviction(self, &partial_backup_uploaded).await {
if let Err(e) = do_uneviction(self, &partial_backup_uploaded, &storage).await {
warn!("failed to unevict timeline: {:?}", e);
return;
}
@@ -137,8 +145,12 @@ impl Manager {
/// Ensure that content matches the remote partial backup, if local segment exists.
/// Then change state in control file and in-memory. If `delete_offloaded_wal` is set,
/// delete the local segment.
async fn do_eviction(mgr: &mut Manager, partial: &PartialRemoteSegment) -> anyhow::Result<()> {
compare_local_segment_with_remote(mgr, partial).await?;
async fn do_eviction(
mgr: &mut Manager,
partial: &PartialRemoteSegment,
storage: &GenericRemoteStorage,
) -> anyhow::Result<()> {
compare_local_segment_with_remote(mgr, partial, storage).await?;
mgr.tli.switch_to_offloaded(partial).await?;
// switch manager state as soon as possible
@@ -153,12 +165,16 @@ async fn do_eviction(mgr: &mut Manager, partial: &PartialRemoteSegment) -> anyho
/// Ensure that content matches the remote partial backup, if local segment exists.
/// Then download segment to local disk and change state in control file and in-memory.
async fn do_uneviction(mgr: &mut Manager, partial: &PartialRemoteSegment) -> anyhow::Result<()> {
async fn do_uneviction(
mgr: &mut Manager,
partial: &PartialRemoteSegment,
storage: &GenericRemoteStorage,
) -> anyhow::Result<()> {
// if the local segment is present, validate it
compare_local_segment_with_remote(mgr, partial).await?;
compare_local_segment_with_remote(mgr, partial, storage).await?;
// atomically download the partial segment
redownload_partial_segment(mgr, partial).await?;
redownload_partial_segment(mgr, partial, storage).await?;
mgr.tli.switch_to_present().await?;
// switch manager state as soon as possible
@@ -181,6 +197,7 @@ async fn delete_local_segment(mgr: &Manager, partial: &PartialRemoteSegment) ->
async fn redownload_partial_segment(
mgr: &Manager,
partial: &PartialRemoteSegment,
storage: &GenericRemoteStorage,
) -> anyhow::Result<()> {
let tmp_file = mgr.tli.timeline_dir().join("remote_partial.tmp");
let remote_segfile = remote_segment_path(mgr, partial);
@@ -190,7 +207,7 @@ async fn redownload_partial_segment(
remote_segfile, tmp_file
);
let mut reader = wal_backup::read_object(&remote_segfile, 0).await?;
let mut reader = wal_backup::read_object(storage, &remote_segfile, 0).await?;
let mut file = File::create(&tmp_file).await?;
let actual_len = tokio::io::copy(&mut reader, &mut file).await?;
@@ -234,13 +251,16 @@ async fn redownload_partial_segment(
async fn compare_local_segment_with_remote(
mgr: &Manager,
partial: &PartialRemoteSegment,
storage: &GenericRemoteStorage,
) -> anyhow::Result<()> {
let local_path = local_segment_path(mgr, partial);
match File::open(&local_path).await {
Ok(mut local_file) => do_validation(mgr, &mut local_file, mgr.wal_seg_size, partial)
.await
.context("validation failed"),
Ok(mut local_file) => {
do_validation(mgr, &mut local_file, mgr.wal_seg_size, partial, storage)
.await
.context("validation failed")
}
Err(_) => {
info!(
"local WAL file {} is not present, skipping validation",
@@ -258,6 +278,7 @@ async fn do_validation(
file: &mut File,
wal_seg_size: usize,
partial: &PartialRemoteSegment,
storage: &GenericRemoteStorage,
) -> anyhow::Result<()> {
let local_size = file.metadata().await?.len() as usize;
if local_size != wal_seg_size {
@@ -270,7 +291,7 @@ async fn do_validation(
let remote_segfile = remote_segment_path(mgr, partial);
let mut remote_reader: std::pin::Pin<Box<dyn AsyncRead + Send + Sync>> =
wal_backup::read_object(&remote_segfile, 0).await?;
wal_backup::read_object(storage, &remote_segfile, 0).await?;
// remote segment should have bytes excatly up to `flush_lsn`
let expected_remote_size = partial.flush_lsn.segment_offset(mgr.wal_seg_size);

View File

@@ -35,7 +35,7 @@ use crate::state::TimelineState;
use crate::timeline::{ManagerTimeline, ReadGuardSharedState, StateSK, WalResidentTimeline};
use crate::timeline_guard::{AccessService, GuardId, ResidenceGuard};
use crate::timelines_set::{TimelineSetGuard, TimelinesSet};
use crate::wal_backup::{self, WalBackupTaskHandle};
use crate::wal_backup::{self, WalBackup, WalBackupTaskHandle};
use crate::wal_backup_partial::{self, PartialBackup, PartialRemoteSegment};
pub(crate) struct StateSnapshot {
@@ -200,6 +200,7 @@ pub(crate) struct Manager {
pub(crate) conf: SafeKeeperConf,
pub(crate) wal_seg_size: usize,
pub(crate) walsenders: Arc<WalSenders>,
pub(crate) wal_backup: Arc<WalBackup>,
// current state
pub(crate) state_version_rx: tokio::sync::watch::Receiver<usize>,
@@ -238,6 +239,7 @@ pub async fn main_task(
manager_tx: tokio::sync::mpsc::UnboundedSender<ManagerCtlMessage>,
mut manager_rx: tokio::sync::mpsc::UnboundedReceiver<ManagerCtlMessage>,
global_rate_limiter: RateLimiter,
wal_backup: Arc<WalBackup>,
) {
tli.set_status(Status::Started);
@@ -256,6 +258,7 @@ pub async fn main_task(
broker_active_set,
manager_tx,
global_rate_limiter,
wal_backup,
)
.await;
@@ -371,7 +374,7 @@ pub async fn main_task(
mgr.tli_broker_active.set(false);
// shutdown background tasks
if mgr.conf.is_wal_backup_enabled() {
if let Some(storage) = mgr.wal_backup.get_storage() {
if let Some(backup_task) = mgr.backup_task.take() {
// If we fell through here, then the timeline is shutting down. This is important
// because otherwise joining on the wal_backup handle might hang.
@@ -379,7 +382,7 @@ pub async fn main_task(
backup_task.join().await;
}
wal_backup::update_task(&mut mgr, false, &last_state).await;
wal_backup::update_task(&mut mgr, storage, false, &last_state).await;
}
if let Some(recovery_task) = &mut mgr.recovery_task {
@@ -415,11 +418,13 @@ impl Manager {
broker_active_set: Arc<TimelinesSet>,
manager_tx: tokio::sync::mpsc::UnboundedSender<ManagerCtlMessage>,
global_rate_limiter: RateLimiter,
wal_backup: Arc<WalBackup>,
) -> Manager {
let (is_offloaded, partial_backup_uploaded) = tli.bootstrap_mgr().await;
Manager {
wal_seg_size: tli.get_wal_seg_size().await,
walsenders: tli.get_walsenders().clone(),
wal_backup,
state_version_rx: tli.get_state_version_rx(),
num_computes_rx: tli.get_walreceivers().get_num_rx(),
tli_broker_active: broker_active_set.guard(tli.clone()),
@@ -477,8 +482,8 @@ impl Manager {
let is_wal_backup_required =
wal_backup::is_wal_backup_required(self.wal_seg_size, num_computes, state);
if self.conf.is_wal_backup_enabled() {
wal_backup::update_task(self, is_wal_backup_required, state).await;
if let Some(storage) = self.wal_backup.get_storage() {
wal_backup::update_task(self, storage, is_wal_backup_required, state).await;
}
// update the state in Arc<Timeline>
@@ -624,9 +629,9 @@ impl Manager {
/// Spawns partial WAL backup task if needed.
async fn update_partial_backup(&mut self, state: &StateSnapshot) {
// check if WAL backup is enabled and should be started
if !self.conf.is_wal_backup_enabled() {
let Some(storage) = self.wal_backup.get_storage() else {
return;
}
};
if self.partial_backup_task.is_some() {
// partial backup is already running
@@ -650,6 +655,7 @@ impl Manager {
self.conf.clone(),
self.global_rate_limiter.clone(),
cancel.clone(),
storage,
));
self.partial_backup_task = Some((handle, cancel));
}
@@ -669,6 +675,10 @@ impl Manager {
/// Reset partial backup state and remove its remote storage data. Since it
/// might concurrently uploading something, cancel the task first.
async fn backup_partial_reset(&mut self) -> anyhow::Result<Vec<String>> {
let Some(storage) = self.wal_backup.get_storage() else {
anyhow::bail!("remote storage is not enabled");
};
info!("resetting partial backup state");
// Force unevict timeline if it is evicted before erasing partial backup
// state. The intended use of this function is to drop corrupted remote
@@ -689,7 +699,7 @@ impl Manager {
}
let tli = self.wal_resident_timeline()?;
let mut partial_backup = PartialBackup::new(tli, self.conf.clone()).await;
let mut partial_backup = PartialBackup::new(tli, self.conf.clone(), storage).await;
// Reset might fail e.g. when cfile is already reset but s3 removal
// failed, so set manager state to None beforehand. In any case caller
// is expected to retry until success.

View File

@@ -25,6 +25,7 @@ use crate::rate_limit::RateLimiter;
use crate::state::TimelinePersistentState;
use crate::timeline::{Timeline, TimelineError, delete_dir, get_tenant_dir, get_timeline_dir};
use crate::timelines_set::TimelinesSet;
use crate::wal_backup::WalBackup;
use crate::wal_storage::Storage;
use crate::{SafeKeeperConf, control_file, wal_storage};
@@ -47,15 +48,24 @@ struct GlobalTimelinesState {
conf: Arc<SafeKeeperConf>,
broker_active_set: Arc<TimelinesSet>,
global_rate_limiter: RateLimiter,
wal_backup: Arc<WalBackup>,
}
impl GlobalTimelinesState {
/// Get dependencies for a timeline constructor.
fn get_dependencies(&self) -> (Arc<SafeKeeperConf>, Arc<TimelinesSet>, RateLimiter) {
fn get_dependencies(
&self,
) -> (
Arc<SafeKeeperConf>,
Arc<TimelinesSet>,
RateLimiter,
Arc<WalBackup>,
) {
(
self.conf.clone(),
self.broker_active_set.clone(),
self.global_rate_limiter.clone(),
self.wal_backup.clone(),
)
}
@@ -84,7 +94,7 @@ pub struct GlobalTimelines {
impl GlobalTimelines {
/// Create a new instance of the global timelines map.
pub fn new(conf: Arc<SafeKeeperConf>) -> Self {
pub fn new(conf: Arc<SafeKeeperConf>, wal_backup: Arc<WalBackup>) -> Self {
Self {
state: Mutex::new(GlobalTimelinesState {
timelines: HashMap::new(),
@@ -92,6 +102,7 @@ impl GlobalTimelines {
conf,
broker_active_set: Arc::new(TimelinesSet::default()),
global_rate_limiter: RateLimiter::new(1, 1),
wal_backup,
}),
}
}
@@ -147,7 +158,7 @@ impl GlobalTimelines {
/// just lock and unlock it for each timeline -- this function is called
/// during init when nothing else is running, so this is fine.
async fn load_tenant_timelines(&self, tenant_id: TenantId) -> Result<()> {
let (conf, broker_active_set, partial_backup_rate_limiter) = {
let (conf, broker_active_set, partial_backup_rate_limiter, wal_backup) = {
let state = self.state.lock().unwrap();
state.get_dependencies()
};
@@ -162,7 +173,7 @@ impl GlobalTimelines {
TimelineId::from_str(timeline_dir_entry.file_name().to_str().unwrap_or(""))
{
let ttid = TenantTimelineId::new(tenant_id, timeline_id);
match Timeline::load_timeline(conf.clone(), ttid) {
match Timeline::load_timeline(conf.clone(), ttid, wal_backup.clone()) {
Ok(tli) => {
let mut shared_state = tli.write_shared_state().await;
self.state
@@ -175,6 +186,7 @@ impl GlobalTimelines {
&conf,
broker_active_set.clone(),
partial_backup_rate_limiter.clone(),
wal_backup.clone(),
);
}
// If we can't load a timeline, it's most likely because of a corrupted
@@ -212,6 +224,10 @@ impl GlobalTimelines {
self.state.lock().unwrap().broker_active_set.clone()
}
pub fn get_wal_backup(&self) -> Arc<WalBackup> {
self.state.lock().unwrap().wal_backup.clone()
}
/// Create a new timeline with the given id. If the timeline already exists, returns
/// an existing timeline.
pub(crate) async fn create(
@@ -222,7 +238,7 @@ impl GlobalTimelines {
start_lsn: Lsn,
commit_lsn: Lsn,
) -> Result<Arc<Timeline>> {
let (conf, _, _) = {
let (conf, _, _, _) = {
let state = self.state.lock().unwrap();
if let Ok(timeline) = state.get(&ttid) {
// Timeline already exists, return it.
@@ -267,7 +283,7 @@ impl GlobalTimelines {
check_tombstone: bool,
) -> Result<Arc<Timeline>> {
// Check for existence and mark that we're creating it.
let (conf, broker_active_set, partial_backup_rate_limiter) = {
let (conf, broker_active_set, partial_backup_rate_limiter, wal_backup) = {
let mut state = self.state.lock().unwrap();
match state.timelines.get(&ttid) {
Some(GlobalMapTimeline::CreationInProgress) => {
@@ -296,7 +312,14 @@ impl GlobalTimelines {
};
// Do the actual move and reflect the result in the map.
match GlobalTimelines::install_temp_timeline(ttid, tmp_path, conf.clone()).await {
match GlobalTimelines::install_temp_timeline(
ttid,
tmp_path,
conf.clone(),
wal_backup.clone(),
)
.await
{
Ok(timeline) => {
let mut timeline_shared_state = timeline.write_shared_state().await;
let mut state = self.state.lock().unwrap();
@@ -314,6 +337,7 @@ impl GlobalTimelines {
&conf,
broker_active_set,
partial_backup_rate_limiter,
wal_backup,
);
drop(timeline_shared_state);
Ok(timeline)
@@ -336,6 +360,7 @@ impl GlobalTimelines {
ttid: TenantTimelineId,
tmp_path: &Utf8PathBuf,
conf: Arc<SafeKeeperConf>,
wal_backup: Arc<WalBackup>,
) -> Result<Arc<Timeline>> {
let tenant_path = get_tenant_dir(conf.as_ref(), &ttid.tenant_id);
let timeline_path = get_timeline_dir(conf.as_ref(), &ttid);
@@ -377,7 +402,7 @@ impl GlobalTimelines {
// Do the move.
durable_rename(tmp_path, &timeline_path, !conf.no_sync).await?;
Timeline::load_timeline(conf, ttid)
Timeline::load_timeline(conf, ttid, wal_backup)
}
/// Get a timeline from the global map. If it's not present, it doesn't exist on disk,

View File

@@ -2,6 +2,7 @@ use std::cmp::min;
use std::collections::HashSet;
use std::num::NonZeroU32;
use std::pin::Pin;
use std::sync::Arc;
use std::time::Duration;
use anyhow::{Context, Result};
@@ -17,7 +18,7 @@ use safekeeper_api::models::PeerInfo;
use tokio::fs::File;
use tokio::select;
use tokio::sync::mpsc::{self, Receiver, Sender};
use tokio::sync::{OnceCell, watch};
use tokio::sync::watch;
use tokio::task::JoinHandle;
use tokio_util::sync::CancellationToken;
use tracing::*;
@@ -63,7 +64,12 @@ pub(crate) fn is_wal_backup_required(
/// Based on peer information determine which safekeeper should offload; if it
/// is me, run (per timeline) task, if not yet. OTOH, if it is not me and task
/// is running, kill it.
pub(crate) async fn update_task(mgr: &mut Manager, need_backup: bool, state: &StateSnapshot) {
pub(crate) async fn update_task(
mgr: &mut Manager,
storage: Arc<GenericRemoteStorage>,
need_backup: bool,
state: &StateSnapshot,
) {
let (offloader, election_dbg_str) =
determine_offloader(&state.peers, state.backup_lsn, mgr.tli.ttid, &mgr.conf);
let elected_me = Some(mgr.conf.my_id) == offloader;
@@ -82,7 +88,12 @@ pub(crate) async fn update_task(mgr: &mut Manager, need_backup: bool, state: &St
return;
};
let async_task = backup_task_main(resident, mgr.conf.backup_parallel_jobs, shutdown_rx);
let async_task = backup_task_main(
resident,
storage,
mgr.conf.backup_parallel_jobs,
shutdown_rx,
);
let handle = if mgr.conf.current_thread_runtime {
tokio::spawn(async_task)
@@ -169,33 +180,31 @@ fn determine_offloader(
}
}
static REMOTE_STORAGE: OnceCell<Option<GenericRemoteStorage>> = OnceCell::const_new();
// Storage must be configured and initialized when this is called.
fn get_configured_remote_storage() -> &'static GenericRemoteStorage {
REMOTE_STORAGE
.get()
.expect("failed to get remote storage")
.as_ref()
.unwrap()
pub struct WalBackup {
storage: Option<Arc<GenericRemoteStorage>>,
}
pub async fn init_remote_storage(conf: &SafeKeeperConf) {
// TODO: refactor REMOTE_STORAGE to avoid using global variables, and provide
// dependencies to all tasks instead.
REMOTE_STORAGE
.get_or_init(|| async {
if let Some(conf) = conf.remote_storage.as_ref() {
Some(
GenericRemoteStorage::from_config(conf)
.await
.expect("failed to create remote storage"),
)
} else {
None
impl WalBackup {
/// Create a new WalBackup instance.
pub async fn new(conf: &SafeKeeperConf) -> Result<Self> {
if !conf.wal_backup_enabled {
return Ok(Self { storage: None });
}
match conf.remote_storage.as_ref() {
Some(config) => {
let storage = GenericRemoteStorage::from_config(config).await?;
Ok(Self {
storage: Some(Arc::new(storage)),
})
}
})
.await;
None => Ok(Self { storage: None }),
}
}
pub fn get_storage(&self) -> Option<Arc<GenericRemoteStorage>> {
self.storage.clone()
}
}
struct WalBackupTask {
@@ -204,12 +213,14 @@ struct WalBackupTask {
wal_seg_size: usize,
parallel_jobs: usize,
commit_lsn_watch_rx: watch::Receiver<Lsn>,
storage: Arc<GenericRemoteStorage>,
}
/// Offload single timeline.
#[instrument(name = "wal_backup", skip_all, fields(ttid = %tli.ttid))]
async fn backup_task_main(
tli: WalResidentTimeline,
storage: Arc<GenericRemoteStorage>,
parallel_jobs: usize,
mut shutdown_rx: Receiver<()>,
) {
@@ -223,6 +234,7 @@ async fn backup_task_main(
timeline_dir: tli.get_timeline_dir(),
timeline: tli,
parallel_jobs,
storage,
};
// task is spinned up only when wal_seg_size already initialized
@@ -293,6 +305,7 @@ impl WalBackupTask {
match backup_lsn_range(
&self.timeline,
self.storage.clone(),
&mut backup_lsn,
commit_lsn,
self.wal_seg_size,
@@ -322,6 +335,7 @@ impl WalBackupTask {
async fn backup_lsn_range(
timeline: &WalResidentTimeline,
storage: Arc<GenericRemoteStorage>,
backup_lsn: &mut Lsn,
end_lsn: Lsn,
wal_seg_size: usize,
@@ -352,7 +366,12 @@ async fn backup_lsn_range(
loop {
let added_task = match iter.next() {
Some(s) => {
uploads.push_back(backup_single_segment(s, timeline_dir, remote_timeline_path));
uploads.push_back(backup_single_segment(
&storage,
s,
timeline_dir,
remote_timeline_path,
));
true
}
None => false,
@@ -388,6 +407,7 @@ async fn backup_lsn_range(
}
async fn backup_single_segment(
storage: &GenericRemoteStorage,
seg: &Segment,
timeline_dir: &Utf8Path,
remote_timeline_path: &RemotePath,
@@ -395,7 +415,13 @@ async fn backup_single_segment(
let segment_file_path = seg.file_path(timeline_dir)?;
let remote_segment_path = seg.remote_path(remote_timeline_path);
let res = backup_object(&segment_file_path, &remote_segment_path, seg.size()).await;
let res = backup_object(
storage,
&segment_file_path,
&remote_segment_path,
seg.size(),
)
.await;
if res.is_ok() {
BACKED_UP_SEGMENTS.inc();
} else {
@@ -455,12 +481,11 @@ fn get_segments(start: Lsn, end: Lsn, seg_size: usize) -> Vec<Segment> {
}
async fn backup_object(
storage: &GenericRemoteStorage,
source_file: &Utf8Path,
target_file: &RemotePath,
size: usize,
) -> Result<()> {
let storage = get_configured_remote_storage();
let file = File::open(&source_file)
.await
.with_context(|| format!("Failed to open file {source_file:?} for wal backup"))?;
@@ -475,12 +500,11 @@ async fn backup_object(
}
pub(crate) async fn backup_partial_segment(
storage: &GenericRemoteStorage,
source_file: &Utf8Path,
target_file: &RemotePath,
size: usize,
) -> Result<()> {
let storage = get_configured_remote_storage();
let file = File::open(&source_file)
.await
.with_context(|| format!("Failed to open file {source_file:?} for wal backup"))?;
@@ -504,25 +528,20 @@ pub(crate) async fn backup_partial_segment(
}
pub(crate) async fn copy_partial_segment(
storage: &GenericRemoteStorage,
source: &RemotePath,
destination: &RemotePath,
) -> Result<()> {
let storage = get_configured_remote_storage();
let cancel = CancellationToken::new();
storage.copy_object(source, destination, &cancel).await
}
pub async fn read_object(
storage: &GenericRemoteStorage,
file_path: &RemotePath,
offset: u64,
) -> anyhow::Result<Pin<Box<dyn tokio::io::AsyncRead + Send + Sync>>> {
let storage = REMOTE_STORAGE
.get()
.context("Failed to get remote storage")?
.as_ref()
.context("No remote storage configured")?;
info!("segment download about to start from remote path {file_path:?} at offset {offset}");
let cancel = CancellationToken::new();
@@ -547,8 +566,10 @@ pub async fn read_object(
/// Delete WAL files for the given timeline. Remote storage must be configured
/// when called.
pub async fn delete_timeline(ttid: &TenantTimelineId) -> Result<()> {
let storage = get_configured_remote_storage();
pub async fn delete_timeline(
storage: &GenericRemoteStorage,
ttid: &TenantTimelineId,
) -> Result<()> {
let remote_path = remote_timeline_path(ttid)?;
// see DEFAULT_MAX_KEYS_PER_LIST_RESPONSE
@@ -618,14 +639,14 @@ pub async fn delete_timeline(ttid: &TenantTimelineId) -> Result<()> {
}
/// Used by wal_backup_partial.
pub async fn delete_objects(paths: &[RemotePath]) -> Result<()> {
pub async fn delete_objects(storage: &GenericRemoteStorage, paths: &[RemotePath]) -> Result<()> {
let cancel = CancellationToken::new(); // not really used
let storage = get_configured_remote_storage();
storage.delete_objects(paths, &cancel).await
}
/// Copy segments from one timeline to another. Used in copy_timeline.
pub async fn copy_s3_segments(
storage: &GenericRemoteStorage,
wal_seg_size: usize,
src_ttid: &TenantTimelineId,
dst_ttid: &TenantTimelineId,
@@ -634,12 +655,6 @@ pub async fn copy_s3_segments(
) -> Result<()> {
const SEGMENTS_PROGRESS_REPORT_INTERVAL: u64 = 1024;
let storage = REMOTE_STORAGE
.get()
.expect("failed to get remote storage")
.as_ref()
.unwrap();
let remote_dst_path = remote_timeline_path(dst_ttid)?;
let cancel = CancellationToken::new();

View File

@@ -19,9 +19,11 @@
//! file. Code updates state in the control file before doing any S3 operations.
//! This way control file stores information about all potentially existing
//! remote partial segments and can clean them up after uploading a newer version.
use std::sync::Arc;
use camino::Utf8PathBuf;
use postgres_ffi::{PG_TLI, XLogFileName, XLogSegNo};
use remote_storage::RemotePath;
use remote_storage::{GenericRemoteStorage, RemotePath};
use safekeeper_api::Term;
use serde::{Deserialize, Serialize};
use tokio_util::sync::CancellationToken;
@@ -154,12 +156,16 @@ pub struct PartialBackup {
conf: SafeKeeperConf,
local_prefix: Utf8PathBuf,
remote_timeline_path: RemotePath,
storage: Arc<GenericRemoteStorage>,
state: State,
}
impl PartialBackup {
pub async fn new(tli: WalResidentTimeline, conf: SafeKeeperConf) -> PartialBackup {
pub async fn new(
tli: WalResidentTimeline,
conf: SafeKeeperConf,
storage: Arc<GenericRemoteStorage>,
) -> PartialBackup {
let (_, persistent_state) = tli.get_state().await;
let wal_seg_size = tli.get_wal_seg_size().await;
@@ -173,6 +179,7 @@ impl PartialBackup {
conf,
local_prefix,
remote_timeline_path,
storage,
}
}
@@ -240,7 +247,8 @@ impl PartialBackup {
let remote_path = prepared.remote_path(&self.remote_timeline_path);
// Upload first `backup_bytes` bytes of the segment to the remote storage.
wal_backup::backup_partial_segment(&local_path, &remote_path, backup_bytes).await?;
wal_backup::backup_partial_segment(&self.storage, &local_path, &remote_path, backup_bytes)
.await?;
PARTIAL_BACKUP_UPLOADED_BYTES.inc_by(backup_bytes as u64);
// We uploaded the segment, now let's verify that the data is still actual.
@@ -326,7 +334,7 @@ impl PartialBackup {
let remote_path = self.remote_timeline_path.join(seg);
objects_to_delete.push(remote_path);
}
wal_backup::delete_objects(&objects_to_delete).await
wal_backup::delete_objects(&self.storage, &objects_to_delete).await
}
/// Delete all non-Uploaded segments from the remote storage. There should be only one
@@ -424,6 +432,7 @@ pub async fn main_task(
conf: SafeKeeperConf,
limiter: RateLimiter,
cancel: CancellationToken,
storage: Arc<GenericRemoteStorage>,
) -> Option<PartialRemoteSegment> {
debug!("started");
let await_duration = conf.partial_backup_timeout;
@@ -432,7 +441,7 @@ pub async fn main_task(
let mut commit_lsn_rx = tli.get_commit_lsn_watch_rx();
let mut flush_lsn_rx = tli.get_term_flush_lsn_watch_rx();
let mut backup = PartialBackup::new(tli, conf).await;
let mut backup = PartialBackup::new(tli, conf, storage).await;
debug!("state: {:?}", backup.state);

View File

@@ -21,6 +21,7 @@ use postgres_ffi::waldecoder::WalStreamDecoder;
use postgres_ffi::{PG_TLI, XLogFileName, XLogSegNo, dispatch_pgversion};
use pq_proto::SystemId;
use remote_storage::RemotePath;
use std::sync::Arc;
use tokio::fs::{self, File, OpenOptions, remove_file};
use tokio::io::{AsyncRead, AsyncReadExt, AsyncSeekExt, AsyncWriteExt};
use tracing::*;
@@ -32,7 +33,7 @@ use crate::metrics::{
REMOVED_WAL_SEGMENTS, WAL_STORAGE_OPERATION_SECONDS, WalStorageMetrics, time_io_closure,
};
use crate::state::TimelinePersistentState;
use crate::wal_backup::{read_object, remote_timeline_path};
use crate::wal_backup::{WalBackup, read_object, remote_timeline_path};
pub trait Storage {
// Last written LSN.
@@ -645,7 +646,7 @@ pub struct WalReader {
wal_segment: Option<Pin<Box<dyn AsyncRead + Send + Sync>>>,
// S3 will be used to read WAL if LSN is not available locally
enable_remote_read: bool,
wal_backup: Arc<WalBackup>,
// We don't have WAL locally if LSN is less than local_start_lsn
local_start_lsn: Lsn,
@@ -664,7 +665,7 @@ impl WalReader {
timeline_dir: Utf8PathBuf,
state: &TimelinePersistentState,
start_pos: Lsn,
enable_remote_read: bool,
wal_backup: Arc<WalBackup>,
) -> Result<Self> {
if state.server.wal_seg_size == 0 || state.local_start_lsn == Lsn(0) {
bail!("state uninitialized, no data to read");
@@ -693,7 +694,7 @@ impl WalReader {
wal_seg_size: state.server.wal_seg_size as usize,
pos: start_pos,
wal_segment: None,
enable_remote_read,
wal_backup,
local_start_lsn: state.local_start_lsn,
timeline_start_lsn: state.timeline_start_lsn,
pg_version: state.server.pg_version / 10000,
@@ -812,9 +813,9 @@ impl WalReader {
}
// Try to open remote file, if remote reads are enabled
if self.enable_remote_read {
if let Some(storage) = self.wal_backup.get_storage() {
let remote_wal_file_path = self.remote_path.join(&wal_file_name);
return read_object(&remote_wal_file_path, xlogoff as u64).await;
return read_object(&storage, &remote_wal_file_path, xlogoff as u64).await;
}
bail!("WAL segment is not found")

View File

@@ -31,7 +31,7 @@ use pageserver_api::models::{
};
use pageserver_api::shard::TenantShardId;
use pageserver_api::upcall_api::{
PutTimelineImportStatusRequest, ReAttachRequest, ValidateRequest,
PutTimelineImportStatusRequest, ReAttachRequest, TimelineImportStatusRequest, ValidateRequest,
};
use pageserver_client::{BlockUnblock, mgmt_api};
use routerify::Middleware;
@@ -157,6 +157,29 @@ async fn handle_validate(req: Request<Body>) -> Result<Response<Body>, ApiError>
json_response(StatusCode::OK, state.service.validate(validate_req).await?)
}
async fn handle_get_timeline_import_status(req: Request<Body>) -> Result<Response<Body>, ApiError> {
check_permissions(&req, Scope::GenerationsApi)?;
let mut req = match maybe_forward(req).await {
ForwardOutcome::Forwarded(res) => {
return res;
}
ForwardOutcome::NotForwarded(req) => req,
};
let get_req = json_request::<TimelineImportStatusRequest>(&mut req).await?;
let state = get_state(&req);
json_response(
StatusCode::OK,
state
.service
.handle_timeline_shard_import_progress(get_req)
.await?,
)
}
async fn handle_put_timeline_import_status(req: Request<Body>) -> Result<Response<Body>, ApiError> {
check_permissions(&req, Scope::GenerationsApi)?;
@@ -2008,6 +2031,13 @@ pub fn make_router(
.post("/upcall/v1/validate", |r| {
named_request_span(r, handle_validate, RequestName("upcall_v1_validate"))
})
.get("/upcall/v1/timeline_import_status", |r| {
named_request_span(
r,
handle_get_timeline_import_status,
RequestName("upcall_v1_timeline_import_status"),
)
})
.post("/upcall/v1/timeline_import_status", |r| {
named_request_span(
r,

View File

@@ -1,3 +1,5 @@
use std::time::Duration;
use pageserver_api::models::detach_ancestor::AncestorDetached;
use pageserver_api::models::{
DetachBehavior, LocationConfig, LocationConfigListResponse, LsnLease, PageserverUtilization,
@@ -212,6 +214,7 @@ impl PageserverClient {
)
}
#[allow(unused)]
pub(crate) async fn timeline_detail(
&self,
tenant_shard_id: TenantShardId,
@@ -357,4 +360,20 @@ impl PageserverClient {
self.inner.wait_lsn(tenant_shard_id, request).await
)
}
pub(crate) async fn activate_post_import(
&self,
tenant_shard_id: TenantShardId,
timeline_id: TimelineId,
timeline_activate_timeout: Duration,
) -> Result<TimelineInfo> {
measured_request!(
"activate_post_import",
crate::metrics::Method::Put,
&self.node_id_label,
self.inner
.activate_post_import(tenant_shard_id, timeline_id, timeline_activate_timeout)
.await
)
}
}

View File

@@ -1666,6 +1666,39 @@ impl Persistence {
}
}
pub(crate) async fn get_timeline_import(
&self,
tenant_id: TenantId,
timeline_id: TimelineId,
) -> DatabaseResult<Option<TimelineImport>> {
use crate::schema::timeline_imports::dsl;
let persistent_import = self
.with_measured_conn(DatabaseOperation::ListTimelineImports, move |conn| {
Box::pin(async move {
let mut from_db: Vec<TimelineImportPersistence> = dsl::timeline_imports
.filter(dsl::tenant_id.eq(tenant_id.to_string()))
.filter(dsl::timeline_id.eq(timeline_id.to_string()))
.load(conn)
.await?;
if from_db.len() > 1 {
return Err(DatabaseError::Logical(format!(
"unexpected number of rows ({})",
from_db.len()
)));
}
Ok(from_db.pop())
})
})
.await?;
persistent_import
.map(TimelineImport::from_persistent)
.transpose()
.map_err(|err| DatabaseError::Logical(format!("failed to deserialize import: {err}")))
}
pub(crate) async fn delete_timeline_import(
&self,
tenant_id: TenantId,

Some files were not shown because too many files have changed in this diff Show More