Use the script like so, against the tenant to duplicate:
poetry run python3 ./test_runner/duplicate_tenant.py 7ea51af32d42bfe7fb93bf5f28114d09 200 8
backup of pageserver.toml
d =1
pg_distrib_dir ='/home/admin/neon-main/pg_install'
http_auth_type ='Trust'
pg_auth_type ='Trust'
listen_http_addr ='127.0.0.1:9898'
listen_pg_addr ='127.0.0.1:64000'
broker_endpoint ='http://127.0.0.1:50051/'
#control_plane_api ='http://127.0.0.1:1234/'
# Initial configuration file created by 'pageserver --init'
#listen_pg_addr = '127.0.0.1:64000'
#listen_http_addr = '127.0.0.1:9898'
#wait_lsn_timeout = '60 s'
#wal_redo_timeout = '60 s'
#max_file_descriptors = 10000
#page_cache_size = 160000
# initial superuser role name to use when creating a new tenant
#initial_superuser_name = 'cloud_admin'
#broker_endpoint = 'http://127.0.0.1:50051'
#log_format = 'plain'
#concurrent_tenant_size_logical_size_queries = '1'
#metric_collection_interval = '10 min'
#cached_metric_collection_interval = '0s'
#synthetic_size_calculation_interval = '10 min'
#disk_usage_based_eviction = { max_usage_pct = .., min_avail_bytes = .., period = "10s"}
#background_task_maximum_delay = '10s'
[tenant_config]
#checkpoint_distance = 268435456 # in bytes
#checkpoint_timeout = 10 m
#compaction_target_size = 134217728 # in bytes
#compaction_period = '20 s'
#compaction_threshold = 10
#gc_period = '1 hr'
#gc_horizon = 67108864
#image_creation_threshold = 3
#pitr_interval = '7 days'
#min_resident_size_override = .. # in bytes
#evictions_low_residence_duration_metric_threshold = '24 hour'
#gc_feedback = false
# make it determinsitic
gc_period = '0s'
checkpoint_timeout = '3650 day'
compaction_period = '20 s'
compaction_threshold = 10
compaction_target_size = 134217728
checkpoint_distance = 268435456
image_creation_threshold = 3
[remote_storage]
local_path = '/home/admin/neon-main/bench_repo_dir/repo/remote_storage_local_fs'
remove http handler
switch to generalized rewrite_summary & impl page_ctl subcommand to use it
WIP: change duplicate_tenant.py script to use the pagectl command
The script works but at restart, we detach the created tenants because
they're not known to the attachment service:
Detaching tenant, control plane omitted it in re-attach response tenant_id=1e399d390e3aee6b11c701cbc716bb6c
=> figure out how to further integrate this
(part of the getpage benchmarking epic #5771)
The plan is to make the benchmarking tool log on stderr and emit results
as JSON on stdout. That way, the test suite can simply take captures
stdout and json.loads() it, while interactive users of the benchmarking
tool have a reasonable experience as well.
Existing logging users continue to print to stdout, so, this change
should be a no-op functionally and performance-wise.
This way, `cargo update -p tokio-postgres` just works. The `Cargo.toml`
communicates more clearly that we're referring to the `main` branch. And
the git revision is still pinned in `Cargo.lock`.
While reviewing code noticed a scary `layer_paths.pop().unwrap()` then
realized this should be further asyncified, something I forgot to do
when I switched the `compact_level0_phase1` back to async in #4938.
This keeps the double-fsync for new deltas as #4749 is still unsolved.
## Problem
Currently, control plane doesn't know about neon_superuser, so if a user
creates a database with owner neon_superuser it causes an exception when
it tries to forward it. It is also currently possible to ALTER ROLE
neon_superuser.
## Summary of changes
Disallow creating database with owner neon_superuser. This is probably
fine, since I don't think you can create a database with owner normal
superuser. Also forbids altering neon_superuser
This change brings down incremental compilation for me
from > 1min to 10s (and this is a pretty old Ryzen 1700X).
More details: "incremental compilation" here means to change one
character
in the `failed to read value from offset` string in `image_layer.rs`.
The command for incremental compilation is `cargo build_testing`.
The system on which I got these numbers uses `mold` via
`~/.cargo/config.toml`.
As a bonus, `rust-gdb` is now at least a little fun again.
Some tests are timing out in debug builds due to these changes.
This PR makes them skip for debug builds.
We run both with debug and release build, so, the loss of coverage is
marginal.
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
A very low number of layer loads have been marked wrongly as permanent,
as I did not remember that `VirtualFile::open` or reading could fail
transiently for contention. Return separate errors for transient and
persistent errors from `{Delta,Image}LayerInner::load`.
Includes drive-by comment changes.
The implementation looks quite ugly because having the same type be both
the inner (operation error) and outer (critical error), but with the
alternatives I tried I did not find a better way.
The longer a pageserver runs, the more walredo processes it accumulates
from tenants that are touched intermittently (e.g. by availability
checks). This can lead to getting OOM killed.
Changes:
- Add an Instant recording the last use of the walredo process for a
tenant
- After compaction iteration in the background task, check for idleness
and stop the walredo process if idle for more than 10x compaction
period.
Cc: #3620
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Shany Pozin <shany@neon.tech>
First compaction/gc error backoff starts from 0 which is less than 2s
what it was before #5672. This is now fixed to be the intended 2**n.
Additionally noticed the `compaction_iteration` creating an
`anyhow::Error` via `into()` always captures a stacktrace even if we had
a stacktraceful anyhow error within the CompactionError because there is
no stable api for querying that.
Changes the error message encountered when the `neon.max_cluster_size`
limit is reached. Reasoning is that this is user-visible, and so should
*probably* use language that's closer to what users are familiar with.
## Problem
We don't know the number of users with the different kind of
authentication: ["sni", "endpoint in options" (A and B from
[here](https://neon.tech/docs/connect/connection-errors)),
"password_hack"]
## Summary of changes
Collect metrics by sni kind.