Conflicts:
pageserver/src/tenant.rs
test_runner/fixtures/neon_fixtures.py
test_runner/fixtures/overlayfs.py
test_runner/regress/test_tenant_detach.py
In neon_fixtures.py, retain some of our version of things, I have something in
git stash that won't apply otherwise.
## Problem
The code for tenant create and tenant attach was just a special case of
what upsert_location does.
## Summary of changes
- Use `upsert_location` for create and attach APIs
- Clean up error handling in upsert_location so that it can generate
appropriate HTTP response codes
- Update tests that asserted the old non-idempotent behavior of attach
- Rework the `test_ignore_while_attaching` test, and fix tenant shutdown
during activation, which this test was supposed to cover, but it was
actually just waiting for activation to complete.
Part of #5771
Extracted from https://github.com/neondatabase/neon/pull/6214
This PR makes the test suite sensitive to the new env var
`NEON_ENV_BUILDER_FROM_REPO_DIR_USE_OVERLAYFS`.
If it is set, `NeonEnvBuilder.from_repo_dir` uses overlayfs
to duplicate the the snapshot repo dir contents.
Since mounting requires root privileges, we use sudo to perform
the mounts. That, and macOS support, is also why copytree remains
the default.
If we ever run on a filesystem with copy reflink support, we should
consider that as an alternative.
This PR can be tried on a Linux machine on the
`test_backward_compatiblity` test, which uses `from_repo_dir`.
This is a lightweight change to keep the scrubber providing sensible
output when using sharding.
- The timeline count was wrong when using sharding
- When checking for tenant existence, we didn't re-use results between
different shards in the same tenant
Closes: https://github.com/neondatabase/neon/issues/5929
## Problem
`black` is slow sometimes, we can replace it with `ruff format` (a new
feature in 0.1.2 [0]), which produces pretty similar to black style [1].
On my local machine (MacBook M1 Pro 16GB):
```
# `black` on main
$ hyperfine "BLACK_CACHE_DIR=/dev/null poetry run black ."
Benchmark 1: BLACK_CACHE_DIR=/dev/null poetry run black .
Time (mean ± σ): 3.131 s ± 0.090 s [User: 5.194 s, System: 0.859 s]
Range (min … max): 3.047 s … 3.354 s 10 runs
```
```
# `ruff format` on the current PR
$ hyperfine "RUFF_NO_CACHE=true poetry run ruff format"
Benchmark 1: RUFF_NO_CACHE=true poetry run ruff format
Time (mean ± σ): 300.7 ms ± 50.2 ms [User: 259.5 ms, System: 76.1 ms]
Range (min … max): 267.5 ms … 420.2 ms 10 runs
```
## Summary of changes
- Replace `black` with `ruff format` everywhere
- [0] https://docs.astral.sh/ruff/formatter/
- [1] https://docs.astral.sh/ruff/formatter/#black-compatibility
Implement API for cloning a single timeline inside a safekeeper. Also
add API for calculating a sha256 hash of WAL, which is used in tests.
`/copy` API works by copying objects inside S3 for all but the last
segments, and the last segments are copied on-disk. A special temporary
directory is created for a timeline, because copy can take a lot of
time, especially for large timelines. After all files segments have been
prepared, this directory is mounted to the main tree and timeline is
loaded to memory.
Some caveats:
- large timelines can take a lot of time to copy, because we need to
copy many S3 segments
- caller should wait for HTTP call to finish indefinetely and don't
close the HTTP connection, because it will stop the process, which is
not continued in the background
- `until_lsn` must be a valid LSN, otherwise bad things can happen
- API will return 200 if specified `timeline_id` already exists, even if
it's not a copy
- each safekeeper will try to copy S3 segments, so it's better to not
call this API in-parallel on different safekeepers
It has caveats such as creating half empty segment which can't be
offloaded. Instead we'll pursue approach of pull_timeline, seeding new state
from some peer.
To exercise MAX_SEND_SIZE sending from safekeeper; we've had a bug with WAL
records torn across several XLogData messages. Add failpoint to safekeeper to
slow down sending. Also check for corrupted WAL complains in standby log.
Make the test a bit simpler in passing, e.g. we don't need explicit commits as
autocommit is enabled by default.
https://neondb.slack.com/archives/C05L7D1JAUS/p1703774799114719https://github.com/neondatabase/cloud/issues/9057
## Problem
During startup, a tenant delete request might have to retry for many
minutes waiting for a tenant to enter Active state.
## Summary of changes
- Refactor delete_tenant into TenantManager: this is not a functional
change, but will avoid merge conflicts with
https://github.com/neondatabase/neon/pull/6105 later
- Add 412 responses to the swagger definition of this endpoint.
- Use Tenant::wait_to_become_active in `TenantManager::delete_tenant`
---------
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
## Problem
This is a comment only change.
To ensure that our benchmarking results are fair we need to have correct
stats in catalog. Otherwise optimizer chooses seq scan instead of index
only scan for some queries. Added comment to run vacuum after data prep.
Adds a new disk usage based eviction option, EvictionOrder, which
selects whether to use the current `AbsoluteAccessed` or this new
proposed but not yet tested `RelativeAccessed`. Additionally a fudge
factor was noticed while implementing this, which might help sparing
smaller tenants at the expense of targeting larger tenants.
Cc: #5304
Co-authored-by: Arpad Müller <arpad@neon.tech>
## Problem
Currently a chart of S3 error rate is misleading: it can show errors any
time we are attaching a tenant (probing for index_part generation,
checking for remote delete marker).
Considering 404 successful isn't perfectly elegant, but it enables the
error rate to be used a a more meaningful alert signal: it would
indicate if we were having auth issues, sending bad requests, getting
throttled ,etc.
## Summary of changes
Track 404 requests in the AttemptOutcome::Ok bucket instead of the
AttemptOutcome::Err bucket.
This is a precursor to:
- https://github.com/neondatabase/neon/pull/6185
While that PR contains big changes to neon_local and attachment_service,
this PR contains a few unrelated standalone changes generated while
working on that branch:
- Fix restarting a pageserver when it contains multiple shards for the
same tenant
- When using location_config api to attach a tenant, create its
timelines dir
- Update test paths where generations were previously optional to make
them always-on: this avoids tests having to spuriously assert that
attachment_service is not None in order to make the linter happy.
- Add a TenantShardId python implementation for subsequent use in test
helpers that will be made shard-aware
- Teach scrubber to read across shards when checking for layer
existence: this is a refactor to track the list of existent layers at
tenant-level rather than locally to each timeline. This is a precursor
to testing shard splitting.
## Problem
To understand differences in performance between neon, aurora and rds we
want to collect explain analyze plans and pg_stat_statements for
selected benchmarking runs
## Summary of changes
Add workflow input options to collect explain and pg_stat_statements for
benchmarking workflow
Co-authored-by: BodoBolero <bodobolero@gmail.com>
## Problem
During startup, a client request might have to wait a long time while
the system is busy initializing all the attached tenants, even though
most of the attached tenants probably don't have any client requests to
service, and could wait a bit.
## Summary of changes
- Add a semaphore to limit how many Tenant::spawn()s may concurrently do
I/O to attach their tenant (i.e. read indices from remote storage, scan
local layer files, etc).
- Add Tenant::activate_now, a hook for kicking a tenant in its spawn()
method to skip waiting for the warmup semaphore
- For tenants that attached via warmup semaphore units, wait for logical
size calculation to complete before dropping the warmup units
- Set Tenant::activate_now in `get_active_tenant_with_timeout` (the page
service's path for getting a reference to a tenant).
- Wait for tenant activation in HTTP handlers for timeline creation and
deletion: like page service requests, these require an active tenant and
should prioritize activation if called.
## Problem
Various places in remote storage were not subject to a timeout (thereby
stuck TCP connections could hold things up), and did not respect a
cancellation token (so things like timeline deletion or tenant detach
would have to wait arbitrarily long).
## Summary of changes
- Add download_cancellable and upload_cancellable helpers, and use them
in all the places we wait for remote storage operations (with the
exception of initdb downloads, where it would not have been safe).
- Add a cancellation token arg to `download_retry`.
- Use cancellation token args in various places that were missing one
per #5066Closes: #5066
Why is this only "basic" handling?
- Doesn't express difference between shutdown and errors in return
types, to avoid refactoring all the places that use an anyhow::Error
(these should all eventually return a more structured error type)
- Implements timeouts on top of remote storage, rather than within it:
this means that operations hitting their timeout will lose their
semaphore permit and thereby go to the back of the queue for their
retry.
- Doing a nicer job is tracked in
https://github.com/neondatabase/neon/issues/6096