Timeline::activate() was only fallible because `launch_wal_receiver` was.
`launch_wal_receiver` was fallible only because of some preliminary
checks in `WalReceiver::start`.
Turns out these checks can be shifted to the type system by delaying
creatinon of the `WalReceiver` struct to the point where we activate the timeline.
The changes in this PR were enabled by my previous refactoring that funneled
the broker_client from pageserver startup to the activate() call sites.
(This is prep work to make `Timeline::activate` infallible.)
This patch removes the global storage_broker client instance from the
pageserver codebase.
Instead, pageserver startup instantiates it and passes it down to
the `Timeline::activate` function, which in turn passes it to
the WalReceiver, which is the entity that actually uses it.
(This is prep work to make `Timeline::activate()` infallible.)
The current possibility for failure in `Timeline::activate()`
is the broker client's presence / absence. It should be an assert, but
we're careful with these. So, I'm planning to pass in the broker
client to activate(), thereby eliminating the possiblity of its absence.
In the unit tests, we don't have a broker client. So, I thought I'd be
in trouble because the unit tests also called `activate()` before this
PR.
However, closer inspection reveals a long-standing FIXME about this,
which is addressed by this patch.
It turns out that the unit tests don't actually need the background
loops to be running. They just need the state value to be `Active`.
So, for the tests, we just set it to that value but don't spawn
the background loops.
We'll need to revisit this if we ever do more Rust unit tests in the
future. But right now, this refactoring improves the code, so, let's
revisit when we get there.
## Problem
`osgeo.org` is experiencing some problems with DNS resolving which
breaks `compute-node-image` (because it can't download postgis)
## Summary of changes
- Add `140.211.15.30 download.osgeo.org` to /etc/hosts by passing it via
the container option
This PR enforces that the tenant create / update-config APIs reject
requests with unknown fields.
This is a desirable property because some tenant config settings control
the lifetime of user data (e.g., GC horizon or PITR interval).
Suppose we inadvertently rename the `pitr_interval` field in the Rust
code.
Then, right now, a client that still uses the old name will send a
tenant config request to configure a new PITR interval.
Before this PR, we would accept such a request, ignore the old name
field, and use the pageserver.toml default value for what the new PITR
interval is.
With this PR, we will instead reject such a request.
One might argue that the client could simply check whether the config it
sent has been applied, using the `/v1/tenant/.../config` endpoint.
That is correct for tenant create and update-config.
But, attach will soon [^1] grow the ability to have attach-time config
as well.
If we ignore unknown fields and fall back to global defaults in that
case, we risk data loss.
Example:
1. Default PITR in pageservers is 7 days.
2. Create a tenant and set its PITR to 30 days.
3. For 30 days, fill the tenant continuously with data.
4. Detach the tenant.
5. Attach tenant.
Attach must use the 30-day PITR setting in this scenario.
If it were to fall back to the 7-day default value, we would lose 23
days of PITR capability for the tenant.
So, the PR that adds attach-time tenant config will build on the
(clunky) infrastructure added in this PR
[^1]: https://github.com/neondatabase/neon/pull/4255
Implementation Notes
====================
This could have been a simple `#[serde(deny_unknown_fields)]` but sadly,
that is documented- but silent-at-compile-time-incompatible with
`#[serde(flatten)]`. But we are still using this by adding on outer struct and use unit tests to ensure it is correct.
`neon_local tenant config` now uses the `.remove()` pattern + bail if
there are leftover config args. That's in line with what
`neon_local tenant create` does. We should dedupe that logic in a future
PR.
---------
Signed-off-by: Alex Chi <iskyzh@gmail.com>
Co-authored-by: Alex Chi <iskyzh@gmail.com>
Previously we didn't handle XACT_XINFO_HAS_INVALS and XACT_XINFO_HAS_DROPPED_STAT correctly,
which led to getting incorrect value of twophase_xid for records with XACT_XINFO_HAS_TWOPHASE.
This caused 'twophase file for xid {} does not exist' errors in test_isolation
We had a hot debate on whether we should try to make our code
cancellation-safe, or just accept that it's not, and make sure that
our Futures are driven to completion. The decision is that we drive
Futures to completion. This documents the decision, and summarizes the
reasoning for that.
Discussion that sparked this:
https://github.com/neondatabase/neon/pull/4198#discussion_r1190209316
Disable background tasks to not get compaction downloading all layers
but also stop safekeepers before checkpointing, use a readonly endpoint.
Fixes: #3666
Co-authored-by: Christian Schwarz <christian@neon.tech>
- Group tests by Postgres version
- Merge different build types
- Add a command to GitHub comment on how to rerun all failed tests
(different command for different Postgres versions)
- Restore a link to a test report in the build summary
This is prep for https://github.com/neondatabase/neon/pull/4255
[1/X] OpenAPI: share a single definition of TenantConfig
DRYs up the pageserver OpenAPI YAML's representation of
tenant config.
All the fields of tenant config are now located in a model schema
called TenantConfig.
The tenant create & config-change endpoints have separate schemas,
TenantCreateInfo and TenantConfigureArg, respectively.
These schemas inherit from TenantConfig, using allOf 1.
The tenant config-GET handler's response was previously named
TenantConfig.
It's now named TenantConfigResponse.
None of these changes affect how the request looks on the wire.
The generated Go code will change for Console because the OpenAPI code
generator maps `allOf` to a Go struct embedding.
Luckily, usage of tenant config in Console is still very lightweigt,
but that will change in the near future.
So, this is a good chance to set things straight.
The console changes are tracked in
https://github.com/neondatabase/cloud/pull/5046
[2/x]: extract the tenant config parts of create & config requests
[3/x]: code movement: move TenantConfigRequestConfig next to
TenantCreateRequestConfig
[4/x] type-alias TenantConfigRequestConfig = TenantCreateRequestConfig;
They are exactly the same.
[5/x] switch to qualified use for tenant create/config request api
models
[6/x] rename models::TenantConfig{RequestConfig,} and remove the alias
[7/x] OpenAPI: sync tenant create & configure body names from Rust code
[8/x]: dedupe the two TryFrom<...> for TenantConfOpt impls
The only difference is that the TenantConfigRequest impl does
```
tenant_conf.max_lsn_wal_lag = request_data.max_lsn_wal_lag;
tenant_conf.trace_read_requests = request_data.trace_read_requests;
```
and the TenantCreateRequest impl does
```
if let Some(max_lsn_wal_lag) = request_data.max_lsn_wal_lag {
tenant_conf.max_lsn_wal_lag = Some(max_lsn_wal_lag);
}
if let Some(trace_read_requests) = request_data.trace_read_requests {
tenant_conf.trace_read_requests = Some(trace_read_requests);
}
```
As far as I can tell, these are identical.
## Problem
`neondatabase/zenith-coverage-data` is too big:
- It takes ~6 minutes to clone and push the repo
- GitHub fails to publish an HTML report to github.io
Part of https://github.com/neondatabase/neon/issues/3543
## Summary of changes
Replace pushing code coverage report to
`neondatabase/zenith-coverage-data` with uploading it to S3
With this patch, the attach handler now follows the same pattern as
tenant create with regards to instantiation of the new tenant:
1. Prepare on-disk state using `create_tenant_files`.
2. Use the same code path as pageserver startup to load it into memory
and start background loops (`schedule_local_tenant_processing`).
It's a bit sad we can't use the
`PageServerConfig::tenant_attaching_mark_file_path` method inside
`create_tenant_files` because it operates in a temporary directory.
However, it's a small price to pay for the gained simplicity.
During implementation, I noticed that we don't handle failures post
`create_tenant_files` well. I left TODO comments in the code linking to
the issue that I created for this [^1].
Also, I'll dedupe the spawn_load and spawn_attach code in a future
commit.
refs https://github.com/neondatabase/neon/issues/1555
part of https://github.com/neondatabase/neon/issues/886 (Tenant
Relocation)
[^1]: https://github.com/neondatabase/neon/issues/4233
## Problem
Compatibility tests don't support Postgres 15 yet, but we're still
trying to upload compatibility snapshot (which we do not collect).
Ref
https://github.com/neondatabase/neon/actions/runs/4991394158/jobs/8940369368#step:4:38129
## Summary of changes
Add `pg_version` parameter to `run-python-test-set` actions and do not
upload compatibility snapshot for Postgres 15
This commit replaces all usages of connection_manager.rs:
wait_for_active_timeline with Timeline::wait_to_become_active.
wait_to_become_active is better and in the right module.
close https://github.com/neondatabase/neon/issues/4189
Co-authored-by: Shany Pozin <shany@neon.tech>
Use an enum instead of an array. Before that there was no connection
between definition of the metric and point where it was used aside from
matching string literals. Now its possible to use IDE features to check
for references. Also this allows to avoid mismatch between set of
metrics that was defined and set of metrics that was actually used
What is interesting is that `init logical size` case is not used. I
think `LogicalSize` is a duplicate of `InitLogicalSize`. So removed the latter.
Conflicts:
- Changes in PG15's xlogrecovery.c resulted in non-substantial conflicts between
ecb01e6ebb5a67f3fc00840695682a8b1ba40461 and
aee72b7be903e52d9bdc6449aa4c17fb852d8708
Fixes#4207
This PR is simply the patch from
https://github.com/neondatabase/neon/issues/4008 except we enabled
`force_path_style` for custom endpoints. This is because at some
version, the s3 sdk by default uses the virtual-host style access, which
is not supported by MinIO in the default configuration. By enforcing
path style access for custom endpoints, we can pass all e2e test cases.
SDK 0.55 is not the latest version and we can bump it further later when
all flaky tests in this PR are resolved.
This PR also (hopefully) fixes flaky test
`test_ondemand_download_timetravel`.
close https://github.com/neondatabase/neon/issues/4008
Signed-off-by: Alex Chi <iskyzh@gmail.com>
Should fix flakiness caused by the error
```
FATAL: could not resize shared memory segment "/PostgreSQL.3944613150" to 1048576 bytes: No space left on device
```
## Describe your changes
`pageserver_disconnect()` calls `PQfinish()` which deallocates resources
on the connection structure. `PQerrorMessage()` hands back a pointer to
an allocated resource. Duplicate the error message prior to calling
`pageserver_disconnect()`.
## Issue ticket number and link
Fixes https://github.com/neondatabase/neon/issues/4214
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [x] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Checklist before merging
- [x] Do not forget to reformat commit message to not include the above
checklist
Await for upload to complete before returning 201 Created on
`branch_timeline` or when `bootstrap_timeline` happens. Should either of
those waits fail, then on the retried request await for uploads again.
This should work as expected assuming control-plane does not start to
use timeline creation as a wait_for_upload mechanism.
Fixes#3865, started from
https://github.com/neondatabase/neon/pull/3857/files#r1144468177
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
After tenant attach, there is a window where the child timeline is
loaded and accepts GetPage requests, but its parent is not. If a
GetPage request needs to traverse to the parent, it needs to wait for
the parent timeline to become active, or it might miss some records on
the parent timeline.
It's also possible that the parent timeline is active, but it hasn't
yet received all the WAL up to the branch point from the safekeeper.
This happens if a pageserver crashes soon after creating a timeline,
so that the WAL leading to the branch point has not yet been uploaded
to remote storage. After restart, the WAL will be re-streamed and
ingested from the safekeeper, but that takes a while. Because of that,
it's not enough to check that the parent timeline is active, we also
need to wait for the WAL to arrive on the parent timeline, just like
at the beginning of GetPage handling. We probably should change the
behavior at create_timeline so that a timeline can only be created
after all the WAL up to the branch point has been uploaded to remote
storage, but that's not currently the case and out of scope for this
PR (see github issue #4218).
@NanoBjorn encountered this while working on tenant migration. After
migrating a tenant with a parent and child branch, connecting to the
child branch failed with an error like:
```
FATAL: "base/16385" is not a valid data directory
DETAIL: File "base/16385/PG_VERSION" is missing.
```
This commit adds two tests that reproduce the bug, with slightly
different symptoms.
Before this PR, the gather_inputs() calls made to imitate synthetic size
calculation accesses were accounted towards the real logical size
calculation metric.
This PR forces all callers to declare the cause for making logical size
calculations, making the decision which cause counts towards which
metric explicit.
This is follow-up to
```
commit 1d266a6365
Author: Christian Schwarz <christian@neon.tech>
Date: Thu May 11 16:09:29 2023 +0200
logical size calculation metrics: differentiate regular vs imitated (#4197)
```
After merging this patch, I hope to be able to explain why we have ca
30x more "logical size" ops in prod than "imitate logical size" for any
given observation interval.
refs https://github.com/neondatabase/neon/issues/4154
We already have the warn!() log line for this condition. This PR adds a
corresponding metric on which we can have a dedicated alert. Cheaper and
more reliable than alerting on the logs, because, we run into log rate
limits from time to time these days.
refs https://github.com/neondatabase/neon/issues/4222
This PR adds tests runs on Postgres 15 and created unified Allure report
with results for all tests.
- Split `.github/actions/allure-report` into
`.github/actions/allure-report-store` and
`.github/actions/allure-report-generate`
- Add debug or release pytest parameter for all tests (depending on
`BUILD_TYPE` env variable)
- Add Postgres version as a pytest parameter for all tests (depending on
`DEFAULT_PG_VERSION` env variable)
- Fix `test_wal_restore` and `restore_from_wal.sh` to support path with
`[`/`]` in it (fixed by applying spellcheck to the script and fixing all
warnings), `restore_from_wal_archive.sh` is deleted as unused.
- All known failures on Postgres 15 marked with xfail
While investigating https://github.com/neondatabase/neon/issues/4154 I
found that the `Calculating logical size for timeline` tracing events
created from within the logical size computation code are not always
attributable to the background task that caused it.
My goal is to be able to distinguish in the logs whether a `Calculating
logical size for timeline` was logged as part of a real synthetic size
calculation VS an imitation by the eviction task.
I want this distinction so I can prove my assumption that the disk IO
peaks which we see every 24h on prod are due to eviction's imitate
synthetic size calculations.
The alternative here, which I would have preferred, but is more work:
link RequestContext's into a child->parent list and dump this list when
we log `Calculating logical size for timeline`.
I would have preferred that over what we have in this PR because,
technically, the ondemand logical size computation can outlive the
caller that spawned it. This is against the idea of correctly nested
spans.
I guess in OpenTelemetry land, the correct modelling would be a link
between the caller's span and the task_mgr task's span.
Anyways, I think the case where we hang up on the spawned ondemand
logical size calculation is quite rare. So, I'm willing to tolerate
incorrectly nested spans for these edge-cases.
refs https://github.com/neondatabase/neon/issues/4154