## Problem
Once we use async file system APIs for `VirtualFile`, these functions
will also need to be async fn.
## Summary of changes
Makes the functions `open, open_with_options, create,sync_all,with_file`
of `VirtualFile` async fn, including all functions that call it. Like in
the prior PRs, the actual I/O operations are not using async APIs yet,
as per request in the #4743 epic.
We switch towards not using `VirtualFile` in the par_fsync module,
hopefully this is only temporary until we can actually do fully async
I/O in `VirtualFile`. This might cause us to exhaust fd limits in the
tests, but it should only be an issue for the local developer as we have
high ulimits in prod.
This PR is a follow-up of #5189, #5190, #5195, and #5203. Part of #4743.
It's a good idea to keep up-to-date in general. One noteworthy change is
that PostGIS 3.3.3 adds support for PostgreSQL v16. We'll need that.
PostGIS 3.4.0 has already been released, and we should consider
upgrading to that. However, it's a major upgrade and requires running
"SELECT postgis_extensions_upgrade();" in each database, to upgrade the
catalogs. I don't want to deal with that right now.
## Problem
We likely need this to support Postgres 16
It's also been asked by a user
https://github.com/neondatabase/neon/discussions/5042
The latest version is 3.2.0, but it requires some changes in the build
script (which I haven't checked, but it didn't work right away)
## Summary of changes
```
3.1.8 2023-08-01
- force v8 to compile in release mode
3.1.7 2023-06-26
- fix byteoffset issue with arraybuffers
- support postgres 16 beta
3.1.6 2023-04-08
- fix crash issue on fetch apply
- fix interrupt issue
```
From https://github.com/plv8/plv8/blob/v3.1.8/Changes
## Problem
We've got `approved-for-ci-run` to work 🎉
But it's still a bit rough, this PR should improve the UX for external
contributors.
## Summary of changes
- `build_and_test.yml`: add `check-permissions` job, which fails if PR is
created from a fork. Make all jobs in the workflow to be dependant on
`check-permission` to fail fast
- `approved-for-ci-run.yml`: add `cleanup` job to close `ci-run/pr-*`
PRs and delete linked branches when the parent PR is closed
- `approved-for-ci-run.yml`: fix the layout for the `ci-run/pr-*` PR
description
- GitHub Autocomment: add a comment with tests result to the original PR
(instead of a PR from `ci-run/pr-*` )
Add a `walreceiver_state` field to `TimelineInfo` (response of `GET /v1/tenant/:tenant_id/timeline/:timeline_id`) and while doing that, refactor out a common `Timeline::walreceiver_state(..)`. No OpenAPI changes, because this is an internal debugging addition.
Fixes#3115.
Co-authored-by: Joonas Koivunen <joonas.koivunen@gmail.com>
It was easy to interpret comment in the page cache initialization code
to be about justifying why we leak here at all, not just why this
specific type of leaking is done (which the comment was actually meant
to describe).
See
https://github.com/neondatabase/neon/pull/5125#discussion_r1308445993
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
- `SCALE: unbound variable` from
https://github.com/neondatabase/neon/pull/5079
- The layout of the GitHub auto-comment is broken if the code coverage
section follows flaky test section from
https://github.com/neondatabase/neon/pull/4999
## Summary of changes
- `benchmarking.yml`: Rename `SCALE` to `TEST_OLAP_SCALE`
- `comment-test-report.js`: Add an extra new-line before Code coverage
section
## Problem
When the next release is coming, we want to let everyone know about it by
posting a message to the Slack channel with a list of commits.
## Summary of changes
- `.github/workflows/release-notify.yml` is added
- the workflow sends a message to
`vars.SLACK_UPCOMING_RELEASE_CHANNEL_ID` (or
[#test-release-notifications](https://neondb.slack.com/archives/C05QQ9J1BRC)
if not configured)
- On each PR update, the workflow updates the list of commits in the
message (it doesn't send additional messages)
## Problem
We want to convert the `VirtualFile` APIs to async fn so that we can
adopt one of the async I/O solutions.
## Summary of changes
This PR is a follow-up of #5189, #5190, and #5195, and does the
following:
* Move the used `Write` trait functions of `VirtualFile` into inherent
functions
* Add optional buffering to `WriteBlobWriter`. The buffer is discarded
on drop, similarly to how tokio's `BufWriter` does it: drop is neither
async nor does it support errors.
* Remove the generics by `Write` impl of `WriteBlobWriter`, alwaays
using `VirtualFile`
* Rename `WriteBlobWriter` to `BlobWriter`
* Make various functions in the write path async, like
`VirtualFile::{write,write_all}`.
Part of #4743.
## Problem
`CI_ACCESS_TOKEN` has quite limited access (which is good), but this
doesn't allow it to remove labels from PRs (which is bad)
## Summary of changes
- Use `GITHUB_TOKEN` to remove labels
- Use `CI_ACCESS_TOKEN` to create PRs
## Problem
- #5050
Closes: https://github.com/neondatabase/neon/issues/5136
## Summary of changes
- A new configuration property `control_plane_api` controls other
functionality in this PR: if it is unset (default) then everything still
works as it does today.
- If `control_plane_api` is set, then on startup we call out to control
plane `/re-attach` endpoint to discover our attachments and their
generations. If an attachment is missing from the response we implicitly
detach the tenant.
- Calls to pageserver `/attach` API may include a `generation`
parameter. If `control_plane_api` is set, then this parameter is
mandatory.
- RemoteTimelineClient's loading of index_part.json is generation-aware,
and will try to load the index_part with the most recent generation <=
its own generation.
- The `neon_local` testing environment now includes a new binary
`attachment_service` which implements the endpoints that the pageserver
requires to operate. This is on by default if running `cargo neon` by
hand. In `test_runner/` tests, it is off by default: existing tests
continue to run with in the legacy generation-less mode.
Caveats:
- The re-attachment during startup assumes that we are only re-attaching
tenants that have previously been attached, and not totally new tenants
-- this relies on the control plane's attachment logic to keep retrying
so that we should eventually see the attach API call. That's important
because the `/re-attach` API doesn't tell us which timelines we should
attach -- we still use local disk state for that. Ref:
https://github.com/neondatabase/neon/issues/5173
- Testing: generations are only enabled for one integration test right
now (test_pageserver_restart), as a smoke test that all the machinery
basically works. Writing fuller tests that stress tenant migration will
come later, and involve extending our test fixtures to deal with
multiple pageservers.
- I'm not in love with "attachment_service" as a name for the neon_local
component, but it's not very important because we can easily rename
these test bits whenever we want.
- Limited observability when in re-attach on startup: when I add
generation validation for deletions in a later PR, I want to wrap up the
control plane API calls in some small client class that will expose
metrics for things like errors calling the control plane API, which will
act as a strong red signal that something is not right.
Co-authored-by: Christian Schwarz <christian@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
Pull Requests created by GitHub Actions bot doesn't have access to
secrets, so we need to use our bot for it to auto-trigger a tests run
See previous PRs #4663, #5210, #5212
## Summary of changes
- Use our bot to create PRs
## Problem
- Scrubber's `tidy` command requires presence of a control plane
- Scrubber has no tests at all
## Summary of changes
- Add re-usable async streams for reading metadata from a bucket
- Add a `scan-metadata` command that reads from those streams and calls
existing `checks.rs` code to validate metadata, then returns a summary
struct for the bucket. Command returns nonzero status if errors are
found.
- Add an `enable_scrub_on_exit()` function to NeonEnvBuilder so that
tests using remote storage can request to have the scrubber run after
they finish
- Enable remote storarge and scrub_on_exit in test_pageserver_restart
and test_pageserver_chaos
This is a "toe in the water" of the overall space of validating the
scrubber. Later, we should:
- Enable scrubbing at end of tests using remote storage by default
- Make the success condition stricter than "no errors": tests should
declare what tenants+timelines they expect to see in the bucket (or
sniff these from the functions tests use to create them) and we should
require that the scrubber reports on these particular tenants/timelines.
The `tidy` command is untouched in this PR, but it should be refactored
later to use similar async streaming interface instead of the current
batch-reading approach (the streams are faster with large buckets), and
to also be covered by some tests.
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
Co-authored-by: Conrad Ludgate <conrad@neon.tech>
## Problem
It's hard to find out which DB size we use for OLAP benchmarks (TPC-H in
particular).
This PR adds handling of `TEST_OLAP_SCALE` env var, which is get added
to a test name as a parameter.
This is required for performing larger periodic benchmarks.
## Summary of changes
- Handle `TEST_OLAP_SCALE` in
`test_runner/performance/test_perf_olap.py`
- Set `TEST_OLAP_SCALE` in `.github/workflows/benchmarking.yml` to a
TPC-H scale
Fixes#3830 by adding the `#[cfg(not(feature = "testing"))]` attribute
to unnecessary loggings in `pageserver/src/tenant/tasks.rs`.
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
Continuation of #4663, #5210
We're still getting an error:
```
GraphQL: Resource not accessible by integration (removeLabelsFromLabelable)
```
## Summary of changes
- trigger `approved-for-ci-run.yml` workflow on `pull_request_target`
instead of `pull_request`
Fix issue where updating the size of the Local File Cache could lead to
invalid reads:
## Problem
LFC cache can get re-enabled when lfc_max_size is set, e.g. through an
autoscaler configuration, or PostgreSQL not liking us setting the
variable.
1. initialize: LFC enabled (lfc_size_limit > 0; lfc_desc = 0)
2. Open LFC file fails, lfc_desc = -1. lfc_size_limit is set to 0;
3. lfc_size_limit is updated by autoscaling to >0
4. read() now thinks LFC is enabled (size_limit > 0) and lfc_desc is
valid, but doesn't try to read from the invalid file handle and thus
doesn't update the buffer content with the page's data, but does think
the data was read...
Any buffer we try to read from local file cache is essentially
uninitialized memory. Those are likely 0-bytes, but might potentially be
any old buffer that was previously read from or flushed to disk.
Fix this by adding a more definitive disable flag, plus better invalid state handling.
## Problem
When a remote custom extension build fails, it looks a bit confusing on
neon CI:
- `trigger-custom-extensions-build` is green
- `wait-for-extensions-build` is red
- `build-and-upload-extensions` is red
But to restart the build (to get everything green), you need to restart
the only passed `trigger-custom-extensions-build`.
## Summary of changes
- Merge `trigger-custom-extensions-build` and
`wait-for-extensions-build` jobs into
`trigger-custom-extensions-build-and-wait`
## Problem
We want to convert the `VirtualFile` APIs to async fn so that we can
adopt one of the async I/O solutions.
## Summary of changes
Convert the following APIs of `VirtualFile` to async fn (as well as all
of the APIs calling it):
* `VirtualFile::seek`
* `VirtualFile::metadata`
* Also, prepare for deletion of the write impl by writing the summary to
a buffer before writing it to disk, as suggested in
https://github.com/neondatabase/neon/issues/4743#issuecomment-1700663864
. This change adds an additional warning for the case when the summary
exceeds a block. Previously, we'd have silently corrupted data in this
(unlikely) case.
* `WriteBlobWriter::write_blob`, in preparation for making
`VirtualFile::write_all` async.
A set of changes to enable neon to work in IPv6 environments. The
changes are backward-compatible but allow to deploy neon even to
IPv6-only environments:
- bind to both IPv4 and IPv6 interfaces
- allow connections to Postgres from IPv6 interface
- parse the address from control plane that could also be IPv6
## Problem
`VirtualFile` does both reading and writing, and it would be nice if
both could be converted to async, so that it doesn't have to support an
async read path and a blocking write path (especially for the locks this
is annoying as none of the lock implementations in std, tokio or
parking_lot have support for both async and blocking access).
## Summary of changes
This PR is some initial work on making the `VirtualFile` APIs async. It
can be reviewed commit-by-commit.
* Introduce the `MaybeVirtualFile` enum to be generic in a test that
compares real files with virtual files.
* Make various APIs of `VirtualFile` async, including `write_all_at`,
`read_at`, `read_exact_at`.
Part of #4743 , successor of #5180.
Co-authored-by: Christian Schwarz <me@cschwarz.com>
## Problem
The `VirtualFile::crashsafe_overwrite` function was introduced by #5186
but it was not turned `async fn` yet. We want to make these functions
async fn as part of #4743.
## Summary of changes
Make `VirtualFile::crashsafe_overwrite` async fn, as well as all the
functions calling it. Don't make anything inside `crashsafe_overwrite`
use async functionalities, as per #4743 instructions.
Also, add rustdoc to `crashsafe_overwrite`.
Part of #4743.
## Problem
If a pageserver crashes partway through deleting a tenant's directory,
it might leave a partial state that confuses a subsequent
startup/attach.
## Summary of changes
Rename tenant directory to a temporary path before deleting.
Timeline deletions already have deletion markers to provide safety.
In future, it would be nice to exploit this to send responses to detach
requests earlier: https://github.com/neondatabase/neon/issues/5183
(part of #4743)
(preliminary to #5180)
This PR adds a special-purpose API to `VirtualFile` for write-once
files.
It adopts it for `save_metadata` and `persist_tenant_conf`.
This is helpful for the asyncification efforts (#4743) and specifically
asyncification of `VirtualFile` because above two functions were the
only ones that needed the VirtualFile to be an `std::io::Write`.
(There was also `manifest.rs` that needed the `std::io::Write`, but, it
isn't used right now, and likely won't be used because we're taking a
different route for crash consistency, see #5172. So, let's remove it.
It'll be in Git history if we need to re-introduce it when picking up
the compaction work again; that's why it was introduced in the first
place).
We can't remove the `impl std::io::Write for VirtualFile` just yet
because of the `BufWriter` in
```rust
struct DeltaLayerWriterInner {
...
blob_writer: WriteBlobWriter<BufWriter<VirtualFile>>,
}
```
But, @arpad-m and I have a plan to get rid of that by extracting the
append-only-ness-on-top-of-VirtualFile that #4994 added to
`EphemeralFile` into an abstraction that can be re-used in the
`DeltaLayerWriterInner` and `ImageLayerWriterInner`.
That'll be another PR.
### Performance Impact
This PR adds more fsyncs compared to before because we fsync the parent
directory every time.
1. For `save_metadata`, the additional fsyncs are unnecessary because we
know that `metadata` fits into a kernel page, and hence the write won't
be torn on the way into the kernel. However, the `metadata` file in
general is going to lose signficance very soon (=> see #5172), and the
NVMes in prod can definitely handle the additional fsync. So, let's not
worry about it.
2. For `persist_tenant_conf`, which we don't check to fit into a single
kernel page, this PR makes it actually crash-consistent. Before, we
could crash while writing out the tenant conf, leaving a prefix of the
tenant conf on disk.
## Problem
Tests using remote storage have manually entered `test_name` parameters,
which:
- Are easy to accidentally duplicate when copying code to make a new
test
- Omit parameters, so don't actually create unique S3 buckets when
running many tests concurrently.
## Summary of changes
- Use the `request` fixture in neon_env_builder fixture to get the test
name, then munge that into an S3 compatible bucket name.
- Remove the explicit `test_name` parameters to enable_remote_storage
For
[#5086](https://github.com/neondatabase/neon/pull/5086#issuecomment-1701331777)
we will require remote storage to be configured in pageserver.
This PR enables `localfs`-based storage for all Rust unit tests.
Changes:
- In `TenantHarness`, set up localfs remote storage for the tenant.
- `create_test_timeline` should mimic what real timeline creation does,
and real timeline creation waits for the timeline to reach remote
storage. With this PR, `create_test_timeline` now does that as well.
- All the places that create the harness tenant twice need to shut down
the tenant before the re-create through a second call to `try_load` or
`load`.
- Without shutting down, upload tasks initiated by/through the first
incarnation of the harness tenant might still be ongoing when the second
incarnation of the harness tenant is `try_load`/`load`ed. That doesn't
make sense in the tests that do that, they generally try to set up a
scenario similar to pageserver stop & start.
- There was one test that recreates a timeline, not the tenant. For that
case, I needed to create a `Timeline::shutdown` method. It's a
refactoring of the existing `Tenant::shutdown` method.
- The remote_timeline_client tests previously set up their own
`GenericRemoteStorage` and `RemoteTimelineClient`. Now they re-use the
one that's pre-created by the TenantHarness. Some adjustments to the
assertions were needed because the assertions now need to account for
the initial image layer that's created by `create_test_timeline` to be
present.
This RFC describes a simple scheme to make layer map updates crash
consistent by leveraging the index_part.json in remote storage. Without
such a mechanism, crashes can induce certain edge cases in which broadly
held assumptions about system invariants don't hold.
The `remote_timeline_client` tests use `#[tokio::test]` and rely on the
fact that the test runtime that is set up by this macro is
single-threaded.
In PR https://github.com/neondatabase/neon/pull/5164, we observed
interesting flakiness with the `upload_scheduling` test case:
it would observe the upload of the third layer (`layer_file_name_3`)
before we did `wait_completion`.
Under the single-threaded-runtime assumption, that wouldn't be possible,
because the test code doesn't await inbetween scheduling the upload
and calling `wait_completion`.
However, RemoteTimelineClient was actually using `BACKGROUND_RUNTIME`.
That means there was parallelism where the tests didn't expect it,
leading to flakiness such as execution of an UploadOp task before
the test calls `wait_completion`.
The most confusing scenario is code like this:
```
schedule upload(A);
wait_completion.await; // B
schedule_upload(C);
wait_completion.await; // D
```
On a single-threaded executor, it is guaranteed that the upload up C
doesn't run before D, because we (the test) don't relinquish control
to the executor before D's `await` point.
However, RemoteTimelineClient actually scheduled onto the
BACKGROUND_RUNTIME, so, `A` could start running before `B` and
`C` could start running before `D`.
This would cause flaky tests when making assertions about the state
manipulated by the operations. The concrete issue that led to discover
of this bug was an assertion about `remote_fs_dir` state in #5164.
## Problem
Currently, the `deploy` job doesn't wait for the custom extension job
(in another repo) and can be started even with failed extensions build.
This PR adds another job that polls the status of the extension build job
and fails if the extension build fails.
## Summary of changes
- Add `wait-for-extensions-build` job, which waits for a custom
extension build in another repo.
## Problem
- https://github.com/neondatabase/neon/issues/5167
## Summary of changes
Accept "will not become active" log line with _either_ Broken or
Stopping state, because we may hit it while in the process of doing the
`/ignore` (earlier in the test than the test expects to see the same
line with Broken)
## Problem
The S3 scrubber currently lives at
https://github.com/neondatabase/s3-scrubber
We don't have tests that use it, and it has copies of some data
structures that can get stale.
## Summary of changes
- Import the s3-scrubber as `s3_scrubber/
- Replace copied_definitions/ in the scrubber with direct access to the
`utils` and `pageserver` crates
- Modify visibility of a few definitions in `pageserver` to allow the
scrubber to use them
- Update scrubber code for recent changes to `IndexPart`
- Update `KNOWN_VERSIONS` for IndexPart and move the definition into
index.rs so that it is easier to keep up to date
As a future refinement, it would be good to pull the remote persistence
types (like IndexPart) out of `pageserver` into a separate library so
that the scrubber doesn't have to link against the whole pageserver, and
so that it's clearer which types need to be public.
Co-authored-by: Kirill Bulatov <kirill@neon.tech>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
## Problem
Test failing on a different ERROR log than it anticipated.
Closes: https://github.com/neondatabase/neon/issues/5148
## Summary of changes
Add the "could not flush frozen layer" error log to the permitted
errors.