## Problem
Move from aws based arm64 runners to bare-metal based
## Summary of changes
Changes in GitHub action workflows where `runs-on: arm64` used. More
parallelism added, build time for `neon with extra platform builds`
workflow reduced from 45m to 25m
## Problem
This caused a variation of the stats bug fixed by
https://github.com/neondatabase/neon/pull/7662. That PR also fixed this
case, but we still shouldn't make redundant get calls.
## Summary of changes
- Only call get in the create image layers loop at the end of a range if
some keys have been accumulated
## Problem
There is no global per-ep rate limiter in proxy.
## Summary of changes
* Return global per-ep rate limiter back.
* Rename weak compute rate limiter (the cli flags were not used
anywhere, so it's safe to rename).
## Problem
https://github.com/neondatabase/neon/pull/7637 breaks forward compat
test.
On commit ea531d448e.
https://neon-github-public-dev.s3.amazonaws.com/reports/main/8988324349/index.html
```
test_create_snapshot
2024-05-07T16:03:11.331883Z INFO version: git-env:ea531d448eb65c4f58abb9ef7d8cd461952f7c5f failpoints: true, features: ["testing"] launch_timestamp: 2024-05-07 16:03:11.316131763 UTC build_tag: build_tag-env:5159
test_forward_compatibility
2024-05-07T16:07:02.310769Z INFO version: git-env:ea531d448eb65c4f58abb9ef7d8cd461952f7c5f failpoints: true, features: ["testing"] launch_timestamp: 2024-05-07 16:07:02.294676183 UTC build_tag: build_tag-env:5159
```
The forward compatibility test is actually using the same tag as the
current build.
The commit before that,
https://neon-github-public-dev.s3.amazonaws.com/reports/main/8988126011/index.html
```
test_create_snapshot
2024-05-07T15:47:21.900796Z INFO version: git-env:2dbd1c1ed5cd0458933e8ffd40a9c0a5f4d610b8 failpoints: true, features: ["testing"] launch_timestamp: 2024-05-07 15:47:21.882784185 UTC build_tag: build_tag-env:5158
test_forward_compatibility
2024-05-07T15:50:48.828733Z INFO version: git-env:c4d7d5982553d2cf66634d1fbf85d95ef44a6524 failpoints: true, features: ["testing"] launch_timestamp: 2024-05-07 15:50:48.816635176 UTC build_tag: build_tag-env:release-5434
```
This pull request patches the bin path so that the new neon_local will
use the old binary.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
This PR does two things:
First, it fixes a bug with tiered compaction's k-merge implementation.
It ignored the lsn of a key during ordering, so multiple updates of the
same key could be read in arbitrary order, say from different layers.
For example there is layers `[(a, 2),(b, 3)]` and `[(a, 1),(c, 2)]` in
the heap, they might return `(a,2)` and `(a,1)`.
Ultimately, this change wasn't enough to fix the ordering issues in
#7296, in other words there is likely still bugs in the k-merge. So as
the second thing, we switch away from the k-merge to an in-memory based
one, similar to #4839, but leave the code around to be improved and
maybe switched to later on.
Part of #7296
## Problem
Storage controller was using a zero layer count in SecondaryProgress as
a proxy for "not initialized". However, in tenants with zero timelines
(a legitimate state), the layer count remains zero forever.
This caused https://github.com/neondatabase/neon/pull/7583 to
destabilize the storage controller scale test, which creates lots of
tenants, some of which don't get any timelines.
## Summary of changes
- Use a None mtime instead of zero layer count to determine if a
SecondaryProgress should be ignored.
- Adjust the test to use a shorter heatmap upload period to let it
proceed faster while waiting for scheduling optimizations to complete.
Before this PR, using the AWS SDK profile feature for running against
minio didn't work because
* our SDK versions were too old and didn't include
https://github.com/awslabs/aws-sdk-rust/issues/1060 and
* we didn't massage the s3 client config builder correctly.
This PR
* udpates all the AWS SDKs we use to, respectively, the latest version I
could find on crates.io (Is there a better process?)
* changes the way remote_storage constructs the S3 client, and
* documents how to run the test suite against real S3 & local minio.
Regarding the changes to `remote_storage`: if one reads the SDK docs, it
is clear that the recommended way is to use `aws_config::from_env`, then
customize.
What we were doing instead is to use the `aws_sdk_s3` builder directly.
To get the `local-minio` in the added docs working, I needed to update
both the SDKs and make the changes to the `remote_storage`. See the
commit history in this PR for details.
Refs:
* byproduct: https://github.com/smithy-lang/smithy-rs/pull/3633
* follow-up on deprecation:
https://github.com/neondatabase/neon/issues/7665
* follow-up for scrubber S3 setup:
https://github.com/neondatabase/neon/issues/7667
For aux file keys (v1 or v2) the vectored read path does not return an
error when they're missing. Instead they are omitted from the resulting
btree (this is a requirement, not a bug). Skip updating the metric in
these cases to avoid infinite results.
## Problem
See #6714, #6967
## Summary of changes
Completely ignore page header when comparing VM pages.
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above
checklist
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
- Rename "filename" types which no longer map directly to a filename
(LayerFileName -> LayerName)
- Add a -v1- part to local layer paths to smooth the path to future
updates (we anticipate a -v2- that uses checksums later)
- Rename methods that refer to the string-ized version of a LayerName to
no longer be called "filename"
- Refactor reconcile() function to use a LocalLayerFileMetadata type
that includes the local path, rather than carrying local path separately
in a tuple and unwrap()'ing it later.
The test utils should only be used during tests. Users should not be
able to create this extension on their own.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
We currently have no way to see what the current LSN of a compute its,
and in case of read replicas, we don't know what the difference in LSNs
is.
## Summary of changes
Adds these metrics
Fixes flaky test `test_gc_of_remote_layers`, which was failing because
of the `Nothing to GC` pageserver log.
I looked into the fails, it seems that backround `gc_loop` sometimes
started GC for initial tenant, which wasn't
configured to disable GC. The fix is to not create initial tenant with
enabled gc at all.
Fixes#7538
Before this PR, `neon_local` would store a copy of a subset of the
initial `pageserver.toml` in its `.neon/config`, e.g, `listen_pg_addr`.
That copy is represented as `struct PageServerConf`.
This copy was used to inform e.g., `neon_local endpoint` and other
commands that depend on Pageserver about which port to connect to.
The problem with that scheme is that the duplicated information in
`.neon/config` can get stale if `pageserver.toml` is changed.
This PR fixes that by eliminating populating `struct PageServerConf`
from the `pageserver.toml`s.
The `[[pageservers]]` TOML table in the `.neon/config` is obsolete.
As of this PR, `neon_local` will fail to start and print an error
informing about this change.
Code-level changes:
- Remove the `--pg-version` flag, it was only used for some checks
during `neon_local init`
- Remove the warn-but-continue behavior for when auth key creation fails
but auth keys are not required. It's just complexity that is unjustified
for a tool like `neon_local`.
- Introduce a type-system-level distinction between the runtime state
and the two (!) toml formats that are almost the same but not quite.
- runtime state: `struct PageServerConf`, now without `serde` derives
- toml format 1: the state in `.neon/config` => `struct OnDiskState`
- toml format 2: the `neon_local init --config TMPFILE` that, unlike
`struct OnDiskState`, allows specifying `pageservers`
- Remove `[[pageservers]]` from the `struct OnDiskState` and load the
data from the individual `pageserver.toml`s instead.
Improves the tiered compaction tests:
* Adds a new test that is a simpler version of the ignored
`test_many_updates_for_single_key` test.
* Reduces the amount of data that `test_many_updates_for_single_key`
processes to make it execute more quickly.
* Adds logging support.
Adds ordering asserts to the output of the delta key iterator
`MergeDeltaKeys` that implements a k-merge.
Part of #7296 : the asserts added by this PR get hit in the reproducers
of #7296 as well, but they are earlier in the pipeline.
## Problem
If a permit cannot be acquired to connect to compute, the cache is
invalidated. This had the observed affect of sending more traffic to
ProxyWakeCompute on cplane.
## Summary of changes
Make sure that permit acquire failures are marked as "should not
invalidate cache".
This does to `neon_local` what
https://github.com/neondatabase/aws/pull/1322 does to our production
deployment.
After both are merged, there are no users of `pageserver --init` /
`pageserver --config-override` left, and we can remove those flags
eventually.
Part of applying the changes from #7600. This piece *technically* can
change the semantics because now the context guard is held before
process_cli, but... the difference is likely quite small.
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
This commit is intentionally designed to have as small a diff as
possible. To that end, the basic idea is that each distinct "chunk" of
the previous main() has been wrapped in its own function, with the
return values from each function being passed directly into the next.
The structure of main() is now visible from its contents, which have a
handful of smaller functions.
There's a lot of other work that can / should(?) be done beyond this,
but I figure that's more opinionated, and this should be a solid start.
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
## Problem
Some HTTP client connections can stay open for quite a long time.
## Summary of changes
When there are too many HTTP client connections, pick a random
connection and gracefully cancel it.
## Problem
In https://github.com/neondatabase/neon/pull/7531, we would like to be
able to rewrite layers safely. One option is to make `Layer` able to
rewrite files in place safely (e.g. by blocking evictions/deletions for
an old Layer while a new one is created), but that's relatively fragile.
It's more robust in general if we simply never overwrite the same local
file: we can do that by putting the generation number in the filename.
## Summary of changes
- Add `local_layer_path` (counterpart to `remote_layer_path`) and
convert all locations that manually constructed a local layer path by
joining LayerFileName to timeline path
- In the layer upload path, construct remote paths with
`remote_layer_path` rather than trying to build them out of a local
path.
- During startup, carry the full path to layer files through
`init::reconcile`, and pass it into `Layer::for_resident`
- Add a test to make sure we handle upgrades properly.
- Comment out the generation part of `local_layer_path`, since we need
to maintain forward compatibility for one release. A tiny followup PR
will enable it afterwards.
We could make this a bit simpler if we bulk renamed existing layers on
startup instead of carrying literal paths through init, but that is
operationally risky on existing servers with millions of layer files. We
can always do a renaming change in future if it becomes annoying, but
for the moment it's kind of nice to have a structure that enables us to
change local path names again in future quite easily.
We should rename `LayerFileName` to `LayerName` or somesuch, to make it
more obvious that it's not a literal filename: this was already a bit
confusing where that type is used in remote paths. That will be a
followup, to avoid polluting this PR's diff.
This pull request adds the new basebackup read path + aux file write
path. In the regression test, all logical replication tests are run with
matrix aux_file_v2=false/true.
Also fixed the vectored get code path to correctly return missing key
error when being called from the unified sequential get code path.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Preceding PR https://github.com/neondatabase/neon/pull/7613 reduced the
usage of `--pageserver-config-override`.
This PR builds on top of that work and fully removes the `neon_local
--pageserver-config-override`.
Tests that need a non-default `pageserver.toml` control it using two
options:
1. Specify `NeonEnvBuilder.pageserver_config_override` before
`NeonEnvBuilder.init_start()`. This uses a new `neon_local init
--pageserver-config` flag.
2. After `init_start()`: `env.pageserver.stop()` +
`NeonPageserver.edit_config_toml()` + `env.pageserver.start()`
A few test cases were using
`env.pageserver.start(overrides=("--pageserver-config-override...",))`.
I changed them to use one of the options above.
Future Work
-----------
The `neon_local init --pageserver-config` flag still uses `pageserver
--config-override` under the hood. In the future, neon_local should just
write the `pageserver.toml` directly.
The `NeonEnvBuilder.pageserver_config_override` field should be renamed
to `pageserver_initial_config`. Let's save this churn for a separate
refactor commit.
RemoteTimelineClient has a lot of mandatory cloning. By using a single
way of creating IndexPart out of UploadQueueInitialized we can simplify
things and also avoid cloning the latest files for each
`index_part.json` upload (the contents will still be cloned).
The `test_forward_compatibility` test runs the old production binaries,
but is supposed to always run the latest neon_local binary.
I think commit 6acbee23 broke that by accident because in that commit,
`from_repo_dir` is introduced and runs an `init_start()` before the
`test_forward_compatibility` gets a chance to patch up the
neon_local_binpath.
## Problem
Ref https://neondb.slack.com/archives/C036U0GRMRB/p1688122168477729
## Summary of changes
- Add sha from postgres repo into postgres version string (via
`--with-extra-version`)
- Add a test that Postgres version matches the expected one
- Remove build-time hard check and allow only related tests to fail
## Problem
Timelines cannot be deleted if they have children. In many production
cases, a branch or a timeline has been created off the main branch for
various reasons to the effect of having now a "new main" branch. This
feature will make it possible to detach a timeline from its ancestor by
inheriting all of the data before the branchpoint to the detached
timeline and by also reparenting all of the ancestor's earlier branches
to the detached timeline.
## Summary of changes
- Earlier added copy_lsn_prefix functionality is used
- RemoteTimelineClient learns to adopt layers by copying them from
another timeline
- LayerManager adds support for adding adopted layers
-
`timeline::Timeline::{prepare_to_detach,complete_detaching}_from_ancestor`
and `timeline::detach_ancestor` are added
- HTTP PUT handler
Cc: #6994
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
After a shard split of a large existing tenant, child tenants can end up
with oversized historic layers indefinitely, if those layers are
prevented from being GC'd by branchpoints.
This PR is followed by https://github.com/neondatabase/neon/pull/7531
Related issue: https://github.com/neondatabase/neon/issues/7504
## Summary of changes
- Add a new compaction phase `compact_shard_ancestors`, which identifies
layers that are no longer needed after a shard split.
- Add a Timeline->LayerMap code path called `rewrite_layers` , which is
currently only used to drop layers, but will later be used to rewrite
them as well in https://github.com/neondatabase/neon/pull/7531
- Add a new test that compacts after a split, and checks that something
is deleted.
Note that this doesn't have much impact on a tenant's resident size
(since unused layers would end up evicted anyway), but it:
- Makes index_part.json much smaller
- Makes the system easier to reason about: avoid having tenants which
are like "my physical size is 4TiB but don't worry I'll never actually
download it", instead have tenants report the real physical size of what
they might download.
Why do we remove these layers in compaction rather than during the
split? Because we have existing split tenants that need cleaning up. We
can add it to the split operation in future as an optimization.
## Problem
Usually, the connection itself is quite fast (bellow 10ms for p999:
https://neonprod.grafana.net/goto/aOyn8vYIg?orgId=1).
It doesn't make a lot of sense to wait for a lot of time for the lock,
if it takes a lot of time to acquire it, probably, something goes wrong.
We also spawn a lot of retries, but they are not super helpful (0 means
that it was connected successfully, 1, most probably, that it was
re-request of the compute node address
https://neonprod.grafana.net/goto/J_8VQvLIR?orgId=1). Let's try to keep
a small number of retries.
## Problem
There's allegedly a bug where if we connect a subscriber before WAL is
downloaded from the safekeeper, it creates an error.
## Summary of changes
Adds support for pausing safekeepers from sending WAL to computes, and
then creates a compute and attaches a subscriber while it's in this
paused state. Fails to reproduce the issue, but probably a good test to
have
---------
Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
Changes parameters to fix the flakiness of `test_ts_of_lsn_api`. Already
now, the amount of flakiness of the test is pretty low. With this, it's
even lower.
cc #5768