## Problem
Storage controller uses unsecure http for pageserver API.
Closes: https://github.com/neondatabase/cloud/issues/23734
Closes: https://github.com/neondatabase/cloud/issues/24091
## Summary of changes
- Add an optional `listen_https_port` field to storage controller's Node
state and its API (RegisterNode/ListNodes/etc).
- Allow updating `listen_https_port` on node registration to gradually
add https port for all nodes.
- Add `use_https_pageserver_api` CLI option to storage controller to
enable https.
- Pageserver doesn't support https for now and always reports
`https_port=None`. This will be addressed in follow-up PR.
ALTER SUBSCRIPTION requires AccessExclusive lock
which conflicts with iteration over pg_subscription when multiple
databases are present
and operations are applied concurrently.
Fix by explicitly locking pg_subscription
in the beginning of the transaction in each database.
## Problem
https://github.com/neondatabase/cloud/issues/24292
Fix an issue caused by PR
https://github.com/neondatabase/neon/pull/10891: we introduced the
concept of timeouts for heartbeats, where we would hang up on the other
side of the oneshot channel if a timeout happened (future gets
cancelled, receiver is dropped).
This hang up would make the heartbeat task panic when it did obtain the
response, as we unwrap the result of the result sending operation. The
panic would lead to the heartbeat task panicing itself, which is then
according to logs the last sign of life we of that process invocation.
I'm not sure what brings down the process, in theory tokio [should
continue](https://docs.rs/tokio/latest/tokio/runtime/enum.UnhandledPanic.html#variant.Ignore),
but idk.
Alternative to #10901.
## Problem
Background heatmap uploads and downloads were blocking the ones done
manually by the test.
## Summary of changes
Disable Background heatmap uploads and downloads for the cold migration
test. The test does
them explicitly.
## Problem
Pinning build tools still replicated the ACR/ECR/Docker Hub login and
pushing, even though we have a reusable workflow for this. Was mentioned
as a TODO in https://github.com/neondatabase/neon/pull/10613.
## Summary of changes
Reuse `_push-to-container-registry.yml` for pinning the build-tools
images.
`pprof::symbolize()` used a regex to strip the Rust monomorphization
suffix from generic methods. However, the `backtrace` crate can do this
itself if formatted with the `:#` flag.
Also tighten up the code a bit.
This PR does the following things:
* The initial heartbeat round blocks the storage controller from
becoming online again. If all safekeepers are unresponsive, this can
cause storage controller startup to be very slow. The original intent of
#10583 was that heartbeats don't affect normal functionality of the
storage controller. So add a short timeout to prevent it from impeding
storcon functionality.
* Fix the URL of the utilization endpoint.
* Don't send heartbeats to safekeepers which are decomissioned.
Part of https://github.com/neondatabase/neon/issues/9011
context: https://neondb.slack.com/archives/C033RQ5SPDH/p1739966807592589
## Problem
A simpler version of https://github.com/neondatabase/neon/pull/10812
## Summary of changes
Image layer creation will be preempted by L0 accumulated on other
timelines. We stop image layer generation if there's a pending L0
compaction request.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Adds CPU/heap profiling for storcon.
Also fixes allowlists to match on the path only, since profiling
endpoints take query parameters.
Requires #10892 for heap profiling.
## Problem
We'd like to enable CPU/heap profiling for storcon. This requires
jemalloc.
## Summary of changes
Use jemalloc as the global allocator, and enable heap sampling for
profiling.
## Problem
We've seen the previous default of 50 cause OOMs. Compacting many L0
layers at once now has limited benefit, since the cost is mostly linear
anyway. This is already being reduced to 20 in production settings.
## Summary of changes
Reduce `DEFAULT_COMPACTION_UPPER_LIMIT` to 20.
Once released, let's remove the config overrides.
## Problem
Our AWS account IDs are copy-pasted all over the place. A wrong paste
might only be caught late if we hardcode them, but will get flagged
instantly by actionlint if we access them from github actions variables.
Resolves https://github.com/neondatabase/neon/issues/10787, follow-up
for https://github.com/neondatabase/neon/pull/10613.
## Summary of changes
Access AWS account IDs using Github Actions variables.
## Problem
Autosplits are crucial for bulk ingest performance. However, autosplits
were only attempted when there was no other pending work. This could
cause e.g. mass AZ affinity violations following Pageserver restarts to
starve out autosplits for hours.
Resolves#10762.
## Summary of changes
Always attempt autosplits in the background reconciliation loop,
regardless of other pending work.
## Problem
we measure ingest performance for a few variants (stripe-sizes,
pre-sharded, shard-splitted).
However some phenomena (e.g. related to L0 compaction) in PS can be
better observed and optimized with un-sharded tenants.
## Summary of changes
- Allow to create projects with a policy that disables sharding
(`{"scheduling": "Essential"}`)
- add a variant to ingest_benchmark that uses that policy for the new
project
## Test run
https://github.com/neondatabase/neon/actions/runs/13396325970
## Problem
The nightly test discovered problems in the extensions upgrade test.
1. `PLv8` has different versions on PGv17 and PGv16 and a different test
set, which was not implemented correctly
[sample](https://github.com/neondatabase/neon/actions/runs/13382330475/job/37372930271)
2. The same for `semver`
[sample](https://github.com/neondatabase/neon/actions/runs/13382330475/job/37372930017)
3. `pgtap` interfered with the other tests, e.g. tables, created by
other extensions caused the tests to fail.
## Summary of changes
The discovered problems were fixed.
1. The tests list for `PLv8` is now generated using the original
Makefile
2. The patches for `semver` are now split for PGv16 and PGv17.
3. `pgtap` is being tested in a separate database now.
---------
Co-authored-by: Mikhail Kot <mikhail@neon.tech>
## Problem
Read errors during repartition should be a critical error.
## Summary of changes
<del>We only have one call site</del> We have two call sites of
`repartition` where one of them is during the initial image upload
optimization and another is during image layer creation, so I added a
`critical!` here instead of inside `collect_keyspace`.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
The usual workflow for me to debug read path errors in staging is:
download the tenant to my laptop, import, and then run some read tests.
With this patch, we can do this directly over staging pageservers.
## Summary of changes
* Add a new `touchpagelsn` API that does a page read but does not return
page info back.
* Allow read from latest record LSN from get/touchpagelsn
* Add read_debug config in the context.
* The read path will read the context config to decide whether to enable
read path tracing or not.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
We log image compaction stats even when no image compaction happened.
This is logged every 10 seconds for every timeline.
## Summary of changes
Only log when we actually performed any image compaction.
## Problem
If the deploy job on the release branch doesn't succeed, the preprod
deployment will not have happened. It was requested that this triggers a
notification in https://github.com/neondatabase/neon/issues/10662.
## Summary of changes
If we're on the release branch and the deploy job doesn't end up in
"success", notify storage oncall on slack.
## Problem
We lack an API for warming up attached locations based on the heatmap
contents.
This is problematic in two places:
1. If we manually migrate and cut over while the secondary is still cold
2. When we re-attach a previously offloaded tenant
## Summary of changes
https://github.com/neondatabase/neon/pull/10597 made heatmap generation
additive
across migrations, so we won't clobber it a after a cold migration. This
allows us to implement:
1. An endpoint for downloading all missing heatmap layers on the
pageserver:
`/v1/tenant/:tenant_shard_id/timeline/:timeline_id/download_heatmap_layers`.
Only one such operation per timeline is allowed at any given time. The
granularity is tenant shard.
2. An endpoint to the storage controller to trigger the downloads on the
pageserver:
`/v1/tenant/:tenant_shard_id/timeline/:timeline_id/download_heatmap_layers`.
This works both at
tenant and tenant shard level. If an unsharded tenant id is provided,
the operation is started on
all shards, otherwise only the specified shard.
3. A storcon cli command. Again, tenant and tenant-shard level
granularities are supported.
Cplane will call into storcon and trigger the downloads for all shards.
When we want to rescue a migration, we will use storcon cli targeting
the specific tenant shard.
Related: https://github.com/neondatabase/neon/issues/10541
## Problem
ref https://github.com/neondatabase/neon/issues/10517
## Summary of changes
For some reasons the job split algorithm decides to have different image
coverage range for two compactions before/after restart. So we remove
the subcompaction key range and let it generate an image covering the
full range, which should make the test more stable.
Also slightly tuned the logging span.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
This reverts commit 443c8d0b4b.
## Problem
We observe a massive amount of compaction errors.
## Summary of changes
If the tenant did not write any L1 layers (i.e., they accumulate L0
layers where number of them is below L0 threshold), image creation will
always fail. Therefore, it's not correct to simply use the
disk_consistent_lsn or L0/L1 boundary for the image creation.
Preparations for a successor of #10440:
* move `pull_timeline` to `safekeeper_api` and add it to
`SafekeeperClient`. we want to do `pull_timeline` on any creations that
we couldn't do initially.
* Add a `SafekeeperGeneration` type instead of relying on a type alias.
we want to maintain a safekeeper specific generation number now in the
storcon database. A separate type is important to make it impossible to
mix it up with the tenant's pageserver specific generation number. We
absolutely want to avoid that for correctness reasons. If someone mixes
up a safekeeper and pageserver id (both use the `NodeId` type), that's
bad but there is no wrong generations flying around.
part of #9011
## Problem
`discard all` cannot run in a transaction (even if implicit)
## Summary of changes
Split up the query into two, we don't need transaction support.
## Problem
Tests with mixed versions of binaries always pick up new versions if
services are started using `neon_local`.
## Summary of changes
- Set `neon_local_binpath` along with `neon_binpath` and
`pg_distrib_dir` for tests with mixed versions
This replaces the use of the awscli utility. awscli binary is massive,
it added about 200 MB to the docker image size, while the s3 client was
already a dependency so using that is essentially free, as far as binary
size is concerned.
I implemented a simple upload function that tries to keep 10 uploads
going in parallel. I believe that's the default behavior of the "aws s3
sync" command too.
These generated Postgres settings JSON files can get out of sync causing
the control plane to reject updated to an endpoint or project's Postgres
settings.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
The current `pageserver_layers_per_read` histogram buckets don't
represent the current reality very well. For the percentiles we care
about (e.g. p50 and p99), we often see fairly high read amp, especially
during ingestion, and anything below 4 can be considered very good.
## Summary of changes
Change the per-timeline read amp histogram buckets to `[4.0, 8.0, 16.0,
32.0, 64.0, 128.0, 256.0]`.
## Problem
Test fails when comparing the first WAL segment because the system id in
the segment header is different. The system id is not consistently set
correctly since segments are usually inited on the safekeeper sync step
with sysid 0.
## Summary of Chnages
Compare timeline digests instead. This skips the header.
Closes https://github.com/neondatabase/neon/issues/10596
I was looking into
https://github.com/neondatabase/serverless/issues/144, I recall previous
cases where proxy would trigger these prepared statements which would
conflict with other statements prepared by our client downstream.
Because of that, and also to aid in debugging, I've made sure all
prepared statements that proxy needs to make have specific names that
likely won't conflict and makes it clear in a error log if it's our
statements that are causing issues
Instead of hardcoding the request timeout, let's make it configurable as
a PGC_SUSET GUC.
Additionally, add a connect timeout GUC. Although the extension server
runs on the compute, it is always best to keep operations from hanging.
Better to present a timeout error to the user than a stuck backend.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
In #10707 some new fields were introduced in TimelineInfo.
I forgot that we do not only use TimelineInfo for encoding, but also
decoding when the storage controller calls into a pageserver, so this
broke some calls from controller to pageserver while in a mixed-version
state.
## Summary of changes
- Make new fields have default behavior so that they are optional
Originally I wanted to switch back to the `neon` branch before merging
#10825, but I forgot to do it. Do it in a separate PR now.
No actual change of the source code, only changes the branch name (so
that maybe in a few weeks we can delete the temporary branch
`arpad/neon-rebase`).
## Problem
`SmgrOpFlushInProgress::measure()` takes a `socket_fd` argument which is
only used on Linux. This causes linter warnings on macOS.
Touches #10823.
## Summary of changes
Add a noop use of `socket_fd` on non-Linux branch.
## Problem
See https://github.com/neondatabase/neon/issues/10839
rho(x,b) functions returns values in range [1,b+1] and addSHLL tries to
store it in array of size b+1.
## Summary of changes
Subtract 1 fro value returned by rho
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>