## Problem
The storage scrubber was reporting warnings for lots of timelines like:
```
WARN Missed some shards at count ShardCount(0) tenant_id=25eb7a83d9a2f90ac0b765b6ca84cf4c
```
These were spurious: these tenants are fine. There was a bug in
accumulating the ShardIndex for each tenant, whereby multiple timelines
would lead us to add the same ShardIndex more than one.
Closes: #8646
## Summary of changes
- Accumulate ShardIndex in a BTreeSet instead of a Vec
- Extend the test to reproduce the issue
## Problem
See https://github.com/neondatabase/neon/issues/8499
## Summary of changes
Save HEAP_COMBOCID flag in WAL and do not clear it in redo handlers.
Related Postgres PRs:
https://github.com/neondatabase/postgres/pull/457https://github.com/neondatabase/postgres/pull/458https://github.com/neondatabase/postgres/pull/459
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above
checklist
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
With the persistent gc blocking, we can now retry reparenting timelines
which had failed for whatever reason on the previous attempt(s).
Restructure the detach_ancestor into three phases:
- prepare (insert persistent gc blocking, copy lsn prefix, layers)
- detach and reparent
- reparenting can fail, so we might need to retry this portion
- complete (remove persistent gc blocking)
Cc: #6994
After #8655 we've had a few issues (mostly tracked on #8708) with the
graceful shutdown. In order to shutdown more of the processes and catch
more errors, for example, from all pageservers, do an immediate shutdown
for those nodes which fail the initial (possibly graceful) shutdown.
Cc: #6485
## Problem
We use a set of **Neon** reuse databases in benchmarking.yml which are
still using pg14.
Because we want to compare apples to apples and have migrated the AWS
reuse clusters to pg16 we should also use pg16 for Neon.
## Summary of changes
- Automatically restore the test databases for Neon project
A few of the benchmarks have started failing after #8655 where they are
waiting for compactor task. Reads done by image layer creation should
already be cancellation sensitive because vectored get does a check each
time, but try sprinkling additional cancellation points to:
- each partition
- after each vectored read batch
It seems that some benchmarks are failing because they are simply not
stopping to ingest wal on shutdown. It might mean that the tests were
never ran on a stable pageserver situation and WAL has always been left
to be ingested on safekeepers, but let's see if this silences the
failures and "stops the bleeding".
Cc: https://github.com/neondatabase/neon/issues/8712
## Problem
We want to mark new PRs and issues created by external users
## Summary of changes
- Add a new workflow which adds `external` label for issues and PRs
created by external users
## Problem
When the utilization API was added, it was just a stub with disk space
information.
Disk space information isn't a very good metric for assigning tenants to
pageservers, because pageservers making full use of their disks would
always just have 85% utilization, irrespective of how much pressure they
had for disk space.
## Summary of changes
- Use the new layer visibiilty metric to calculate a "wanted size" per
tenant, and sum these to get a total local disk space wanted per
pageserver. This acts as the primary signal for utilization.
- Also use the shard count to calculate a utilization score, and take
the max of this and the disk-driven utilization. The shard count limit
is currently set as a constant 20,000, which matches contemporary
operational practices when loading pageservers.
The shard count limit means that for tiny/empty tenants, on a machine
with 3.84TB disk, each tiny tenant influences the utilization score as
if it had size 160MB.
## Problem
When pooled connections are used, session semantic its not preserved,
including GUC settings.
Many customers have particular problem with setting search_path.
But pgbouncer 1.20 has `track_extra_parameters` settings which allows to
track parameters included in startup package which are reported by
Postgres. Postgres has [an official list of parameters that it reports
to the
client](https://www.postgresql.org/docs/15/protocol-flow.html#PROTOCOL-ASYNC).
This PR makes Postgres also report `search_path` and so allows to
include it in `track_extra_parameters`.
## Summary of changes
Set GUC_REPORT flag for `search_path`.
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above
checklist
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
Pageserver exposes some vectored get related configs which are not in
use.
## Summary of changes
Remove the following pageserver configs: get_impl, get_vectored_impl,
and `validate_get_vectored`.
They are not used in the pageserver since
https://github.com/neondatabase/neon/pull/8601.
Manual overrides have been removed from the aws repo in
https://github.com/neondatabase/aws/pull/1664.
## Problem
This follows a PR that insists all input keys are representable in 16
bytes:
- https://github.com/neondatabase/neon/pull/8648
& a PR that prevents postgres from sending us keys that use the high
bits of field2:
- https://github.com/neondatabase/neon/pull/8657
Motivation for this change:
1. Ingest is bottlenecked on CPU
2. InMemoryLayer can create huge (~1M value) BTreeMap<Key,_> for its
index.
3. Maps over i128 are much faster than maps over an arbitrary 18 byte
struct.
It may still be worthwhile to make the index two-tier to optimize for
the case where only the last 4 bytes (blkno) of the key vary frequently,
but simply using the i128 representation of keys has a big impact for
very little effort.
Related: #8452
## Summary of changes
- Introduce `CompactKey` type which contains an i128
- Use this instead of Key in InMemoryLayer's index, converting back and
forth as needed.
## Performance
All the small-value `bench_ingest` cases show improved throughput.
The one that exercises this index most directly shows a 35% throughput
increase:
```
ingest-small-values/ingest 128MB/100b seq, no delta
time: [374.29 ms 378.56 ms 383.38 ms]
thrpt: [333.88 MiB/s 338.13 MiB/s 341.98 MiB/s]
change:
time: [-26.993% -26.117% -25.111%] (p = 0.00 < 0.05)
thrpt: [+33.531% +35.349% +36.974%]
Performance has improved.
```
## Problem
We use infrastructure as code (TF) to deploy AWS Aurora and AWS RDS
Postgres database clusters.
Whenever we have a change in TF (e.g. **every year** to upgrade to a
higher Postgres version or when we change the cluster configuration) TF
will apply the change and create a new AWS database cluster.
However our benchmarking testcase also expects databases in these
clusters and tables loaded with data.
So we add auto-detection - if the AWS RDS instances are "empty" we
create the necessary databases and restore a pg_dump.
**Important Notes:**
- These steps are NOT run in each benchmarking run, but only after a new
RDS instance has been deployed.
- the benchmarking workflows use GitHub secrets to find the connection
string for the database. These secrets still need to be (manually or
programmatically using git cli) updated if some port of the connection
string (e.g. user, password or hostname) changes.
## Summary of changes
In each benchmarking run check if
- database has already been created - if not create it
- database has already been restored - if not restore it
Supported databases
- tpch
- clickbench
- user example
Supported platforms:
- AWS RDS Postgres
- AWS Aurora serverless Postgres
Sample workflow run - but this one uses Neon database to test the
restore step and not real AWS databases
https://github.com/neondatabase/neon/actions/runs/10321441086/job/28574350581
Sample workflow run - with real AWS database clusters
https://github.com/neondatabase/neon/actions/runs/10346816389/job/28635997653
Verification in second run - with real AWS database clusters - that
second time the restore is skipped
https://github.com/neondatabase/neon/actions/runs/10348469517/job/28640778223
## Problem
Storage controller restarts cause temporary unavailability from the
control plane POV. See RFC for more details.
## Summary of changes
* A couple of small refactors of the storage controller start-up
sequence to make extending it easier.
* A leader table is added to track the storage controller instance
that's currently the leader (if any)
* A peer client is added such that storage controllers can send
`step_down` requests to each other (implemented in
https://github.com/neondatabase/neon/pull/8512).
* Implement the leader cut-over as described in the RFC
* Add `start-as-candidate` flag to the storage controller to gate the
rolling restart behaviour. When the flag is `false` (the default), the
only change from the current start-up sequence is persisting the leader
entry to the database.
It should give us all possible allowed_errors more consistently.
While getting the workflows to pass on
https://github.com/neondatabase/neon/pull/8632 it was noticed that
allowed_errors are rarely hit (1/4). This made me realize that we always
do an immediate stop by default. Doing a graceful shutdown would had
made the draining more apparent and likely we would not have needed the
#8632 hotfix.
Downside of doing this is that we will see more timeouts if tests are
randomly leaving pause failpoints which fail the shutdown.
The net outcome should however be positive, we could even detect too
slow shutdowns caused by a bug or deadlock.
## Problem
This test was disabled.
## Summary of changes
- Remove the skip marker.
- Explicitly avoid doing compaction & gc during checkpoints (the default
scale doesn't do anything here, but when experimeting with larger scales
it messes things up)
- Set a data size that gives a ~20s runtime on a Hetzner dev machine,
previous one gave very noisy results because it was so small
For reference on a Hetzner AX102:
```
------------------------------ Benchmark results -------------------------------
test_bulk_insert[neon-release-pg16].insert: 25.664 s
test_bulk_insert[neon-release-pg16].pageserver_writes: 5,428 MB
test_bulk_insert[neon-release-pg16].peak_mem: 577 MB
test_bulk_insert[neon-release-pg16].size: 0 MB
test_bulk_insert[neon-release-pg16].data_uploaded: 1,922 MB
test_bulk_insert[neon-release-pg16].num_files_uploaded: 8
test_bulk_insert[neon-release-pg16].wal_written: 1,382 MB
test_bulk_insert[neon-release-pg16].wal_recovery: 25.373 s
test_bulk_insert[neon-release-pg16].compaction: 0.035 s
```
It should take syncrep flush_lsn into account because WAL before it on endpoint
restart is lost, which makes replication miss some data if slot had already been
advanced too far. This commit adds test reproducing the issue and bumps
vendor/postgres to commit with the actual fix.
## Problem
In several workflows, we have repeating code which is separated into
two steps:
```bash
mkdir -p $(pwd)/.docker-custom
echo DOCKER_CONFIG=/tmp/.docker-custom >> $GITHUB_ENV
...
rm -rf $(pwd)/.docker-custom
```
Such copy-paste is prone to errors; for example, in one case, instead of
`$(pwd)/.docker-custom`, we use `/tmp/.docker-custom`, which is shared
between workflows.
## Summary of changes
- Create a new action `actions/set-docker-config-dir`, which sets
`DOCKER_CONFIG` and deletes it in a Post action part
Noticed this while debugging a test failure in #8673 which only occurs
with real S3 instead of mock S3: if you authenticate to S3 via
`AWS_PROFILE`, then it requires the `HOME` env var to be set so that it
can read inside the `~/.aws` directory.
The scrubber abstraction `StorageScrubber::scrubber_cli` in
`neon_fixtures.py` would otherwise not work. My earlier PR #6556 has
done similar things for the `neon_local` wrapper.
You can try:
```
aws sso login --profile dev
export ENABLE_REAL_S3_REMOTE_STORAGE=y REMOTE_STORAGE_S3_BUCKET=neon-github-ci-tests REMOTE_STORAGE_S3_REGION=eu-central-1 AWS_PROFILE=dev
RUST_BACKTRACE=1 BUILD_TYPE=debug DEFAULT_PG_VERSION=16 ./scripts/pytest -vv --tb=short -k test_scrubber_tenant_snapshot
```
before and after this patch: this patch fixes it.
## Problem
This page had many dead links, and was confusing for folks looking for
documentation about our product.
Closes: https://github.com/neondatabase/neon/issues/8535
## Summary of changes
- Add a link to the product docs up top
- Remove dead/placeholder links
## Problem
We install and try to use `cachepot`. But it is not configured correctly
and doesn't work (after https://github.com/neondatabase/neon/pull/2290)
## Summary of changes
- Remove `cachepot`
## Problem
Migrations of tenant shards with cold secondaries are holding up drains
in during production deployments.
## Summary of changes
If a secondary locations is lagging by more than 256MiB (configurable,
but that's the default), then skip cutting it over to the secondary as part of the node drain.
## Problem
This type of error can happen during shutdown & was triggering a circuit
breaker alert.
## Summary of changes
- Map NotIntialized::Stopped to CompactionError::ShuttingDown, so that
we may handle it cleanly
## Problem
Azure login fails in `pin-build-tools-image` workflow because the job
doesn't have the required permissions.
```
Error: Please make sure to give write permissions to id-token in the workflow.
Error: Login failed with Error: Error message: Unable to get ACTIONS_ID_TOKEN_REQUEST_URL env variable. Double check if the 'auth-type' is correct. Refer to https://github.com/Azure/login#readme for more information.
```
## Summary of changes
- Add `id-token: write` permission to `pin-build-tools-image`
- Add an input to force image tagging
- Unify pushing to Docker Hub with other registries
- Split the job into two to have less if's
part of https://github.com/neondatabase/neon/issues/8653
Disable create tablespace stmt. It turns out it requires much less
effort to do the regress test mode flag than patching the test cases,
and given that we might need to support tablespaces in the future, I
decided to add a new flag `regress_test_mode` to change the behavior of
create tablespace.
Tested manually that without setting regress_test_mode, create
tablespace will be rejected.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
This reverts #8076 - which was already reverted from the release branch
since forever (it would have been a breaking change to release for all
users who currently set TimeZone options). It's causing conflicts now so
we should revert it here as well.
## Problem
Latency from one cloud provider to another one is higher than within the
same cloud provider.
Some of our benchmarks are latency sensitive - we run a pgbench or psql
in the github action runner and the system under test is running in Neon
(database project).
For realistic perf tps and latency results we need to compare apples to
apples and run the database client in the same "latency distance" for
all tests.
## Summary of changes
Move job steps that test Neon databases deployed on Azure into Azure
action runners.
- bench strategy variant using azure database
- pgvector strategy variant using azure database
- pgbench-compare strategy variants using azure database
## Test run
https://github.com/neondatabase/neon/actions/runs/10314848502
## Problem
We're adding more third party dependencies to support more diverse +
realistic test cases in `test_runner/logical_repl`. I ❤️ these
tests, they are a good thing.
The slight glitch is that python packaging is hard, and some third party
python packages have issues. For example the current kafka dependency
doesn't work on latest python. We can mitigate that by only importing
these more specialized dependencies in the tests that use them.
## Summary of changes
- Move the `kafka` import into a test body, so that folks running the
regular `test_runner/regress` tests don't have to have a working kafka
client package.
## Problem
This code was to mitigate risk in
https://github.com/neondatabase/neon/pull/8427
As expected, we did not hit this code path - the new continuous updates
of gc_info are working fine, we can remove this code now.
## Summary of changes
- Remove block that double-checks retain_lsns
avoid "leaking" the completions of BackgroundPurges by:
1. switching it to TaskTracker for provided close+wait
2. stop using tokio::fs::remove_dir_all which will consume two units of
memory instead of one blocking task
Additionally, use more graceful shutdown in tests which do actually some
background cleanup.
## Problem
See
https://neondb.slack.com/archives/C03QLRH7PPD/p1723038557449239?thread_ts=1722868375.476789&cid=C03QLRH7PPD
Logical replication subscription by default use `synchronous_commit=off`
which cause problems with safekeeper
## Summary of changes
Set `synchronous_commit=on` for logical replication subscription in
test_subscriber_restart.py
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above
checklist
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>