## Problem
When pooled connections are used, session semantics are not preserved,
including GUC settings.
Many customers have a particular problem with setting `search_path`.
However, pgbouncer 1.20 has a `track_extra_parameters` setting which allows it
to track parameters that are included in the startup packet and reported by
Postgres. Postgres has [an official list of parameters that it reports
to the
client](https://www.postgresql.org/docs/15/protocol-flow.html#PROTOCOL-ASYNC).
This PR makes Postgres also report `search_path`, which allows it to be
included in `track_extra_parameters`.
## Summary of changes
Set the `GUC_REPORT` flag for `search_path`.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
The pageserver exposes some vectored-get-related configs which are not in
use.
## Summary of changes
Remove the following pageserver configs: `get_impl`, `get_vectored_impl`,
and `validate_get_vectored`.
They are not used in the pageserver since
https://github.com/neondatabase/neon/pull/8601.
Manual overrides have been removed from the aws repo in
https://github.com/neondatabase/aws/pull/1664.
## Problem
This follows a PR that insists all input keys are representable in 16
bytes:
- https://github.com/neondatabase/neon/pull/8648
and a PR that prevents Postgres from sending us keys that use the high
bits of `field2`:
- https://github.com/neondatabase/neon/pull/8657
Motivation for this change:
1. Ingest is bottlenecked on CPU.
2. `InMemoryLayer` can create a huge (~1M value) `BTreeMap<Key, _>` for its
index.
3. Maps over `i128` are much faster than maps over an arbitrary 18-byte
struct.
It may still be worthwhile to make the index two-tier to optimize for
the case where only the last 4 bytes (blkno) of the key vary frequently,
but simply using the i128 representation of keys has a big impact for
very little effort.
Related: #8452
## Summary of changes
- Introduce a `CompactKey` type which wraps an `i128`.
- Use this instead of `Key` in `InMemoryLayer`'s index, converting back and
forth as needed.
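To illustrate the shape of this change, here is a minimal sketch; the `Key` fields and the bit layout are illustrative stand-ins, not the exact pageserver definitions:

```rust
use std::collections::BTreeMap;

/// Illustrative stand-in for the pageserver `Key` struct (18 bytes of fields).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Key {
    field1: u8,
    field2: u32,
    field3: u32,
    field4: u32,
    field5: u8,
    field6: u32,
}

/// The i128 representation used as the in-memory index key: ordering and
/// equality become single integer operations.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct CompactKey(i128);

impl Key {
    /// Pack the fields into 128 bits. The layout here is illustrative and
    /// relies on the 16-byte representability enforced by #8648/#8657
    /// (field1 fits in 7 bits, field2 in 16 bits), hence the masks.
    fn to_compact(self) -> CompactKey {
        CompactKey(
            (((self.field1 & 0x7F) as i128) << 120)
                | (((self.field2 & 0xFFFF) as i128) << 104)
                | ((self.field3 as i128) << 72)
                | ((self.field4 as i128) << 40)
                | ((self.field5 as i128) << 32)
                | (self.field6 as i128),
        )
    }
}

fn main() {
    // InMemoryLayer-style index: CompactKey -> offset of the serialized value.
    let mut index: BTreeMap<CompactKey, u64> = BTreeMap::new();
    let key = Key { field1: 0, field2: 1, field3: 2, field4: 3, field5: 0, field6: 42 };
    index.insert(key.to_compact(), 4096);
    assert!(index.contains_key(&key.to_compact()));
}
```

The speedup comes from `BTreeMap` comparing a single `i128` per node visit instead of comparing an 18-byte struct field by field.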
## Performance
All the small-value `bench_ingest` cases show improved throughput.
The one that exercises this index most directly shows a 35% throughput
increase:
```
ingest-small-values/ingest 128MB/100b seq, no delta
time: [374.29 ms 378.56 ms 383.38 ms]
thrpt: [333.88 MiB/s 338.13 MiB/s 341.98 MiB/s]
change:
time: [-26.993% -26.117% -25.111%] (p = 0.00 < 0.05)
thrpt: [+33.531% +35.349% +36.974%]
Performance has improved.
```
## Problem
We use infrastructure as code (Terraform) to deploy AWS Aurora and AWS RDS
Postgres database clusters.
Whenever we have a change in Terraform (e.g. **every year** to upgrade to a
higher Postgres version, or when we change the cluster configuration),
Terraform will apply the change and create a new AWS database cluster.
However, our benchmarking test cases also expect databases in these
clusters and tables loaded with data.
So we add auto-detection: if the AWS RDS instances are "empty", we
create the necessary databases and restore a pg_dump.
**Important Notes:**
- These steps are NOT run in each benchmarking run, but only after a new
RDS instance has been deployed.
- The benchmarking workflows use GitHub secrets to find the connection
string for the database. These secrets still need to be updated (manually
or programmatically using the GitHub CLI) if some part of the connection
string (e.g. user, password or hostname) changes.
## Summary of changes
In each benchmarking run, check whether:
- the database has already been created - if not, create it
- the database has already been restored - if not, restore it
Supported databases:
- tpch
- clickbench
- user example
Supported platforms:
- AWS RDS Postgres
- AWS Aurora Serverless Postgres
Sample workflow run (this one uses a Neon database to test the restore
step, not real AWS databases):
https://github.com/neondatabase/neon/actions/runs/10321441086/job/28574350581
Sample workflow run with real AWS database clusters:
https://github.com/neondatabase/neon/actions/runs/10346816389/job/28635997653
Verification in a second run, with real AWS database clusters, that the
restore is skipped the second time:
https://github.com/neondatabase/neon/actions/runs/10348469517/job/28640778223
## Problem
Storage controller restarts cause temporary unavailability from the
control plane POV. See RFC for more details.
## Summary of changes
* A couple of small refactors of the storage controller start-up
sequence to make extending it easier.
* A leader table is added to track the storage controller instance
that's currently the leader (if any)
* A peer client is added such that storage controllers can send
`step_down` requests to each other (implemented in
https://github.com/neondatabase/neon/pull/8512).
* Implement the leader cut-over as described in the RFC (a rough sketch
follows this list).
* Add a `start-as-candidate` flag to the storage controller to gate the
rolling restart behaviour. When the flag is `false` (the default), the
only change from the current start-up sequence is persisting the leader
entry to the database.
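For illustration only, a rough sketch of the candidate start-up path gated by the flag; all type and function names here (`LeaderRow`, `PeerClient`, `persist_leader`) are hypothetical, not the storage controller's real API:

```rust
/// Hypothetical row from the new `leader` table.
struct LeaderRow {
    address: String,
}

/// Hypothetical peer client used to talk to another storage controller.
struct PeerClient {
    address: String,
}

impl PeerClient {
    /// Ask the current leader to step down (the real request is implemented
    /// in https://github.com/neondatabase/neon/pull/8512).
    async fn step_down(&self) -> anyhow::Result<()> {
        // An HTTP request to the peer's step_down endpoint would go here.
        let _ = &self.address;
        Ok(())
    }
}

/// Sketch of start-up with `start-as-candidate` enabled.
async fn start_as_candidate(
    current_leader: Option<LeaderRow>,
    my_address: String,
) -> anyhow::Result<()> {
    if let Some(leader) = current_leader {
        // Best-effort: ask the previous leader to step down before cutting over.
        let peer = PeerClient { address: leader.address };
        if let Err(err) = peer.step_down().await {
            tracing::warn!("step_down request to previous leader failed: {err}");
        }
    }
    // Persist ourselves as the new leader, then continue the normal start-up.
    persist_leader(&my_address).await
}

/// Hypothetical helper that upserts the leader entry into the database.
async fn persist_leader(_address: &str) -> anyhow::Result<()> {
    Ok(())
}
```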
Doing a graceful shutdown by default should give us all possible
allowed_errors more consistently.
While getting the workflows to pass on
https://github.com/neondatabase/neon/pull/8632 it was noticed that
allowed_errors are rarely hit (1/4). This made me realize that we always
do an immediate stop by default. Doing a graceful shutdown would have
made the draining more apparent, and we likely would not have needed the
#8632 hotfix.
The downside of doing this is that we will see more timeouts if tests
randomly leave pause failpoints which fail the shutdown.
The net outcome should however be positive; we could even detect
too-slow shutdowns caused by a bug or deadlock.
## Problem
This test was disabled.
## Summary of changes
- Remove the skip marker.
- Explicitly avoid doing compaction & gc during checkpoints (the default
scale doesn't do anything here, but when experimenting with larger scales
it messes things up).
- Set a data size that gives a ~20s runtime on a Hetzner dev machine;
the previous one gave very noisy results because it was so small.
For reference on a Hetzner AX102:
```
------------------------------ Benchmark results -------------------------------
test_bulk_insert[neon-release-pg16].insert: 25.664 s
test_bulk_insert[neon-release-pg16].pageserver_writes: 5,428 MB
test_bulk_insert[neon-release-pg16].peak_mem: 577 MB
test_bulk_insert[neon-release-pg16].size: 0 MB
test_bulk_insert[neon-release-pg16].data_uploaded: 1,922 MB
test_bulk_insert[neon-release-pg16].num_files_uploaded: 8
test_bulk_insert[neon-release-pg16].wal_written: 1,382 MB
test_bulk_insert[neon-release-pg16].wal_recovery: 25.373 s
test_bulk_insert[neon-release-pg16].compaction: 0.035 s
```
It should take the syncrep flush_lsn into account, because WAL before it is
lost on endpoint restart, which makes replication miss some data if the slot
had already been advanced too far. This commit adds a test reproducing the
issue and bumps vendor/postgres to a commit with the actual fix.
## Problem
In several workflows, we have repeated code that is split into two steps:
```bash
mkdir -p $(pwd)/.docker-custom
echo DOCKER_CONFIG=/tmp/.docker-custom >> $GITHUB_ENV
...
rm -rf $(pwd)/.docker-custom
```
Such copy-paste is prone to errors; for example, in one case, instead of
`$(pwd)/.docker-custom`, we use `/tmp/.docker-custom`, which is shared
between workflows.
## Summary of changes
- Create a new action, `actions/set-docker-config-dir`, which sets
`DOCKER_CONFIG` and deletes it in the post-action step
Noticed this while debugging a test failure in #8673 which only occurs
with real S3 instead of mock S3: if you authenticate to S3 via
`AWS_PROFILE`, then it requires the `HOME` env var to be set so that it
can read inside the `~/.aws` directory.
The scrubber abstraction `StorageScrubber::scrubber_cli` in
`neon_fixtures.py` would otherwise not work. My earlier PR #6556 has
done similar things for the `neon_local` wrapper.
You can try:
```
aws sso login --profile dev
export ENABLE_REAL_S3_REMOTE_STORAGE=y REMOTE_STORAGE_S3_BUCKET=neon-github-ci-tests REMOTE_STORAGE_S3_REGION=eu-central-1 AWS_PROFILE=dev
RUST_BACKTRACE=1 BUILD_TYPE=debug DEFAULT_PG_VERSION=16 ./scripts/pytest -vv --tb=short -k test_scrubber_tenant_snapshot
```
before and after this patch: this patch fixes it.
## Problem
This page had many dead links, and was confusing for folks looking for
documentation about our product.
Closes: https://github.com/neondatabase/neon/issues/8535
## Summary of changes
- Add a link to the product docs up top
- Remove dead/placeholder links
## Problem
We install and try to use `cachepot`. But it is not configured correctly
and doesn't work (after https://github.com/neondatabase/neon/pull/2290)
## Summary of changes
- Remove `cachepot`
## Problem
Migrations of tenant shards with cold secondaries are holding up drains
during production deployments.
## Summary of changes
If a secondary location is lagging by more than 256MiB (configurable,
but that's the default), then skip cutting over to it as part of the node drain.
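A minimal sketch of the check this adds to the drain path; the names and the lag representation are illustrative, not the storage controller's actual code:

```rust
/// Default threshold mentioned above: 256 MiB.
const DEFAULT_MAX_SECONDARY_LAG_BYTES: u64 = 256 * 1024 * 1024;

/// Illustrative view of a secondary location's warmth.
struct SecondaryStatus {
    /// Bytes of layer data the secondary still needs to download, if known.
    lag_bytes: Option<u64>,
}

/// Decide whether a shard should be cut over to its secondary during a drain.
fn should_migrate_during_drain(secondary: &SecondaryStatus, max_lag_bytes: u64) -> bool {
    match secondary.lag_bytes {
        // Warm enough: cutting over will not leave the shard with a cold cache.
        Some(lag) if lag <= max_lag_bytes => true,
        // Too cold (or unknown): skip this shard so the drain isn't held up.
        _ => false,
    }
}

fn main() {
    let cold = SecondaryStatus { lag_bytes: Some(10 * 1024 * 1024 * 1024) };
    assert!(!should_migrate_during_drain(&cold, DEFAULT_MAX_SECONDARY_LAG_BYTES));
}
```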
## Problem
This type of error can happen during shutdown and was triggering a circuit
breaker alert.
## Summary of changes
- Map `NotInitialized::Stopped` to `CompactionError::ShuttingDown`, so that
we can handle it cleanly
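A simplified sketch of the mapping; both enums here are cut-down stand-ins for the real pageserver types:

```rust
#[derive(Debug)]
enum NotInitialized {
    Uninitialized,
    Stopped,
}

#[derive(Debug)]
enum CompactionError {
    /// Compaction was interrupted by shutdown; not a real failure.
    ShuttingDown,
    Other(String),
}

impl From<NotInitialized> for CompactionError {
    fn from(err: NotInitialized) -> Self {
        match err {
            // Seen when the timeline stops mid-compaction: treat it as a clean
            // shutdown rather than an error that trips the circuit breaker.
            NotInitialized::Stopped => CompactionError::ShuttingDown,
            other => CompactionError::Other(format!("{other:?}")),
        }
    }
}
```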
## Problem
Azure login fails in `pin-build-tools-image` workflow because the job
doesn't have the required permissions.
```
Error: Please make sure to give write permissions to id-token in the workflow.
Error: Login failed with Error: Error message: Unable to get ACTIONS_ID_TOKEN_REQUEST_URL env variable. Double check if the 'auth-type' is correct. Refer to https://github.com/Azure/login#readme for more information.
```
## Summary of changes
- Add `id-token: write` permission to `pin-build-tools-image`
- Add an input to force image tagging
- Unify pushing to Docker Hub with other registries
- Split the job into two to have fewer `if`s
Part of https://github.com/neondatabase/neon/issues/8653
Disable the CREATE TABLESPACE statement. It turns out it requires much less
effort to add a regress-test-mode flag than to patch the test cases,
and given that we might need to support tablespaces in the future, I
decided to add a new flag `regress_test_mode` to change the behavior of
CREATE TABLESPACE.
Tested manually that without setting `regress_test_mode`, CREATE
TABLESPACE is rejected.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
This reverts #8076 - which was already reverted from the release branch
since forever (it would have been a breaking change to release for all
users who currently set TimeZone options). It's causing conflicts now so
we should revert it here as well.
## Problem
Latency from one cloud provider to another is higher than within the
same cloud provider.
Some of our benchmarks are latency-sensitive: we run pgbench or psql
in the GitHub Actions runner, while the system under test runs in Neon
(a database project).
For realistic TPS and latency results we need to compare apples to
apples and run the database client at the same "latency distance" for
all tests.
## Summary of changes
Move job steps that test Neon databases deployed on Azure into Azure
action runners.
- bench strategy variant using azure database
- pgvector strategy variant using azure database
- pgbench-compare strategy variants using azure database
## Test run
https://github.com/neondatabase/neon/actions/runs/10314848502
## Problem
We're adding more third-party dependencies to support more diverse and
realistic test cases in `test_runner/logical_repl`. I ❤️ these
tests, they are a good thing.
The slight glitch is that Python packaging is hard, and some third-party
Python packages have issues. For example, the current kafka dependency
doesn't work on the latest Python. We can mitigate that by only importing
these more specialized dependencies in the tests that use them.
## Summary of changes
- Move the `kafka` import into a test body, so that folks running the
regular `test_runner/regress` tests don't have to have a working kafka
client package.
## Problem
This code was to mitigate risk in
https://github.com/neondatabase/neon/pull/8427
As expected, we did not hit this code path: the new continuous updates
of gc_info are working fine, so we can remove this code now.
## Summary of changes
- Remove block that double-checks retain_lsns
Avoid "leaking" the completions of BackgroundPurges by:
1. switching it to TaskTracker, which provides close+wait
2. no longer using tokio::fs::remove_dir_all, which consumes two units of
memory instead of one blocking task
Additionally, use a more graceful shutdown in tests which actually do some
background cleanup.
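A hedged sketch of the resulting pattern, assuming `tokio_util::task::TaskTracker`; the `BackgroundPurges` wrapper here is simplified:

```rust
use std::path::PathBuf;
use tokio_util::task::TaskTracker;

struct BackgroundPurges {
    tracker: TaskTracker,
}

impl BackgroundPurges {
    /// Spawn one blocking task per purge. Per the PR, std::fs::remove_dir_all
    /// inside our own blocking task avoids tokio::fs::remove_dir_all, which
    /// would consume two units of memory instead of one blocking task.
    fn spawn_purge(&self, path: PathBuf) {
        self.tracker.spawn_blocking(move || {
            if let Err(err) = std::fs::remove_dir_all(&path) {
                eprintln!("failed to purge {}: {err}", path.display());
            }
        });
    }

    /// Called on shutdown: refuse new purges and wait for in-flight ones,
    /// so completions are no longer "leaked".
    async fn shutdown(&self) {
        self.tracker.close();
        self.tracker.wait().await;
    }
}
```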
## Problem
See
https://neondb.slack.com/archives/C03QLRH7PPD/p1723038557449239?thread_ts=1722868375.476789&cid=C03QLRH7PPD
Logical replication subscriptions use `synchronous_commit=off` by default,
which causes problems with the safekeeper.
## Summary of changes
Set `synchronous_commit=on` for the logical replication subscription in
`test_subscriber_restart.py`.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
## Problem
Some developers build on macOS, which doesn't have io_uring.
## Summary of changes
- Add `io_engine_for_bench`, which on Linux will give io_uring or panic
if it's unavailable, and on macOS will always panic.
We do not want to run such benchmarks with `StdFs`: the results aren't
interesting, and will actively waste the time of any developers who
start investigating performance before they realize they're using a
known-slow I/O backend.
Why not just conditionally compile this benchmark on Linux only? Because
even on Linux, I still want it to refuse to run if it can't get
io_uring.
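A rough sketch of the helper's intent; the enum is simplified and the io_uring availability probe is elided, so the real helper may differ in detail:

```rust
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum IoEngineKind {
    /// Known-slow fallback; never acceptable for these benchmarks.
    StdFs,
    /// io_uring-based engine, only functional on Linux.
    TokioEpollUring,
}

/// Benchmarks must not silently fall back to StdFs.
#[allow(dead_code)]
fn io_engine_for_bench() -> IoEngineKind {
    if cfg!(target_os = "linux") {
        // A real implementation would probe io_uring availability here and
        // panic if the probe fails; that check is elided in this sketch.
        IoEngineKind::TokioEpollUring
    } else {
        panic!("io_uring is required for this benchmark; run it on Linux");
    }
}
```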
Part of #8130, [RFC: Direct IO For Pageserver](https://github.com/neondatabase/neon/blob/problame/direct-io-rfc/docs/rfcs/034-direct-io-for-pageserver.md)
## Description
Add pageserver config for evaluating/enabling direct I/O.
- Disabled: the current default; uses buffered I/O as is.
- Evaluate: still uses buffered I/O, but could do alignment checking and
perf simulation (pad latency by doing direct I/O reads/writes to a fake file).
- Enabled: uses direct I/O; behavior on alignment errors is configurable.
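One possible config shape for these three modes, sketched below; the variant and field names are illustrative and assume a serde-based config like the rest of the pageserver configuration:

```rust
use serde::Deserialize;

#[derive(Debug, Clone, Copy, Deserialize)]
#[serde(rename_all = "kebab-case")]
enum DirectIoMode {
    /// Current default: plain buffered I/O.
    Disabled,
    /// Still buffered, but with alignment checking and optional perf
    /// simulation (pad latency by doing direct I/O against a fake file).
    Evaluate,
    /// Real direct I/O; behavior on alignment errors is configurable.
    Enabled,
}

#[derive(Debug, Clone, Copy, Deserialize)]
#[serde(rename_all = "kebab-case")]
enum OnAlignmentError {
    Error,
    FallbackToBuffered,
}

#[derive(Debug, Clone, Copy, Deserialize)]
struct VirtualFileDirectIoConfig {
    mode: DirectIoMode,
    /// Only consulted when `mode` is `enabled`.
    on_alignment_error: OnAlignmentError,
}
```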
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
We've noticed increased memory usage with the latest release. Drain the
joinset of `page_service` connection handlers to avoid leaking them
until shutdown. An alternative would be to use a TaskTracker.
TaskTracker was not discussed in the original PR #8339 review, so we're not
hotfixing it in here either.
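A sketch of the draining pattern, assuming `tokio::task::JoinSet` and its `try_join_next`; the accept loop and handler body are placeholders:

```rust
use tokio::net::TcpListener;
use tokio::task::JoinSet;

async fn page_service_accept_loop(listener: TcpListener) {
    let mut connections: JoinSet<()> = JoinSet::new();
    loop {
        // Reap handlers that have already finished so their results (and
        // per-task memory) are not retained until shutdown.
        while let Some(res) = connections.try_join_next() {
            if let Err(err) = res {
                tracing::error!("connection handler panicked: {err}");
            }
        }
        match listener.accept().await {
            Ok((socket, _addr)) => {
                connections.spawn(async move {
                    // handle_connection(socket).await would go here.
                    drop(socket);
                });
            }
            Err(err) => tracing::warn!("accept failed: {err}"),
        }
    }
}
```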
Earlier I was thinking we'd need an (ancestor_lsn, timeline_id)-ordered
list of reparented timelines. It turns out we did not need it at all;
replace it with an unordered hash set. Additionally, refactor the
reparented-direct-children query out; it will later be used from more places.
Split off from #8430.
Cc: #6994
Ephemeral files clean up after themselves on drop but did not delay shutdown,
which led to problems when restarting the tenant. The solution is as proposed:
- make ephemeral files carry the gate guard to delay `Timeline::gate`
closing
- flush in-memory layers and strong references to those on
`Timeline::shutdown`
The above are realized by making LayerManager an `enum` with `Open` and
`Closed` variants, and failing requests to modify the `LayerMap` once closed.
Additionally:
- fix overly eager anyhow conversions in compaction
- unify how we freeze layers and handle errors
- optimize `likely_resident_layers` to read `LayerFileManager` hashmap
values instead of bouncing through `LayerMap`
Fixes: #7830
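A simplified sketch of the two mechanisms; every type here is a stand-in for the corresponding pageserver type (`GateGuard`, `EphemeralFile`, `LayerManager`):

```rust
/// Stand-in for the timeline gate guard: while one of these is alive,
/// `Timeline::gate` cannot finish closing.
struct GateGuard;

/// Ephemeral file that deletes itself on drop; carrying the guard means
/// shutdown waits until the cleanup has actually happened.
struct EphemeralFile {
    path: std::path::PathBuf,
    _gate_guard: GateGuard,
}

impl Drop for EphemeralFile {
    fn drop(&mut self) {
        let _ = std::fs::remove_file(&self.path);
        // `_gate_guard` is dropped here, releasing the gate.
    }
}

struct OpenLayerManager {
    // layer map, in-memory layers, ...
}

enum LayerManager {
    Open(OpenLayerManager),
    /// Entered during `Timeline::shutdown`: requests to modify the layer map
    /// are rejected instead of racing with shutdown.
    Closed,
}

impl LayerManager {
    fn modifiable(&mut self) -> Result<&mut OpenLayerManager, &'static str> {
        match self {
            LayerManager::Open(open) => Ok(open),
            LayerManager::Closed => Err("timeline is shutting down"),
        }
    }
}
```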
## Problem
1. Hard to correlate startup parameters with the endpoint that provided
them.
2. Some configurations are not needed in the `ProxyConfig` struct.
## Summary of changes
Because of some borrow checker fun, I needed to switch to an
interior-mutability implementation of our `RequestMonitoring` context
system, using https://docs.rs/try-lock/latest/try_lock/ as a cheap lock
for such a use case (it needed to be thread-safe).
Removed the logging of each startup message; instead, we just log the
startup params in a successful handshake.
Also removed some values from `ProxyConfig` and kept them as arguments
(needed for the local-proxy config).
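A sketch of the interior-mutability shape, assuming the `try-lock` crate linked above; `RequestContext` and its fields are illustrative, not proxy's real types:

```rust
use try_lock::TryLock;

struct RequestContextInner {
    startup_params: Option<String>,
    endpoint: Option<String>,
}

/// Shared, thread-safe context that can be mutated through `&self`,
/// sidestepping the borrow-checker issues mentioned above.
struct RequestContext {
    inner: TryLock<RequestContextInner>,
}

impl RequestContext {
    fn new() -> Self {
        Self {
            inner: TryLock::new(RequestContextInner {
                startup_params: None,
                endpoint: None,
            }),
        }
    }

    /// Record the startup params once the handshake succeeds.
    fn set_startup_params(&self, params: String) {
        // TryLock is a cheap, non-blocking lock: contention on the same
        // request's context would indicate a bug, so we simply skip the write.
        if let Some(mut inner) = self.inner.try_lock() {
            inner.startup_params = Some(params);
        }
    }
}
```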
Timeline cancellation running in parallel with gc yields error log lines
like:
```
Gc failed 1 times, retrying in 2s: TimelineCancelled
```
They are completely harmless, though, and normal to occur. Therefore, only
print these messages at info level. Still print them at all so that
we know what is going on when we focus on a single timeline.
Part of #8128.
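A sketch of the logging change; `GcError` here is a simplified stand-in for the pageserver's error type:

```rust
#[derive(Debug)]
enum GcError {
    TimelineCancelled,
    Other(String),
}

fn log_gc_retry(err: &GcError, attempts: u32, retry_in_secs: u64) {
    match err {
        // Harmless and expected when the timeline shuts down mid-GC: keep it
        // visible at info level for per-timeline debugging only.
        GcError::TimelineCancelled => tracing::info!(
            "Gc failed {attempts} times, retrying in {retry_in_secs}s: {err:?}"
        ),
        GcError::Other(_) => tracing::error!(
            "Gc failed {attempts} times, retrying in {retry_in_secs}s: {err:?}"
        ),
    }
}
```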
## Problem
Currently, the scrubber `scan_metadata` command returns an error
code if the metadata on remote storage is corrupted with fatal errors.
To safely deploy this command in a cronjob, we want to differentiate
between failures while running the scrubber command and erroneous
metadata. At the same time, we also want our regression tests to catch
corrupted metadata using the scrubber command.
## Summary of changes
- Return an error code only when the scrubber command itself fails
- Use explicit checks on errors and warnings to determine metadata
health in regression tests
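A sketch of the exit-code policy described above; the types and output are illustrative, not the scrubber's real interface:

```rust
struct MetadataSummary {
    fatal_errors: Vec<String>,
    warnings: Vec<String>,
}

/// Listing remote storage, parsing index_part.json, etc. would happen here;
/// only I/O or configuration failures propagate as Err.
fn run_scan_metadata() -> anyhow::Result<MetadataSummary> {
    Ok(MetadataSummary { fatal_errors: vec![], warnings: vec![] })
}

fn main() {
    match run_scan_metadata() {
        // The command itself worked: report metadata health in the output and
        // exit 0 even if the metadata is corrupt, so a cronjob only alerts on
        // scrubber failures. Regression tests check the summary explicitly.
        Ok(summary) => {
            println!(
                "fatal errors: {}, warnings: {}",
                summary.fatal_errors.len(),
                summary.warnings.len()
            );
        }
        // Only a failure to run the scan itself yields a non-zero exit code.
        Err(err) => {
            eprintln!("scan_metadata failed: {err:#}");
            std::process::exit(1);
        }
    }
}
```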
**Resolve conflict with `tenant-snapshot` command (after shard split):**
[`test_scrubber_tenant_snapshot`](https://github.com/neondatabase/neon/blob/yuchen/scrubber-scan-cleanup-before-prod/test_runner/regress/test_storage_scrubber.py#L23)
failed before applying 422a8443dd
- When taking a snapshot, the old `index_part.json` in the unsharded
tenant directory is not kept.
- The current `list_timeline_blobs` implementation considers a missing
`index_part.json` a parse error.
- During the scan we only analyze shards with the highest shard count,
so we will not get a parse error, but we do need to add the layers to
the tenant object listing; otherwise we get an "index is referencing a
layer that is not in remote storage" error.
- **Action:** Add s3_layers from `list_timeline_blobs` regardless of
parse errors
Signed-off-by: Yuchen Liang <yuchen@neon.tech>