## Problem
Part of https://github.com/neondatabase/neon/issues/9114
## Summary of changes
gc-compaction could take a long time in some cases, for example, if the
job split heuristics is wrong and we selected a too large region for
compaction that can't be finished within a reasonable amount of time. We
will give up such tasks and yield to L0 compaction. Each gc-compaction
sub-compaction job is atomic and cannot be split further so we have to
give up (instead of storing a state and continue later as in image
creation).
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
After splitting teams it became a bit more complicated for the PerfCorr
team to work on tests changes.
## Summary of changes
1. Add PerfCorr team co-owners for `.github/` folder
2. Add PerCorr team as owner for `test_runner/` folder
## Problem
In f37eeb56, I properly escaped the identifier, but I haven't noticed
that the resulting string is used in the `format('...')`, so it needs
additional escaping. Yet, after looking at it closer and with Heikki's
and Tristan's help, it appeared to be that it's a full can of worms and
we have problems all over the code in places where we use PL/pgSQL
blocks.
## Summary of changes
Add a new `pg_quote_dollar()` helper to deal with it, as dollar-quoting
of strings seems to be the only robust way to escape strings in dynamic
PL/pgSQL blocks. We mimic the Postgres' `pg_get_functiondef` logic here
[1].
While on it, I added more tests and caught a couple of more bugs with
string escaping:
1. `get_existing_dbs_async()` was wrapping `owner` in additional
double-quotes if it contained special characters
2. `construct_superuser_query()` was flawed in even more ways than the
rest of the code. It wasn't realistic to fix it quickly, but after
thinking about it more, I realized that we could drop most of it
altogether. IIUC, it was added as some sort of migration, probably back
when we haven't had migrations yet. So all the complicated code was
needed to properly update existing roles and DBs. In the current Neon,
this code only runs before we create the very first DB and role. When we
create roles and DBs, all `neon_superuser` grants are added in the
different places. So the worst thing that could happen is that there is
an ancient branch somewhere, so when users poke it, they will realize
that not all Neon features work as expected. Yet, the fix is simple and
self-serve -- just create a new role via UI or API, and it will get a
proper `neon_superuser` grant.
[1]:
8b49392b27/src/backend/utils/adt/ruleutils.c (L3153)Closesneondatabase/cloud#25048
## Problem
When periodic pagebench runs only once a day a lot of commits can be in
between a good run and a regression.
## Summary of changes
Run the workflow every 3 hours
## Problem
part of https://github.com/neondatabase/neon/issues/9114
## Summary of changes
* Add timers for each phase of the gc-compaction.
* Add a final ratio computation to directly show the garbage collection
ratio in the logs.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Original Slack discussion:
https://neondb.slack.com/archives/C04DGM6SMTM/p1739915430147169
TL;DR in Postgres, it's totally normal to have 'invalid' DBs (state
after the interrupted `DROP DATABASE`). Yet, some of our metrics
collected with `sql_exporter` try to get the size of such invalid DBs.
Typical log lines:
```
time=2025-03-05T16:30:32.368Z level=ERROR source=promhttp.go:52 msg="Error gathering metrics" error="[from Gatherer #1] [collector=neon_collector,query=pg_stats_userdb] pq: [NEON_SMGR] [reqid 0] could not read db size of db 173228 from page server at lsn 0/44A0E8C0"
time=2025-03-05T16:30:32.369Z level=ERROR source=promhttp.go:52 msg="Error gathering metrics" error="[from Gatherer #1] [collector=neon_collector,query=db_total_size] pq: [NEON_SMGR] [reqid 0] could not read db size of db 173228 from page server at lsn 0/44A0E8C0"
```
## Summary of changes
Ignore invalid DBs in these two metrics -- `pg_stats_userdb` and
`db_total_size`
## Problem
On unarchival, we update the previous heatmap with all visible layers.
When the primary generates a new heatmap it includes all those layers,
so the secondary will download them. Since they're not actually resident
on the primary (we didn't call the warm up API), they'll never be
evicted,
so they remain in the heatmap.
We want these layers in the heatmap, since we might wish to warm-up an
unarchived timeline after a shard migration. However, we don't want them
to be downloaded on the secondary until we've warmed up the primary.
## Summary of Changes
Include these layers in the heatmap and mark them as cold. All heatmap
operations act on non-cold layers apart from the attached location
warming up API,
which will download the cold layers. Once the cold layers are downloaded
on the primary,
they'll be included in the next heatmap as hot and the secondary starts
fetching them too.
## Problem
I like using `cargo stress` to hammer on a test, but it doesn't work out
of the box because it does parallel runs by default and tests always use
the same repo dir.
## Summary of changes
Add an uuid to the test repo dir when generating it.
## Problem
In a batch, `pageserver_layers_per_read_global` counts all layer visits
towards every read in the batch, since this directly affects the
observed latency of the read. However, this doesn't give a good picture
of the amortized read amplification due to batching.
## Summary of changes
Add two more global read amp metrics:
* `pageserver_layers_per_read_batch_global`: number of layers visited
per batch.
* `pageserver_layers_per_read_amortized_global`: number of layers
divided by reads in a batch.
* Remove callsite identifier registration on span creation. Forgot to
remove from last PR. Was part of alternative idea.
* Move "spans" object to right after "fields", so event and span fields
are listed together.
## Problem
For pg_regress test, we do both v1 and v2; for all the rest, we default
to v2.
part of https://github.com/neondatabase/neon/issues/9516
## Summary of changes
Use reldir v2 across test cases by default.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
This PR adds a new key to neon.neon_lfc_stats —
'file_cache_chunk_size_pages'. It just returns the value of
BLOCKS_PER_CHUNK from the LFC implementation.
The new value should (eventually) allow changing the chunk size without
breaking any places that rely on LFC stats values measured in number of
chunks.
See neondatabase/cloud#25170 for more.
## Problem
Our benchmarking workflows contain links to grafana dashboards to
troubleshoot problems.
This works fine for non-pooled endpoints.
For pooled endpoints we need to remove the `-pooler` suffix from the
endpoint's hostname to get a valid endpoint ID.
Example link that doesn't work in this run
https://github.com/neondatabase/neon/actions/runs/13678933253/job/38246028316#step:8:311
## Summary of changes
Check if connection string is a -pooler connection string and if so
remove this suffix from the endpoint ID.
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
## Problem
We realized that we may use this metric for more 'live' info about
extension installations vs. what we have with installed extensions
metric, which is only updated at start, atm.
## Summary of changes
Add `filename` label to `compute_ctl_remote_ext_requests_total`. Note
that it contains the raw archive name with `.tar.zst` at the end, so the
consumer may need to strip this suffix.
Closes https://github.com/neondatabase/cloud/issues/24694
Setup pgaudit and pgauditlogtofile extensions
in compute_ctl when the ComputeAuditLogLevel is
set to 'hipaa'.
See cloud PR https://github.com/neondatabase/cloud/pull/24568
Add rsyslog setup for compute_ctl.
Spin up a rsyslog server in the compute VM,
and configure it to send logs to the endpoint
specified in AUDIT_LOGGING_ENDPOINT env.
## Problem
part of https://github.com/neondatabase/neon/issues/9114
We used anyhow::Error everywhere and it's time to fix.
## Summary of changes
* Make sure that cancel errors are correctly propagated as
CompactionError::ShuttingDown.
* Skip all the trigger computation work if gc_cutoff is not generated
yet.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
If an image fails to push to dev registries, we shouldn't trigger the
deploy job, because that depends on images existing in dev registries.
To ensure this is the case, the deploy job needs to depend on pushing to
dev registries.
## Summary of changes
Make `deploy` depend on `push-neon-image-dev` and
`push-compute-image-dev`.
## Problem
`test_check_visibility_map` is the slowest test in CI, and can cause
timeouts under particularly slow configurations (`debug` and
`without-lfc`).
## Summary of changes
* Reduce the `pgbench` scale factor from 10 to 8.
* Omit a redundant vacuum during `pgbench` init.
* Remove a final `vacuum freeze` + `pg_check_visible` pass, which has
questionable value (we've already done a vacuum freeze previously, and
we don't flush the compute cache before checking anyway).
## Problem
On unarchival, we update the previous heatmap with all visible layers.
When the primary generates a new heatmap it includes all those layers,
so the secondary will download them. Since they're not actually resident
on the primary (we didn't call the warm up API), they'll never be
evicted, so they remain in the heatmap.
This leads to oversized secondary locations like we saw in pre-prod.
## Summary of changes
Gate the loading of the previous heatmaps and the heatmap generation on
unarchival behind configuration
flags. They are disabled by default, but enabled in tests.
## Problem
Each page of the slru segment is fetched individually when it's loaded
on demand.
## Summary of Changes
Use `Timeline::get_vectored` to fetch 16 at a time.
fixes https://github.com/neondatabase/cloud/issues/24292
Do not drop tablesync replication slots on the publisher,
when we're in the process of dropping subscriptions inherited by a neon
branch.
Because these slots are still needed by the parent branch subscriptions.
For regular slots we handle this by setting the slot_name to NONE
before calling DROP SUBSCRIPTION, but tablesync slots are not exposed to
SQL.
rely on GUC disable_logical_replication_subscribers=true
to know that we're in the Neon-specific process of dropping
subscriptions.
## Problem
We don't have any logging for Sentry initialization. This makes it hard
to verify that it has been configured correctly.
## Summary of changes
Log some basic info when Sentry has been initialized, but omit the
public key (which allows submitting events). Also log when `SENTRY_DSN`
isn't specified at all, and when it fails to initialize (which is
supposed to panic, but we may as well).
## Problem
Grafana Loki's JSON handling is somewhat limited and the log message
should be structured in a way that it's easy to sift through logs and
filter.
## Summary of changes
* Drop span_id. It's too short lived to be of value and only bloats the
logs.
* Use the span's name as the object key, but append a unique numeric
value to prevent name collisions.
* Extract interesting span fields into a separate object at the root.
New format:
```json
{
"timestamp": "2025-03-04T18:54:44.134435Z",
"level": "INFO",
"message": "connected to compute node at 127.0.0.1 (127.0.0.1:5432) latency=client: 22.002292ms, cplane: 0ns, compute: 5.338875ms, retry: 0ns",
"fields": {
"cold_start_info": "unknown"
},
"process_id": 56675,
"thread_id": 9122892,
"task_id": "24",
"target": "proxy::compute",
"src": "proxy/src/compute.rs:288",
"trace_id": "5eb89b840ec63fee5fc56cebd633e197",
"spans": {
"connect_request#1": {
"ep": "endpoint",
"role": "proxy",
"session_id": "b8a41818-12bd-4c3f-8ef0-9a942cc99514",
"protocol": "tcp",
"conn_info": "127.0.0.1"
},
"connect_to_compute#6": {},
"connect_once#8": {
"compute_id": "compute",
"pid": "853"
}
},
"extract": {
"session_id": "b8a41818-12bd-4c3f-8ef0-9a942cc99514"
}
}
```
## Problem
Our benchmarking workflow has a job step `bench`which runs all tests in
test_runner/performance/* except those that we want to run separately.
We recently added two test cases to that testcase directory that we want
to run separately but forgot to ignore them during the bench step. This
is now causing
[failures](https://github.com/neondatabase/neon/actions/runs/13667689340/job/38212087331#step:7:392).
## Summary of changes
Ignore the separately run tests in the bench step.
## Problem
New async prefetch introduces `prefetch+lookup[` function which is
called before LFC lookup to check if prefetch request is already
completed.
This function is not containing now check that response is actually
`T_NeonGetPageResponse` (and not error).
## Summary of changes
Add checks for response tag.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
Periodic pagebench workflow runs periodically from latest main commit
and also allows to dispatch it manually for a given commit hash to
bi-sect regressions.
However in the dashboards we can not distinguish manual runs from
periodic runs which makes it harder to follow the trend.
## Summary of changes
Send an additional flag commit type to the benchmark runner instance to
distinguish the run type.
Note: this needs a follow-up PR on the receiving side.
The compute should only act if requests come from the control plane.
Signed-off-by: Tristan Partin <tristan@neon.tech>
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
On macOS, the `unused` lint complains about two variables not used in
`!linux` builds.
These were introduced in #11007.
## Summary of changes
Appease the linter by explicitly using the variables in `!linux`
branches.
## Problem
part of https://github.com/neondatabase/neon/issues/11067
My observation is that with the current value of settings, x86-v1
usually takes 30s, arm-v1 1m30s, x86-v2 1m, arm-v2 3m. But sometimes the
system could run too slow and cause test to timeout on arm with reldir
v2.
While I investigate what's going on and further improve the performance,
I'd like to set both of them to use the same test input, so that it
doesn't timeout and we don't abuse this test case as a performance test.
## Summary of changes
Use the same settings for both test cases.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
The code to generate symbolized pprof heap profiles and flamegraph SVGs
has been upstreamed to the `jemalloc_pprof` crate:
* https://github.com/polarsignals/rust-jemalloc-pprof/pull/22
* https://github.com/polarsignals/rust-jemalloc-pprof/pull/23
## Summary of changes
Use `jemalloc_pprof` to generate symbolized pprof heap profiles and
flamegraph SVGs.
This reintroduces a bunch of internal jemalloc stack frames that we'd
previously strip, e.g. each stack now always ends with
`prof_backtrace_impl` (where jemalloc takes a stack trace for heap
profiling), but that seems ok.
## Problem
Incoming requests often take the service lock, and sometimes even do
database transactions. That creates a risk that a rogue client can
starve the controller of the ability to do its primary job of
reconciling tenants to an available state.
## Summary of changes
* Use the `governor` crate to rate limit tenant requests at 10 requests
per second. This is ~10-100x lower than the worst "attack" we've seen
from a client bug. Admin APIs are not rate limited.
* Add a `storage_controller_http_request_rate_limited` histogram for
rate limited requests.
* Log a warning every 10 seconds for rate limited tenants.
The rate limiter is parametrized on TenantId, because the kinds of
client bug we're protecting against generally happen within tenant
scope, and the rates should be somewhat stable: we expect the global
rate of requests to increase as we do more work, but we do not expect
the rate of requests to one tenant to increase.
---------
Co-authored-by: John Spray <john@neon.tech>
## Problem
part of https://github.com/neondatabase/neon/issues/9516
## Summary of changes
Similar to the aux v2 migration, we persist the relv2 migration status
into index_part, so that even the config item is set to false, we will
still read from the v2 storage to avoid loss of data.
Note that only the two variants `None` and
`Some(RelSizeMigration::Migrating)` are used for now. We don't have full
migration implemented so it will never be set to
`RelSizeMigration::Migrated`.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
The "did not trigger" gets logged at 10k/minute in staging.
## Summary of changes
Change it to debug level.
Signed-off-by: Alex Chi Z <chi@neon.tech>
The interpreted reader tracks a record aligned current position in the
WAL stream.
Partial reads move the stream internally, but not from the pov of the
interpreted WAL reader.
Hence, where new shards subscribe with a start position that matches the
reader's current position,
but we've also done some partial reads. This confuses the gap tracking.
To make it more robust,
update the current batch start to the min between the new start position
and its current value.
Since no record has been decoded yet (position matches), we can't have
lost it
## Problem
When a build is made with sanitizers, this is not reflected in the
artifact name, which can lead to overriding normal builds with sanitized
ones.
## Summary of changes
Take this property of a build into account when constructing the
artifact name.
## Problem
Image layers may be nested inside in-memory layers as diagnosed
[here](https://github.com/neondatabase/neon/issues/10720#issuecomment-2649419252).
The read path doesn't support this and may skip over the image layer,
resulting in a failure to reconstruct the page.
## Summary of changes
We already support nesting of image layers inside delta layers. The
logic lives in `LayerMap::select_layer`.
The main goal of this PR is to propagate the candidate in-memory layer
down to that point and update
the selection logic.
Important changes are:
1. Support partial reads for the in-memory layer. Previously, we could
only specify the start LSN of the read.
We need to control the end LSN too.
2. `LayerMap::ranged_search` considers in-memory layers too. Previously,
the search for in-memory layers
was done explicitly in `Timeline::get_reconstruct_data_timeline`. Note
that `LayerMap::ranged_search` now returns
a weak readable layer which the `LayerManager` can upgrade. This dance
is such that we can unit test the layer selection logic.
3. Update `LayerMap::select_layer` to consider the candidate in-memory
layer too
Loosely related drive bys:
1. Remove the "keys not found" tracking in the ranged search. This
wasn't used anywhere and it just complicates things.
2. Remove the difficulty map stuff from the layer map. Again, not used
anywhere.
Closes https://github.com/neondatabase/neon/issues/9185
Closes https://github.com/neondatabase/neon/issues/10720
## Problem
If a caller times out on safekeeper timeline deletion on a large
timeline, and waits a while before retrying, the deletion will not
progress while the retry is waiting. The net effect is very very slow
deletion as it only proceeds in 30 second bursts across 5 minute idle
periods.
Related: https://github.com/neondatabase/neon/issues/10265
## Summary of changes
- Run remote deletion in a background task
- Carry a watch::Receiver on the Timeline for other callers to join the
wait
- Restart deletion if the API is called again and the previous attempt
failed
## Problem
We want to support larger tenants (regarding logical database size,
number of transactions per second etc.) and should increase our test
coverage of OLTP transactions at larger scale.
## Summary of changes
Start a new benchmark that over time will add more OLTP tests at larger
scale.
This PR covers the first version and will be extended in further PRs.
Also fix some infrastructure:
- default for new connections and large tenants is to use connection
pooler pgbouncer, however our fixture always added
`statement_timeout=120` which is not compatible with pooler
[see](https://neon.tech/docs/connect/connection-errors#unsupported-startup-parameter)
- action to create branch timed out after 10 seconds and 10 retries but
for large tenants it can take longer so use increasing back-off for
retries
## Test run
https://github.com/neondatabase/neon/actions/runs/13593446706
## Problem
Timeline shutdown during basebackup logs at error level because the the
canecellation error is smushed into BasebackupError::Server.
## Summary of changes
Introduce BasebackupError::Shutdown and use it. `log_query_error` will
now see `QueryError::Shutdown` and log at info level.
## Problem
Preparation for https://github.com/neondatabase/neon/issues/10851
## Summary of changes
Add walproposer `safekeepers_generations` field which can be set by
prefixing `neon.safekeepers` GUC with `g#n:`. Non zero value (n) forces
walproposer to use generations. In particular, this also disables
implicit timeline creation as timeline will be created by storcon. Add
test checking this. Also add missing infra: `--safekeepers-generation`
flag to neon_local endpoint start + fix `--start-timeout` flag: it
existed but value wasn't used.
## Problem
See https://neondb.slack.com/archives/C033RQ5SPDH/p1740157873114339
smgrextend for FSM fork is called during page reconstruction by walredo
process causing overflow of inmem SMGR (64 pages).
## Summary of changes
Do not store zero pages in inmem SMGR because `inmem_read` returns zero
page if it is not able to locate specified block.
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
We build compute-nodes as multi-arch images, but not the
vm-compute-nodes. The PR adds multiarch vm images the same way as in
autoscaling repo.
## Summary of changes
Add architecture to the matrix for vm compute build steps
Add merge job
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>