## Problem
#7371
## Summary of changes
* The VirtualFile::open, open_with_options, and create methods use
AsRef, similar to the standard library's std::fs APIs.
## Problem
proxy params being a `HashMap<String,String>` when it contains just
```
application_name: psql
database: neondb
user: neondb_owner
```
is quite wasteful allocation wise.
## Summary of changes
Keep the params in the wire protocol form, eg:
```
application_name\0psql\0database\0neondb\0user\0neondb_owner\0
```
Using a linear search for the map is fast enough at small sizes, which
is the normal case.
## Problem
We were rate limiting wake_compute in the wrong place
## Summary of changes
Move wake_compute rate limit to after the permit is acquired. Also makes
a slight refactor on normalize, as it caught my eye
## Problem
We use ubuntu-latest as a default OS for running jobs. It can cause
problems due to instability, so we should use the LTS version of Ubuntu.
## Summary of changes
The image ubuntu-latest was changed with ubuntu-22.04 in workflows.
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above
checklist
## Problem
See https://github.com/neondatabase/cloud/issues/10845
## Summary of changes
Do not report error if GIN page is not restored
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above
checklist
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
Computes that are healthy can manage many connection attempts at a time.
Unhealthy computes cannot. We initially handled this with a fixed
concurrency limit, but it seems this inhibits pgbench.
## Summary of changes
Support AIMD for connect_to_compute lock to allow varying the
concurrency limit based on compute health
Get rid of postgres-native-tls and openssl in favour of rustls in our
dependency tree.
Do further steps to completely remove native-tls and openssl.
Among other advantages, this allows us to do static musl builds more
easily: #7889
## Problem
In 4ce6e2d2fc we added a warning when progress stats don't look right at
the end of a secondary download pass.
This `Correcting drift in progress stats` warning fired in staging on a
pageserver that had been doing some disk usage eviction.
The impact is low because in the same place we log the warning, we also
fix up the progress values.
## Summary of changes
- When we skip downloading a layer because it was recently evicted,
update the progress stats to ensure they still reach a clean complete
state at the end of a download pass.
- Also add a log for evicting secondary location layers, for symmetry
with attached locations, so that we can clearly see when eviction has
happened for a particular tenant's layers when investigating issues.
This is a point fix -- the code would also benefit from being refactored
so that there is some "download result" type with a Skip variant, to
ensure that we are updating the progress stats uniformly for those
cases.
## Problem
We want to regularly verify the performance of pgvector HNSW parallel
index builds and parallel similarity search using HNSW indexes.
The first release that considerably improved the index-build parallelism
was pgvector 0.7.0 and we want to make sure that we do not regress by
our neon compute VM settings (swap, memory over commit, pg conf etc.)
## Summary of changes
Prepare a Neon project with 1 million openAI vector embeddings (vector
size 1536).
Run HNSW indexing operations in the regression test for the various
distance metrics.
Run similarity queries using pgbench with 100 concurrent clients.
I have also added the relevant metrics to the grafana dashboards pgbench
and olape
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
## Problem
After [0e4f182680] which introduce async
connect
Neon is not able to connect to page server.
## Summary of changes
Perform sync commit at MacOS/X
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above
checklist
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Do pull_timeline while WAL is being removed. To this end
- extract pausable_failpoint to utils, sprinkle pull_timeline with it
- add 'checkpoint' sk http endpoint to force WAL removal.
After fixing checking for pull file status code test fails so far which is
expected.
## Problem
Seems the websocket buffering was broken for large query responses only
## Summary of changes
Move buffering until after the underlying stream is ready.
Tested locally confirms this fixes the bug.
Also fixes the pg-sni-router missing metrics bug
## Problem
A title for automatic proxy release PRs is `Proxy release`, and for
storage & compute, it's just `Release`
## Summary of changes
- Amend PR title for Storage & Compute releases to "Storage & Compute
release"
## Problem
Ongoing hunt for secondary location shutdown hang issues.
## Summary of changes
- Revert the functional changes from #7675
- Tweak a log in secondary downloads to make it more apparent when we
drop out on cancellation
- Modify DownloadStream's behavior to always return an Err after it has
been cancelled. This _should_ not impact anything, but it makes the
behavior simpler to reason about (e.g. even if the poll function somehow
got called again, it could never end up in an un-cancellable state)
Related #https://github.com/neondatabase/cloud/issues/13576
## Problem
- After a shard split of a large existing tenant, child tenants can end
up with oversized historic layers indefinitely, if those layers are
prevented from being GC'd by branchpoints.
This PR follows https://github.com/neondatabase/neon/pull/7531, and adds
rewriting of layers that contain a mixture of needed & un-needed
contents, in addition to dropping un-needed layers.
Closes: https://github.com/neondatabase/neon/issues/7504
## Summary of changes
- Add methods to ImageLayer for reading back existing layers
- Extend `compact_shard_ancestors` to rewrite layer files that contain a
mixture of keys that we want and keys we do not, if unwanted keys are
the majority of those in the file.
- Amend initialization code to handle multiple layers with the same
LayerName properly
- Get rid of of renaming bad layer files to `.old` since that's now
expected on restarts during rewrites.
## Problem
One database is too limiting. We have agreed to raise this limit to 10.
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above
checklist
* Make PS connection startup use async APIs
This allows for improved query cancellation when we start connections
* Make PS connections have per-shard connection retry state.
Previously they shared global backoff state, which is bad for quickly
getting all connections started and/or back online.
* Make sure we clean up most connection state on failed connections.
Previously, we could technically leak some resources that we'd otherwise
clean up. Now, the resources are correctly cleaned up.
* pagestore_smgr.c now PANICs on unexpected response message types.
Unexpected responses are likely a symptom of having a desynchronized
view of the connection state. As a desynchronized connection state can
cause corruption, we PANIC, as we don't know what data may have been
written to buffers: the only solution is to fail fast & hope we didn't
write wrong data.
* Catch errors in sync pagestream request handling.
Previously, if a query was cancelled after a message was sent to
the pageserver, but before the data was received, the backend
could forget that it sent the synchronous request, and let others
deal with the repercussions. This could then lead to incorrect
responses, or errors such as "unexpected response from page
server with tag 0x68"
## Problem
## Summary of changes
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above
checklist
Once upon a time, we used to have duplicated types for runtime IndexPart
and whatever we stored. Because of the serde fixes in #5335 we have no
need for duplicated IndexPart type anymore, but the `IndexLayerMetadata`
stayed.
- remove the type
- remove LayerFileMetadata::file_size() in favor of direct field access
Split off from #7833. Cc: #3072.
* Reduce the logging level for create image layers of metadata keys.
(question: is it possible to adjust logging levels at runtime?)
* Do a info logging of image layers only after the layer is created. Now
there are a lot of cases where we create the image layer writer but then
discarding that image layer because it does not contain any key.
Therefore, I changed the new image layer logging to trace, and create
image layer logging to info.
Signed-off-by: Alex Chi Z <chi@neon.tech>
Reduces duplication between tiered and legacy compaction by using the
`Timeline::create_image_layer_for_rel_blocks` function. This way, we
also use vectored get in tiered compaction, so the change has two
benefits in one.
fixes#7659
---------
Co-authored-by: Alex Chi Z. <iskyzh@gmail.com>
The list timeline API gives something like
`"wal_source_connstr":"PgConnectionConfig { host:
Domain(\"safekeeper-5.us-east-2.aws.neon.build\"), port: 6500, password:
Some(REDACTED-STRING) }"`, which is weird. This pull request makes it
somehow like a connection string. This field is not used at least in the
neon database, so I assume no one is reading or parsing it.
Signed-off-by: Alex Chi Z <chi@neon.tech>
We'd like to get some bits reserved in the length field of image layers
for future usage (compression). This PR bases on the assumption that we
don't have any blobs that require more than 28 bits (3 bytes + 4 bits)
to store the length, but as a preparation, before erroring, we want to
first emit warnings as if the assumption is wrong, such warnings are less
disruptive than errors.
A metric would be even less disruptive (log messages are more slow, if
we have a LOT of such large blobs then it would take a lot of time to
print them). At the same time, likely such 256 MiB blobs will occupy an
entire layer file, as they are larger than our target size. For layer
files we already log something, so there shouldn't be a large increase
in overhead.
Part of #5431
## Problem
By default, pgvector compiles with `-march=native` on some platforms for
best performance. However, this can lead to `Illegal instruction` errors
if trying to run the compiled extension on a different machine.
I had this problem when trying to run the Neon compute docker image on
MacOS with Apple Silicon with Rosetta.
see
ff9b22977e/README.md (L1021)
## Summary of changes
Pass OPTFLAGS="" to make.
I looked at the metrics from
https://github.com/neondatabase/neon/pull/7768 on staging and it seems
that manager does too many iterations. This is probably caused by
background job `remove_wal.rs` which iterates over all timelines and
tries to remove WAL and persist control file. This causes shared state
updates and wakes up the manager. The fix is to skip notifying about the
updates if nothing was updated.
## Problem
We've seen some strange behaviors when doing lots of migrations
involving secondary locations. One of these was where a tenant was
apparently stuck in the `Scheduler::running` list, but didn't appear to
be making any progress. Another was a shutdown hang
(https://github.com/neondatabase/cloud/issues/13576).
## Summary of changes
- Fix one issue (probably not the only one) where a tenant in the
`pending` list could proceed to `spawn` even if the same tenant already
had a running task via `handle_command` (this could have resulted in a
weird value of SecondaryProgress)
- Add various extra logging:
- log before as well as after layer downloads so that it would be
obvious if we were stuck in remote storage code (we shouldn't be, it has
built in timeouts)
- log the number of running + pending jobs from the scheduler every time
it wakes up to do a scheduling iteration (~10s) -- this is quite chatty,
but not compared with the volume of logs on a busy pageserver. It should
give us confidence that the scheduler loop is still alive, and
visibility of how many tasks the scheduler thinks are running.
## Problem
This test relied on some sleeps, and was failing ~5% of the time.
## Summary of changes
Use log-watching rather than straight waits, and make timeouts more
generous for the CI environment.
## Problem
Despite making password hashing async, it can still take time away from
the network code.
## Summary of changes
Introduce a custom threadpool, inspired by rayon. Features:
### Fairness
Each task is tagged with it's endpoint ID. The more times we have seen
the endpoint, the more likely we are to skip the task if it comes up in
the queue. This is using a min-count-sketch estimator for the number of
times we have seen the endpoint, resetting it every 1000+ steps.
Since tasks are immediately rescheduled if they do not complete, the
worker could get stuck in a "always work available loop". To combat
this, we check the global queue every 61 steps to ensure all tasks
quickly get a worker assigned to them.
### Balanced
Using crossbeam_deque, like rayon does, we have workstealing out of the
box. I've tested it a fair amount and it seems to balance the workload
accordingly
## Problem
If an existing user already has some aux v1 files, we don't want to
switch them to the global tenant-level config.
Part of #7462
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Once all the computes in production have restarted, we can remove
protocol version 1 altogether.
See issue #6211.
This was done earlier already in commit 0115fe6cb2, but reverted before
it was released to production in commit bbe730d7ca because of issue
https://github.com/neondatabase/neon/issues/7692. That issue was fixed
in commit 22afaea6e1, so we are ready to change the default again.
Don't set last-written LSN of a page when the record is replayed, only
when the page is evicted from cache. For comparison, we don't update
the last-written LSN on every page modification on the primary either,
only when the page is evicted. Do update the last-written LSN when the
page update is skipped in WAL redo, however.
In neon_get_request_lsns(), don't be surprised if the last-written LSN
is equal to the record being replayed. Use the LSN of the record being
replayed as the request LSN in that case. Add a long comment
explaining how that can happen.
In neon_wallog_page, update last-written LSN also when Shutdown has
been requested. We might still fetch and evict pages for a while,
after shutdown has been requested, so we better continue to do that
correctly.
Enable the check that we don't evict a page with zero LSN also in
standby, but make it a LOG message instead of PANIC
Fixes issue https://github.com/neondatabase/neon/issues/7791
the gate was accidentially being dropped before the final blocking
phase, possibly explaining the resident physical size global problems
during deletions.
it could had caused more harm as well, but the path is not actively
being tested because cplane no longer puts locationconfigs with higher
generation number during normal operation which prompted the last wave
of fixes.
Cc: #7341.
## Problem
Safekeeper Timeline uses a channel for cancellation, but we have a
dedicated type for that.
## Summary of changes
- Use CancellationToken in Timeline
## Problem
We don't build our docker images for ARM arch, and that makes it harder
to run images on ARM (on MacBooks with Apple Silicon, for example).
## Summary of changes
- Build `neondatabase/neon` for ARM and create a multi-arch image
- Build `neondatabase/compute-node-vXX` for ARM and create a multi-arch
image
- Run `test-images` job on ARM as well
## Problem
Currently, `latest` tag is added to the images in several cases:
```
github.ref_name == 'main' || github.ref_name == 'release' || github.ref_name == 'release-proxy'
```
This leads to a race; the `latest` tag jumps back and forth depending on
the branch that has built images.
## Summary of changes
- Do not push `latest` images to prod ECR (we don't use it)
- Use `docker buildx imagetools` instead of `crane` for tagging images
- Unify `vm-compute-node-image` job with others and use dockerhub as a
first source for images (sync images with ECR)
- Tag images with `latest` only for commits in `main`