Commit Graph

5313 Commits

Author SHA1 Message Date
Sasha Krassovsky
ea2e830707 Remove apostrophe (#7868)
## Problem

## Summary of changes

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2024-05-23 20:35:59 +00:00
Joonas Koivunen
7cf726e36e refactor(rtc): remove the duplicate IndexLayerMetadata (#7860)
Once upon a time, we used to have duplicated types for runtime IndexPart
and whatever we stored. Because of the serde fixes in #5335 we have no
need for duplicated IndexPart type anymore, but the `IndexLayerMetadata`
stayed.

- remove the type
- remove LayerFileMetadata::file_size() in favor of direct field access

Split off from #7833. Cc: #3072.
2024-05-23 23:24:31 +03:00
Alex Chi Z
6b3164269c chore(pageserver): reduce logging related to image layers (#7864)
* Reduce the logging level for create image layers of metadata keys.
(question: is it possible to adjust logging levels at runtime?)
* Do a info logging of image layers only after the layer is created. Now
there are a lot of cases where we create the image layer writer but then
discarding that image layer because it does not contain any key.
Therefore, I changed the new image layer logging to trace, and create
image layer logging to info.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-05-23 15:30:43 +00:00
Arpad Müller
75a52ac7fd Use Timeline::create_image_layer_for_rel_blocks in tiered compaction (#7850)
Reduces duplication between tiered and legacy compaction by using the
`Timeline::create_image_layer_for_rel_blocks` function. This way, we
also use vectored get in tiered compaction, so the change has two
benefits in one.

fixes #7659

---------

Co-authored-by: Alex Chi Z. <iskyzh@gmail.com>
2024-05-23 15:10:24 +00:00
Alex Chi Z
e28e46f20b fix(pageserver): make wal connstr a connstr (#7846)
The list timeline API gives something like
`"wal_source_connstr":"PgConnectionConfig { host:
Domain(\"safekeeper-5.us-east-2.aws.neon.build\"), port: 6500, password:
Some(REDACTED-STRING) }"`, which is weird. This pull request makes it
somehow like a connection string. This field is not used at least in the
neon database, so I assume no one is reading or parsing it.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-05-23 09:45:29 -04:00
Arpad Müller
d5d15eb6eb Warn if a blob in an image is larger than 256 MiB (#7852)
We'd like to get some bits reserved in the length field of image layers
for future usage (compression). This PR bases on the assumption that we
don't have any blobs that require more than 28 bits (3 bytes + 4 bits)
to store the length, but as a preparation, before erroring, we want to
first emit warnings as if the assumption is wrong, such warnings are less
disruptive than errors.

A metric would be even less disruptive (log messages are more slow, if
we have a LOT of such large blobs then it would take a lot of time to
print them). At the same time, likely such 256 MiB blobs will occupy an
entire layer file, as they are larger than our target size. For layer
files we already log something, so there shouldn't be a large increase
in overhead.

Part of #5431
2024-05-23 14:28:05 +02:00
Joonas Koivunen
49d7f9b5a4 test_import_from_pageserver_small: try to make less flaky (#7843)
With #7828 and proper fullbackup testing the test became flaky
([evidence]).

- produce better assertion messages in `assert_pageserver_backups_equal`
- use read only endpoint to confirm the row count

[evidence]:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7839/9192447962/index.html#suites/89cfa994d71769e01e3fc4f475a1f3fa/49009214d0f8b8ce
2024-05-23 14:44:08 +03:00
Peter Bendel
95a49f0075 remove march=native from pgvector Makefile's OPTFLAGS (#7854)
## Problem

By default, pgvector compiles with `-march=native` on some platforms for
best performance. However, this can lead to `Illegal instruction` errors
if trying to run the compiled extension on a different machine.

I had this problem when trying to run the Neon compute docker image on
MacOS with Apple Silicon with Rosetta.

see
ff9b22977e/README.md (L1021)

## Summary of changes

Pass OPTFLAGS="" to make.
2024-05-23 10:08:06 +00:00
John Spray
545f7e8cd7 tests: fix an allow list entry (#7856)
https://github.com/neondatabase/neon/pull/7844 typo'd one of the
expressions:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/9196993886/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/e420fbfdb193bf80/
2024-05-23 10:50:21 +01:00
Anna Khanova
cd6d811213 [proxy] Do not fail after parquet upload error (#7858)
## Problem

If the parquet upload was unsuccessful, it will panic.

## Summary of changes

Write error in logs instead.
2024-05-23 09:41:29 +00:00
Arthur Petukhovsky
8f3c316bae Skip unnecessary shared state updates in safekeepers (#7851)
I looked at the metrics from
https://github.com/neondatabase/neon/pull/7768 on staging and it seems
that manager does too many iterations. This is probably caused by
background job `remove_wal.rs` which iterates over all timelines and
tries to remove WAL and persist control file. This causes shared state
updates and wakes up the manager. The fix is to skip notifying about the
updates if nothing was updated.
2024-05-23 09:45:24 +01:00
Joonas Koivunen
58e31fe098 test_attach_tenant_config: add allowed error (#7839)
[evidence] of quite rare flaky. the detach can cause this with the right
timing.

[evidence]:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7650/9191613501/index.html#suites/7745dadbd815ab87f5798aa881796f47/2190222925001078
2024-05-23 11:25:38 +03:00
John Spray
a43a1ad1df pageserver: fix API-driven secondary downloads possibly colliding with background downloads (#7848)
## Problem

We've seen some strange behaviors when doing lots of migrations
involving secondary locations. One of these was where a tenant was
apparently stuck in the `Scheduler::running` list, but didn't appear to
be making any progress. Another was a shutdown hang
(https://github.com/neondatabase/cloud/issues/13576).

## Summary of changes

- Fix one issue (probably not the only one) where a tenant in the
`pending` list could proceed to `spawn` even if the same tenant already
had a running task via `handle_command` (this could have resulted in a
weird value of SecondaryProgress)
- Add various extra logging:
- log before as well as after layer downloads so that it would be
obvious if we were stuck in remote storage code (we shouldn't be, it has
built in timeouts)
- log the number of running + pending jobs from the scheduler every time
it wakes up to do a scheduling iteration (~10s) -- this is quite chatty,
but not compared with the volume of logs on a busy pageserver. It should
give us confidence that the scheduler loop is still alive, and
visibility of how many tasks the scheduler thinks are running.
2024-05-23 09:13:55 +01:00
Oleg Vasilev
eb0c026aac Bump vm-builder v0.28.1 -> v0.29.3 (#7849)
One change:
runner: allow coredump collection (#931)
2024-05-22 21:48:59 +00:00
Alex Chi Z
ff560a1113 chore(pageserver): use kebab case for compaction algorithms (#7845)
Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-05-22 21:28:47 +00:00
Alex Chi Z
4a278cce7c chore(pageserver): add force aux file policy switch handler (#7842)
For existing users, we want to allow doing a force switch for their aux
file policy.

Part of #7462 

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-05-22 19:05:26 +00:00
John Spray
f98fdd20e3 tests: add a couple of allow lists for shutdown cases (#7844)
## Problem

Failures on some of our uglier shutdown log messages:

https://neon-github-public-dev.s3.amazonaws.com/reports/main/9192662995/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/51b365408678c66f/

## Summary of changes

- Allow-list these errors.
2024-05-22 18:38:22 +00:00
John Spray
014f822a78 tests: refine test_secondary_background_downloads (#7829)
## Problem

This test relied on some sleeps, and was failing ~5% of the time.

## Summary of changes

Use log-watching rather than straight waits, and make timeouts more
generous for the CI environment.
2024-05-22 19:17:47 +01:00
Alex Chi Z
ddd8ebd253 chore(pageserver): use kebab case for aux file flag (#7840)
part of https://github.com/neondatabase/neon/issues/7462

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-05-22 17:06:00 +00:00
Conrad Ludgate
9cfe08e3d9 proxy password threadpool (#7806)
## Problem

Despite making password hashing async, it can still take time away from
the network code.

## Summary of changes

Introduce a custom threadpool, inspired by rayon. Features:

### Fairness

Each task is tagged with it's endpoint ID. The more times we have seen
the endpoint, the more likely we are to skip the task if it comes up in
the queue. This is using a min-count-sketch estimator for the number of
times we have seen the endpoint, resetting it every 1000+ steps.

Since tasks are immediately rescheduled if they do not complete, the
worker could get stuck in a "always work available loop". To combat
this, we check the global queue every 61 steps to ensure all tasks
quickly get a worker assigned to them.

### Balanced

Using crossbeam_deque, like rayon does, we have workstealing out of the
box. I've tested it a fair amount and it seems to balance the workload
accordingly
2024-05-22 17:05:43 +00:00
Alex Chi Z
64577cfddc feat(pageserver): auto-detect previous aux file policy (#7841)
## Problem

If an existing user already has some aux v1 files, we don't want to
switch them to the global tenant-level config.

Part of #7462 

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-05-22 12:41:13 -04:00
Heikki Linnakangas
37f81289c2 Make 'neon.protocol_version = 2' the default, take two (#7819)
Once all the computes in production have restarted, we can remove
protocol version 1 altogether.

See issue #6211.

This was done earlier already in commit 0115fe6cb2, but reverted before
it was released to production in commit bbe730d7ca because of issue
https://github.com/neondatabase/neon/issues/7692. That issue was fixed
in commit 22afaea6e1, so we are ready to change the default again.
2024-05-22 18:24:52 +03:00
Heikki Linnakangas
9217564026 Fix issues with determining request LSN in read replica (#7795)
Don't set last-written LSN of a page when the record is replayed, only
when the page is evicted from cache. For comparison, we don't update
the last-written LSN on every page modification on the primary either,
only when the page is evicted. Do update the last-written LSN when the
page update is skipped in WAL redo, however.

In neon_get_request_lsns(), don't be surprised if the last-written LSN
is equal to the record being replayed. Use the LSN of the record being
replayed as the request LSN in that case. Add a long comment
explaining how that can happen.

In neon_wallog_page, update last-written LSN also when Shutdown has
been requested. We might still fetch and evict pages for a while,
after shutdown has been requested, so we better continue to do that
correctly.

Enable the check that we don't evict a page with zero LSN also in
standby, but make it a LOG message instead of PANIC

Fixes issue https://github.com/neondatabase/neon/issues/7791
2024-05-22 18:24:21 +03:00
Heikki Linnakangas
3404e76a51 Fix confusion between 1-based Buffer and 0-based index (#7825)
The code was working correctly, but was incorrectly using Buffer for a
0-based index into the BufferDesc array.
2024-05-22 18:24:21 +03:00
Joonas Koivunen
62aac6c8ad fix(Layer): carry gate until eviction is complete (#7838)
the gate was accidentially being dropped before the final blocking
phase, possibly explaining the resident physical size global problems
during deletions.

it could had caused more harm as well, but the path is not actively
being tested because cplane no longer puts locationconfigs with higher
generation number during normal operation which prompted the last wave
of fixes.

Cc: #7341.
2024-05-22 18:13:45 +03:00
John Spray
e015b2bf3e safekeeper: use CancellationToken instead of watch channel (#7836)
## Problem

Safekeeper Timeline uses a channel for cancellation, but we have a
dedicated type for that.

## Summary of changes

- Use CancellationToken in Timeline
2024-05-22 16:10:58 +01:00
Alexander Bayandin
a7f31f1a59 CI: build multi-arch images (#7696)
## Problem

We don't build our docker images for ARM arch, and that makes it harder
to run images on ARM (on MacBooks with Apple Silicon, for example).

## Summary of changes
- Build `neondatabase/neon` for ARM and create a multi-arch image
- Build `neondatabase/compute-node-vXX` for ARM and create a multi-arch
image
- Run `test-images` job on ARM as well
2024-05-22 16:06:05 +01:00
Alexander Bayandin
325f3784f9 CI(promote-images): simplify & fix the job (#7826)
## Problem

Currently, `latest` tag is added to the images in several cases: 
```
github.ref_name == 'main' || github.ref_name == 'release' || github.ref_name == 'release-proxy'
```

This leads to a race; the `latest` tag jumps back and forth depending on
the branch that has built images.

## Summary of changes
- Do not push `latest` images to prod ECR (we don't use it)
- Use `docker buildx imagetools` instead of `crane` for tagging images
- Unify `vm-compute-node-image` job with others and use dockerhub as a
first source for images (sync images with ECR)
- Tag images with `latest` only for commits in `main`
2024-05-22 15:02:20 +00:00
Tristan Partin
900f391115 Make postgres_version action input default to a string
This is "required" by GitHub Actions, though they must do some coersion
on their side.
2024-05-22 09:20:00 -05:00
Tristan Partin
8901ce9c99 Fix typos in action definitions 2024-05-22 09:20:00 -05:00
Joonas Koivunen
ce44dfe353 openapi: document timeline ancestor detach (#7650)
The openapi description with the error descriptions:

- 200 is used for "detached or has been detached previously"
- 400 is used for "cannot be detached right now" -- it's an odd thing,
but good enough
- 404 is used for tenant or timeline not found
- 409 is used for "can never be detached" (root timeline)
- 500 is used for transient errors (basically ill-defined shutdown
errors)
- 503 is used for busy (other tenant ancestor detach underway,
pageserver shutdown)

Cc: #6994
2024-05-22 13:55:34 +00:00
Alexander Bayandin
d1d55bbd9f CI(report-benchmarks-failures): fix condition (#7820)
## Problem

`report-benchmarks-failures` got skipped if a dependent job fails.

## Summary of changes
- Fix the if-condition by adding `&& failures()` to it; it'll make the
job run if the dependent job fails.
2024-05-22 14:43:10 +01:00
Joonas Koivunen
df9ab1b5e3 refactor(test): duplication with fullbackup, tar content hashing (#7828)
"taking a fullbackup" is an ugly multi-liner copypasted in multiple
places, most recently with timeline ancestor detach tests. move it under
`PgBin` which is not a great place, but better than yet another utility
function.

Additionally:
- cleanup `psql_env` repetition (PgBin already configures that)
- move the backup tar comparison as a yet another free utility function
- use backup tar comparison in `test_import.py` where a size check was
done previously
- cleanup extra timeline creation from test

Cc: #7715
2024-05-22 15:43:21 +03:00
Heikki Linnakangas
ef96c82c9f Fix zenith_test_evict mode and clear_buffer_cache() function
Using InvalidateBuffer is wrong, because if the page is concurrently
dirtied, it will throw away the dirty page without calling
smgwrite(). In Neon, that means that the last-written LSN update for
the page is missed.

In v16, use the new InvalidateVictimBuffer() function that does what
we need. In v15 and v14, backport the InvalidateVictimBuffer()
function.

Fixes issue https://github.com/neondatabase/neon/issues/7802
2024-05-22 14:26:03 +03:00
Arseny Sher
b43f6daa48 One more iteration on making walcraft test more robust.
Some WAL might be inserted on the page boundary before XLOG_SWITCH lands there,
repeat construction in this case.
2024-05-22 14:23:49 +03:00
Arpad Müller
664f92dc6e Refactor PageServerHandler::process_query parsing (#7835)
In the process_query function in page_service.rs there was some
redundant duplication. Remove it and create a vector of whitespace
separated parts at the start and then use `slice::strip_prefix`. Only
use `starts_with` in the places with multiple whitespace separated
parameters: here we want to preserve grep/rg ability.

Followup of #7815, requested in
https://github.com/neondatabase/neon/pull/7815#pullrequestreview-2068835674
2024-05-22 12:43:03 +02:00
Arthur Petukhovsky
bd5cb9e86b Implement timeline_manager for safekeeper background tasks (#7768)
In safekeepers we have several background tasks. Previously `WAL backup`
task was spawned by another task called `wal_backup_launcher`. That task
received notifications via `wal_backup_launcher_rx` and decided to spawn
or kill existing backup task associated with the timeline. This was
inconvenient because each code segment that touched shared state was
responsible for pushing notification into `wal_backup_launcher_tx`
channel. This was error prone because it's easy to miss and could lead
to deadlock in some cases, if notification pushing was done in the wrong
order.

We also had a similar issue with `is_active` timeline flag. That flag
was calculated based on the state and code modifying the state had to
call function to update the flag. We had a few bugs related to that,
when we forgot to update `is_active` flag in some places where it could
change.

To fix these issues, this PR adds a new `timeline_manager` background
task associated with each timeline. This task is responsible for
managing all background tasks, including `is_active` flag which is used
for pushing broker messages. It is subscribed for updates in timeline
state in a loop and decides to spawn/kill background tasks when needed.

There is a new structure called `TimelinesSet`. It stores a set of
`Arc<Timeline>` and allows to copy the set to iterate without holding
the mutex. This is what replaced `is_active` flag for the broker. Now
broker push task holds a reference to the `TimelinesSet` with active
timelines and use it instead of iterating over all timelines and
filtering by `is_active` flag.

Also added some metrics for manager iterations and active backup tasks.
Ideally manager should be doing not too many iterations and we should
not have a lot of backup tasks spawned at the same time.

Fixes #7751

---------

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
2024-05-22 09:34:39 +01:00
Em Sharnoff
00d66e8012 compute_ctl: Fix handling of missing /neonvm/bin/resize-swap (#7832)
The logic added in the original PR (#7434) only worked before sudo was
used, because 'sudo foo' will only fail with NotFound if 'sudo' doesn't
exist; if 'foo' doesn't exist, then sudo will fail with a normal error
exit.

This means that compute_ctl may fail to restart if it exits after
successfully enabling swap.
2024-05-21 16:52:48 -07:00
Arpad Müller
679e031cf6 Add dummy lsn lease http and page service APIs (#7815)
We want to introduce a concept of temporary and expiring LSN leases.
This adds both a http API as well as one for the page service to obtain
temporary LSN leases.

This adds a dummy implementation to unblock integration work of this
API. A functional implementation of the lease feature is deferred to a
later step.

Fixes #7808

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2024-05-21 23:31:20 +02:00
Alex Chi Z
e3f6a07ca3 chore(pageserver): remove metrics for in-memory ingestion (#7823)
The metrics was added in https://github.com/neondatabase/neon/pull/7515/
to observe if https://github.com/neondatabase/neon/pull/7467 introduces
any perf regressions.

The change was deployed on 5/7 and no changes are observed in the
metrics. So it's safe to remove the metrics now.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-05-21 13:33:29 -04:00
Joonas Koivunen
a8a88ba7bc test(detach_ancestor): ensure L0 compaction in history is ok (#7813)
detaching a timeline from its ancestor can leave the resulting timeline
with more L0 layers than the compaction threshold. most of the time, the
detached timeline has made progress, and next time the L0 -> L1
compaction happens near the original branch point and not near the
last_record_lsn.

add a test to ensure that inheriting the historical L0s does not change
fullbackup. additionally:
- add `wait_until_completed` to test-only timeline checkpoint and
compact HTTP endpoints. with `?wait_until_completed=true` the endpoints
will wait until the remote client has completed uploads.
- for delta layers, describe L0-ness with the `/layer` endpoint

Cc: #6994
2024-05-21 20:08:43 +03:00
John Spray
353afe4fe7 neon_local: run controller's postgres with fsync=off (#7817)
## Problem

In `test_storage_controller_many_tenants` we
[occasionally](https://neon-github-public-dev.s3.amazonaws.com/reports/main/9155810417/index.html#/testresult/8fbdf57a0e859c2d)
see it hit the retry limit on serializable transactions. That's likely
due to a combination of relative slow fsync on the hetzner nodes running
the test, and the way the test does lots of parallel timeline creations,
putting high load on the drive.

Running the storage controller's db with fsync=off may help here.

## Summary of changes

- Set `fsync=off` in the postgres config for the database used by the
storage controller in tests
2024-05-21 18:13:54 +03:00
Tristan Partin
1988ad8db7 Extend test_unlogged to include a sequence
Unlogged sequences were added in v15, so let's just test to make sure
they work on Neon.
2024-05-21 09:18:11 -05:00
Tristan Partin
e3415706b7 Upgrade Postgres v16 to 16.3 2024-05-21 09:18:11 -05:00
Tristan Partin
9d081851ec Upgrade Postgres v15 to 15.7 2024-05-21 09:18:11 -05:00
Tristan Partin
781352bd8e Upgrade Postgres v14 to 14.12 2024-05-21 09:18:11 -05:00
Tristan Partin
8030b8e4c5 Fix test_pg_regress for unlogged relations
Previously we worked around file comparison issues by dropping unlogged
relations in the pg_regress tests, but this would lead to an unnecessary
diff when compared to upstream in our Postgres fork. Instead, we can
precompute the files that we know will be different, and ignore them.
2024-05-21 09:18:11 -05:00
Tristan Partin
9a4b896636 Use a constant for database name in test_pg_regress 2024-05-21 09:18:11 -05:00
Tristan Partin
e8b8ebfa1d Allow check_restored_datadir_content to ignore certain files
Some files may have known differences that we are okay with.
2024-05-21 09:18:11 -05:00
Tristan Partin
d9d471e3c4 Add some Python typing in a few test files 2024-05-21 09:18:11 -05:00