6022 Commits

Author SHA1 Message Date
Stas Kelvich
2bf9a350ef Process chunks in parallel 2024-09-14 14:07:35 +01:00
Heikki Linnakangas
2e7e5f4f3a Refactor how the image layer partitioning is done
It got pretty ugly after the last commit. Refactor it so that we first
collect all the key ranges that need to be written out into a list of
tasks, then partition the tasks into image layers, and then write them
out. This will be much easier to parallelize, but that's not included
in this commit yet.
2024-09-13 05:41:35 +03:00
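The three-phase structure described in 2e7e5f4f3a (collect key ranges into tasks, partition the tasks into image layers, then write them out) might look roughly like the sketch below. This is only an illustration under stated assumptions: the `Task` type, the integer keys, the `target_layer_bytes` threshold, and all function names are hypothetical and are not taken from the actual commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical unit of work: one key range that must be written out."""
    start_key: int   # inclusive
    end_key: int     # exclusive
    size_bytes: int


def collect_tasks(ranges):
    """Phase 1: turn (start, end, size) tuples into a sorted list of tasks."""
    return [Task(s, e, n) for (s, e, n) in sorted(ranges)]


def partition_tasks(tasks, target_layer_bytes):
    """Phase 2: group consecutive tasks into layers of roughly the target size."""
    layers, current, current_size = [], [], 0
    for task in tasks:
        # Start a new layer when adding this task would exceed the target.
        if current and current_size + task.size_bytes > target_layer_bytes:
            layers.append(current)
            current, current_size = [], 0
        current.append(task)
        current_size += task.size_bytes
    if current:
        layers.append(current)
    return layers


def write_layers(layers):
    """Phase 3: stand-in for the real writer; summarizes each layer instead."""
    return [
        (layer[0].start_key, layer[-1].end_key, sum(t.size_bytes for t in layer))
        for layer in layers
    ]
```

Because phase 2 produces independent groups, each group could in principle be handed to a separate worker in phase 3, which is presumably what makes this shape easier to parallelize.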
Heikki Linnakangas
9f3d5826be Fix handling of files > 4 GB 2024-09-13 02:27:26 +03:00
Heikki Linnakangas
4f39501641 Fix handling relations > 1 GB 2024-09-13 02:04:48 +03:00
Heikki Linnakangas
67d1606f82 Write out multiple image layers 2024-09-13 01:47:10 +03:00
Heikki Linnakangas
9351ba26ff Fake LSN
Test passes, yay!
2024-09-12 21:44:49 +03:00
Stas Kelvich
8df388330b Merge branch 'hack/fast-import' of github.com:neondatabase/neon into hack/fast-import 2024-09-12 19:26:16 +01:00
Stas Kelvich
357c07dd35 track rel file import time 2024-09-12 19:26:03 +01:00
Heikki Linnakangas
7b90ec6e19 Create controlfile and checkpoint entries
XXX: untested, not sure if it works..
2024-09-12 21:01:04 +03:00
Heikki Linnakangas
85f4e966e8 Import dummy pg_twophase dir entry 2024-09-12 20:54:16 +03:00
Heikki Linnakangas
4d27048d6d Import SLRUs 2024-09-12 20:46:20 +03:00
Stas Kelvich
3a452d8f56 remove old timeline init code 2024-09-12 18:20:13 +01:00
Stas Kelvich
b81dbc887b import relation sizes 2024-09-12 18:19:25 +01:00
Stas Kelvich
80fed9cfb1 fix order of insertion for relmaps and reldirs 2024-09-12 15:43:54 +01:00
Stas Kelvich
189386b22f Merge branch 'hack/fast-import' of github.com:neondatabase/neon into hack/fast-import 2024-09-12 13:52:11 +01:00
Stas Kelvich
38dfecb026 clean imports 2024-09-12 13:51:48 +01:00
Stas Kelvich
be28bd8312 merge 2024-09-12 13:49:34 +01:00
Heikki Linnakangas
9759d6ec72 Rename the image layer to not have the temp suffix 2024-09-12 15:49:21 +03:00
Stas Kelvich
0c64d55a6b Import dbdir, relmaps, reldirs 2024-09-12 13:48:29 +01:00
Heikki Linnakangas
578da1dc02 Parse postgres version from control file 2024-09-12 15:21:59 +03:00
Stas Kelvich
842ac7cfda resolve conflicts 2024-09-12 13:13:16 +01:00
Stas Kelvich
71340e3c00 common iterators for pg data dirs 2024-09-12 13:10:35 +01:00
Heikki Linnakangas
e6e0b27dc3 Create index_part.json 2024-09-12 14:53:29 +03:00
Heikki Linnakangas
04ec8bd7de test: Attach the tenant, start endpoint on it
Doesn't work yet, I think because index_part.json is missing
2024-09-12 13:52:14 +03:00
Heikki Linnakangas
6563be1a4c Test passes now
It runs the command successfully, but doesn't try to attach the result to
the pageserver yet.

    BUILD_TYPE=debug DEFAULT_PG_VERSION=16 poetry run pytest --preserve-database-files test_runner/regress/test_pg_import.py
2024-09-12 13:36:42 +03:00
Heikki Linnakangas
fe975acc71 Add --tenant-id and --timeline-id options 2024-09-12 13:28:12 +03:00
Heikki Linnakangas
abed35589b Test fix 2024-09-12 12:59:45 +03:00
Stas Kelvich
3fe8b69968 Merge branch 'hack/fast-import' of github.com:neondatabase/neon into hack/fast-import 2024-09-12 10:59:24 +01:00
Stas Kelvich
0c856443c4 now it produces an image layer 2024-09-12 10:57:50 +01:00
Heikki Linnakangas
0fc584ef9a Add python test 2024-09-12 12:43:12 +03:00
Stas Kelvich
daedec65ac fix awaits 2024-09-12 10:42:08 +01:00
Stas Kelvich
94c393bf8f resolve conflicts 2024-09-12 10:37:07 +01:00
Stas Kelvich
28616b0907 compiles 2024-09-12 10:33:14 +01:00
Heikki Linnakangas
241724f3fc CLI args parsing 2024-09-12 12:31:07 +03:00
Stas Kelvich
98d128d993 first sketch 2024-09-12 09:59:36 +01:00
Erik Grinaker
b37da32c6f pageserver: reuse idempotency keys across metrics sinks (#8876)
## Problem

Metrics event idempotency keys differ across S3 and Vector. The events
should be identical.

Resolves #8605.

## Summary of changes

Pre-generate the idempotency keys and pass the same set into both
metrics sinks.

Co-authored-by: John Spray <john@neon.tech>
2024-09-03 09:05:24 +01:00
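The idea in b37da32c6f, generating the idempotency keys once and handing the same set to both sinks, can be sketched as follows. This is a minimal illustration only: the UUID-based key format, the list-backed sinks, and the function names are assumptions made for the example, not the pageserver's actual implementation.

```python
import uuid


def generate_idempotency_keys(events):
    """Pre-generate one key per event (UUIDs are an assumption for this sketch)."""
    return [uuid.uuid4().hex for _ in events]


def send_to_sink(sink, events, keys):
    """Hypothetical sink writer: attaches the pre-generated key to each event."""
    for event, key in zip(events, keys):
        sink.append({"idempotency_key": key, **event})


events = [{"metric": "written_size", "value": 123},
          {"metric": "resident_size", "value": 456}]

# Generate the keys once, then pass the same list to both sinks.
keys = generate_idempotency_keys(events)
s3_sink, vector_sink = [], []
send_to_sink(s3_sink, events, keys)
send_to_sink(vector_sink, events, keys)
```

Generating the keys inside each sink's upload path instead would give every sink its own keys, which is the mismatch the commit describes.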
Christian Schwarz
3b317cae07 page_cache/layer load: correctly classify layer summary block reads (#8885)
Before this PR, we would classify layer summary block reads as "Unknown"
content kind.

[Screenshot: https://github.com/user-attachments/assets/508af034-5c2a-4c89-80db-2899967b337c]
2024-09-02 16:09:26 +01:00
Christian Schwarz
bf0531d107 fixup(#8839): test_forward_compatibility needs to allow lag warning as well (#8891)
Found in
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8885/10665614629/index.html#suites/0fbaeb107ef328d03993d44a1fb15690/ea10ba1c140fba1d
2024-09-02 15:10:10 +01:00
Christian Schwarz
15e90cc427 bottommost-compaction: remove dead code / rectify cfg!()s (#8884)
part of https://github.com/neondatabase/neon/issues/8002
2024-09-02 14:45:17 +01:00
Arpad Müller
9746b6ea31 Implement archival_config timeline endpoint in the storage controller (#8680)
Implement the timeline specific `archival_config` endpoint also in the
storage controller.

It's mostly a copy-paste of the detach handler, since the task is the
same: do the same operation on all shards.

Part of #8088.
2024-09-02 13:51:45 +02:00
John Spray
516ac0591e storage controller: eliminate ensure_attached (#8875)
## Problem

This is a followup to #8783

- The old blocking ensure_attached function had been retained to handle
the case where a shard had a None generation_pageserver, but this wasn't
really necessary.
- There was a subtle `.1` in the code where a struct would have been
clearer

Closes #8819

## Summary of changes

- Add ShardGenerationState to represent the results of peek_generation
- Instead of calling ensure_attached when a tenant has a non-attached
shard, check the shard's policy and return 409 if it isn't Attached,
else return 503 if the shard's policy is attached but it hasn't been
reconciled yet (i.e. has a None generation_pageserver)
2024-09-02 11:36:57 +00:00
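The status-code decision described in 516ac0591e can be sketched as a small function. Only the 409 and 503 cases come from the commit message; the `ShardState` type, the string-valued policy, and the 200 fallthrough are assumptions for illustration and do not mirror the storage controller's real types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShardState:
    """Hypothetical stand-in for the per-shard result of peek_generation."""
    policy: str                           # e.g. "Attached", "Detached"
    generation_pageserver: Optional[int]  # None if not attached/reconciled


def status_for_tenant(shards):
    """Return an HTTP status code based on the shards' attachment state."""
    for shard in shards:
        if shard.generation_pageserver is None:
            if shard.policy != "Attached":
                return 409  # the policy says this shard should not be attached
            return 503      # policy is Attached, but not reconciled yet
    return 200              # assumption: every shard is attached, proceed
```

Returning 503 in the unreconciled case signals a temporary condition, whereas 409 indicates the request conflicts with the tenant's configured policy.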
Arpad Müller
3ec785f30d Add safekeeper scrubber test (#8785)
The test is very rudimentary: it only checks that we can run
`scan_metadata` for the safekeeper node kind before and after tenant
deletion. We also don't expect any uploaded data, because the test
doesn't generate enough WAL (it would need to create at least one
S3-uploaded file, and the scrubber doesn't recognize partial files yet).

The `scan_metadata` scrubber subcommand is extended to accept either a
database connection string, which was previously the only option and
required a database to be present, or timeline information specified
manually via JSON. The latter is ideal for testing scenarios, where the
number of timelines is usually limited and spinning up a database just
to write the timeline information is cumbersome.
2024-08-31 01:12:25 +02:00
Alex Chi Z.
05caaab850 fix(pageserver): fire layer eviction alert only when it's visible (#8882)
The pull request https://github.com/neondatabase/neon/pull/8679
explicitly mentioned that it will evict layers earlier than before.
Given that the eviction metric is based solely on the eviction threshold
(currently 86400s), we should account for early eviction and not fire
the alert if the layer is covered.

## Summary of changes

Record eviction timer only when the layer is visible + accessed.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-08-30 17:22:26 -04:00
Yuchen Liang
cacb1ae333 pageserver: set default io_buffer_alignment to 512 bytes (#8878)
## Summary of changes

- Set the default io_buffer_alignment to 512 bytes.
- Fix places that assumed `DEFAULT_IO_BUFFER_ALIGNMENT=0`
- Adapt unit tests to handle merge with `chunk size <= 4096`.

## Testing and Performance

We have done sufficient performance de-risking. 

Enabling it by default completes our correctness de-risking before the
next release.

Context: https://neondb.slack.com/archives/C07BZ38E6SD/p1725026845455259

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-08-30 19:53:52 +01:00
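To illustrate what a 512-byte buffer alignment (cacb1ae333) means for a read, the sketch below widens a byte range so that both ends fall on alignment boundaries. How the pageserver actually applies `io_buffer_alignment` is not shown in the commit message, so the function name and its behavior here are assumptions for illustration.

```python
def align_range(offset, length, alignment=512):
    """Widen [offset, offset + length) so both ends are multiples of alignment."""
    # Alignment must be a positive power of two for this sketch.
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    start = offset - (offset % alignment)                  # round down
    end = -(-(offset + length) // alignment) * alignment   # round up
    return start, end
```

For example, a 100-byte read at offset 1000 would be widened to the aligned range 512..1536, so the caller reads two full 512-byte blocks and slices out the requested bytes.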
Alex Chi Z.
df971f995c feat(storage-scrubber): check layer map validity (#8867)
When implementing bottom-most gc-compaction, we analyzed the structure
of layer maps that the current compaction algorithm could produce, and
decided to only support structures without delta layer overlaps and LSN
intersections with the exception of single key layers.

## Summary of changes

This patch adds a layer map validity check to the storage scrubber.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-08-30 14:12:39 -04:00
Alexander Bayandin
e58e045ebb CI(promote-compatibility-data): fix job (#8871)
## Problem

`promote-compatibility-data` job got broken and slightly outdated after 
- https://github.com/neondatabase/neon/pull/8552 -- we don't upload
artifacts for ARM64
- https://github.com/neondatabase/neon/pull/8561 -- we don't prepare
`debug` artifacts in the release branch anymore

## Summary of changes
- Promote artifacts from release PRs to the latest version (but do it
from `release` branch)
- Upload artifacts for both X64 and ARM64
2024-08-30 13:18:30 +01:00
John Spray
20f82f9169 storage controller: sleep between compute notify retries (#8869)
## Problem

Live migration retries when it fails to notify the compute of the new
location. It should sleep between attempts.

Closes: https://github.com/neondatabase/neon/issues/8820

## Summary of changes

- Do an `exponential_backoff` in the retry loop for compute
notifications
2024-08-30 11:44:13 +01:00
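The retry behavior described in 20f82f9169, sleeping with exponential backoff between compute notification attempts, can be sketched generically as below. The delay parameters, the `ConnectionError` failure type, and the function names are hypothetical; the commit itself uses the codebase's own `exponential_backoff` helper, whose signature is not shown here.

```python
import time


def backoff_delay(attempt, base=0.1, cap=3.0):
    """Delay before the next attempt: doubles each time, up to a cap."""
    return min(cap, base * (2 ** attempt))


def notify_with_retries(notify, max_attempts=5, sleep=time.sleep):
    """Call notify() until it succeeds, sleeping between failed attempts."""
    for attempt in range(max_attempts):
        try:
            return notify()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt))
```

Passing `sleep` as a parameter keeps the loop testable without real delays.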
Conrad Ludgate
72aa6b02da chore: speed up testing (#8874)
`safekeeper::random_test test_random_schedules` debug test takes over 2
minutes to run on our arm runners. Running it 6 times with pageserver
settings seems redundant.
2024-08-30 11:34:23 +01:00
Conrad Ludgate
022fad65eb proxy: fix password hash cancellation (#8868)
In #8863 I replaced the threadpool with tokio tasks, but there was a
behaviour I missed regarding cancellation. Adding the JoinHandle wrapper
that triggers abort on drop should fix this.

Another change: any panics that occur in password hashing are now
propagated through the resume_unwind functionality.
2024-08-29 20:16:44 +01:00
Arpad Müller
8eaa8ad358 Remove async_trait usages from safekeeper and neon_local (#8864)
Removes additional async_trait usages from safekeeper and neon_local.

Also removes now redundant dependencies of the `async_trait` crate.

cc earlier work: #6305, #6464, #7303, #7342, #7212, #8296
2024-08-29 18:24:25 +02:00