rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-16 01:42:55 +00:00

Author	SHA1	Message	Date
Stas Kelvich	8df388330b	Merge branch 'hack/fast-import' of github.com:neondatabase/neon into hack/fast-import	2024-09-12 19:26:16 +01:00
Stas Kelvich	357c07dd35	track rel file import time	2024-09-12 19:26:03 +01:00
Heikki Linnakangas	7b90ec6e19	Create controlfile and checkpoint entries XXX: untested, not sure if it works..	2024-09-12 21:01:04 +03:00
Heikki Linnakangas	85f4e966e8	Import dummy pg_twophase dir entry	2024-09-12 20:54:16 +03:00
Heikki Linnakangas	4d27048d6d	Import SLRUs	2024-09-12 20:46:20 +03:00
Stas Kelvich	3a452d8f56	remove old timeline init code	2024-09-12 18:20:13 +01:00
Stas Kelvich	b81dbc887b	import relation sizes	2024-09-12 18:19:25 +01:00
Stas Kelvich	80fed9cfb1	fix order of insertion for relmaps and reldirs	2024-09-12 15:43:54 +01:00
Stas Kelvich	189386b22f	Merge branch 'hack/fast-import' of github.com:neondatabase/neon into hack/fast-import	2024-09-12 13:52:11 +01:00
Stas Kelvich	38dfecb026	clean imports	2024-09-12 13:51:48 +01:00
Stas Kelvich	be28bd8312	merge	2024-09-12 13:49:34 +01:00
Heikki Linnakangas	9759d6ec72	Rename the image layer to not have the temp suffix	2024-09-12 15:49:21 +03:00
Stas Kelvich	0c64d55a6b	Import dbdir, relmaps, reldirs	2024-09-12 13:48:29 +01:00
Heikki Linnakangas	578da1dc02	Parse postgres version from control file	2024-09-12 15:21:59 +03:00
Stas Kelvich	842ac7cfda	resolve conflicts	2024-09-12 13:13:16 +01:00
Stas Kelvich	71340e3c00	common iterators for pg data dirs	2024-09-12 13:10:35 +01:00
Heikki Linnakangas	e6e0b27dc3	Create index_part.json	2024-09-12 14:53:29 +03:00
Heikki Linnakangas	04ec8bd7de	test: Attach the tenant, start endpoint on it Doesn't work yet, I think because index_part.json is missing	2024-09-12 13:52:14 +03:00
Heikki Linnakangas	6563be1a4c	Test passes now It runs the command successfully. Doesn't try to attach it to the pageserver on it yet BUILD_TYPE=debug DEFAULT_PG_VERSION=16 poetry run pytest --preserve-database-files test_runner/regress/test_pg_import.py	2024-09-12 13:36:42 +03:00
Heikki Linnakangas	fe975acc71	Add --tenant-id and --timeline-id options	2024-09-12 13:28:12 +03:00
Heikki Linnakangas	abed35589b	Test fix	2024-09-12 12:59:45 +03:00
Stas Kelvich	3fe8b69968	Merge branch 'hack/fast-import' of github.com:neondatabase/neon into hack/fast-import	2024-09-12 10:59:24 +01:00
Stas Kelvich	0c856443c4	now it produces an image layer	2024-09-12 10:57:50 +01:00
Heikki Linnakangas	0fc584ef9a	Add python test	2024-09-12 12:43:12 +03:00
Stas Kelvich	daedec65ac	fix awaits	2024-09-12 10:42:08 +01:00
Stas Kelvich	94c393bf8f	resolve conflicts	2024-09-12 10:37:07 +01:00
Stas Kelvich	28616b0907	compiles	2024-09-12 10:33:14 +01:00
Heikki Linnakangas	241724f3fc	CLI args parsing	2024-09-12 12:31:07 +03:00
Stas Kelvich	98d128d993	first sketch	2024-09-12 09:59:36 +01:00
Erik Grinaker	b37da32c6f	pageserver: reuse idempotency keys across metrics sinks (#8876 ) ## Problem Metrics event idempotency keys differ across S3 and Vector. The events should be identical. Resolves #8605. ## Summary of changes Pre-generate the idempotency keys and pass the same set into both metrics sinks. Co-authored-by: John Spray <john@neon.tech>	2024-09-03 09:05:24 +01:00
Christian Schwarz	3b317cae07	page_cache/layer load: correctly classify layer summary block reads (#8885 ) Before this PR, we would classify layer summary block reads as "Unknown" content kind.	2024-09-02 16:09:26 +01:00
Christian Schwarz	bf0531d107	fixup(#8839 ): `test_forward_compatibility` needs to allow lag warning as well (#8891 ) Found in https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8885/10665614629/index.html#suites/0fbaeb107ef328d03993d44a1fb15690/ea10ba1c140fba1d	2024-09-02 15:10:10 +01:00
Christian Schwarz	15e90cc427	bottommost-compaction: remove dead code / rectify cfg!()s (#8884 ) part of https://github.com/neondatabase/neon/issues/8002	2024-09-02 14:45:17 +01:00
Arpad Müller	9746b6ea31	Implement archival_config timeline endpoint in the storage controller (#8680 ) Implement the timeline specific `archival_config` endpoint also in the storage controller. It's mostly a copy-paste of the detach handler: the task is the same: do the same operation on all shards. Part of #8088.	2024-09-02 13:51:45 +02:00
John Spray	516ac0591e	storage controller: eliminate ensure_attached (#8875 ) ## Problem This is a followup to #8783 - The old blocking ensure_attached function had been retained to handle the case where a shard had a None generation_pageserver, but this wasn't really necessary. - There was a subtle `.1` in the code where a struct would have been clearer Closes #8819 ## Summary of changes - Add ShardGenerationState to represent the results of peek_generation - Instead of calling ensure_attached when a tenant has a non-attached shard, check the shard's policy and return 409 if it isn't Attached, else return 503 if the shard's policy is attached but it hasn't been reconciled yet (i.e. has a None generation_pageserver)	2024-09-02 11:36:57 +00:00
Arpad Müller	3ec785f30d	Add safekeeper scrubber test (#8785 ) The test is very rudimentary, it only checks that before and after tenant deletion, we can run `scan_metadata` for the safekeeper node kind. Also, we don't actually expect any uploaded data, for that we don't have enough WAL (needs to create at least one S3-uploaded file, the scrubber doesn't recognize partial files yet). The `scan_metadata` scrubber subcommand is extended to support either specifying a database connection string, which was previously the only way, and required a database to be present, or specifying the timeline information manually via json. This is ideal for testing scenarios because in those, the number of timelines is usually limited, but it is involved to spin up a database just to write the timeline information.	2024-08-31 01:12:25 +02:00
Alex Chi Z.	05caaab850	fix(pageserver): fire layer eviction alert only when it's visible (#8882 ) The pull request https://github.com/neondatabase/neon/pull/8679 explicitly mentioned that it will evict layers earlier than before. Given that the eviction metric is solely based on the eviction threshold (which is 86400s now), we should take the early eviction into account and not fire the alert if it's a covered layer. ## Summary of changes Record eviction timer only when the layer is visible + accessed. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-30 17:22:26 -04:00
Yuchen Liang	cacb1ae333	pageserver: set default io_buffer_alignment to 512 bytes (#8878 ) ## Summary of changes - Setting default io_buffer_alignment to 512 bytes. - Fix places that assumed `DEFAULT_IO_BUFFER_ALIGNMENT=0` - Adapt unit tests to handle merge with `chunk size <= 4096`. ## Testing and Performance We have done sufficient performance de-risking. Enabling it by default completes our correctness de-risking before the next release. Context: https://neondb.slack.com/archives/C07BZ38E6SD/p1725026845455259 Signed-off-by: Yuchen Liang <yuchen@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-08-30 19:53:52 +01:00
Alex Chi Z.	df971f995c	feat(storage-scrubber): check layer map validity (#8867 ) When implementing bottom-most gc-compaction, we analyzed the structure of layer maps that the current compaction algorithm could produce, and decided to only support structures without delta layer overlaps and LSN intersections with the exception of single key layers. ## Summary of changes This patch adds the layer map valid check in the storage scrubber. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-30 14:12:39 -04:00
Alexander Bayandin	e58e045ebb	CI(promote-compatibility-data): fix job (#8871 ) ## Problem `promote-compatibility-data` job got broken and slightly outdated after - https://github.com/neondatabase/neon/pull/8552 -- we don't upload artifacts for ARM64 - https://github.com/neondatabase/neon/pull/8561 -- we don't prepare `debug` artifacts in the release branch anymore ## Summary of changes - Promote artifacts from release PRs to the latest version (but do it from `release` branch) - Upload artifacts for both X64 and ARM64	2024-08-30 13:18:30 +01:00
John Spray	20f82f9169	storage controller: sleep between compute notify retries (#8869 ) ## Problem Live migration retries when it fails to notify the compute of the new location. It should sleep between attempts. Closes: https://github.com/neondatabase/neon/issues/8820 ## Summary of changes - Do an `exponential_backoff` in the retry loop for compute notifications	2024-08-30 11:44:13 +01:00
Conrad Ludgate	72aa6b02da	chore: speed up testing (#8874 ) `safekeeper::random_test test_random_schedules` debug test takes over 2 minutes to run on our arm runners. Running it 6 times with pageserver settings seems redundant.	2024-08-30 11:34:23 +01:00
Conrad Ludgate	022fad65eb	proxy: fix password hash cancellation (#8868 ) In #8863 I replaced the threadpool with tokio tasks, but there was a behaviour I missed regarding cancellation. Adding the JoinHandle wrapper that triggers abort on drop should fix this. Another change, any panics that occur in password hashing will be propagated through the resume_unwind functionality.	2024-08-29 20:16:44 +01:00
Arpad Müller	8eaa8ad358	Remove async_trait usages from safekeeper and neon_local (#8864 ) Removes additional async_trait usages from safekeeper and neon_local. Also removes now redundant dependencies of the `async_trait` crate. cc earlier work: #6305, #6464, #7303, #7342, #7212, #8296	2024-08-29 18:24:25 +02:00
Alex Chi Z.	653a6532a2	fix(pageserver): reject non-i128 key on the write path (#8648 ) It's better to reject invalid keys on the write path than to store them and panic the pageserver. https://github.com/neondatabase/neon/issues/8636 ## Summary of changes If a key cannot be represented using i128, we don't allow writing that key into the pageserver. There are two versions of the check valid function: the normal one that simply rejects non-i128 keys, and the stronger one that rejects all keys that we don't support. The current behavior when a key gets rejected is that safekeeper will keep retrying streaming that key to the pageserver. And once such a key gets written, no new computes can be started. Therefore, there could be a large amount of pageserver warnings if a key cannot be ingested. To validate this behavior, the reviewer can (1) use the stronger version of the valid check (2) run the following SQL. ``` set neon.regress_test_mode = true; CREATE TABLESPACE regress_tblspace LOCATION '/Users/skyzh/Work/neon-test/tablespace'; CREATE SCHEMA testschema; CREATE TABLE testschema.foo (i int) TABLESPACE regress_tblspace; insert into testschema.foo values (1), (2), (3); ``` For now, I'd like to merge the patch with only rejecting non-i128 keys. It's still unknown whether the stronger version covers all the cases that basebackup doesn't support. Furthermore, the behavior of rejecting a key will produce large amounts of warnings due to safekeeper retry. Therefore, I'd like to reject the minimum set of keys that we don't support (non-i128 ones) for now. (well, erroring out is better than panicking on `to_compact_key`) The next step is to fix the safekeeper behavior (i.e., on such key rejections, stop streaming WAL), so that we can properly stop writing. An alternative solution is to simply drop these keys on the write path. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-29 10:07:05 -04:00
Alex Chi Z.	18bfc43fa7	fix(pageserver): add dry-run to force compact API (#8859 ) Add `dry-run` flag to the compact API Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-29 10:01:54 -04:00
Conrad Ludgate	7ce49fe6e3	proxy: improve test performance (#8863 ) Some tests were very slow and some tests occasionally stalled. This PR improves some test performance and replaces the custom threadpool in order to fix the stalling of tests.	2024-08-29 13:20:15 +00:00
Christian Schwarz	a8fbc63be2	tenant background loops: periodic log message if long-running iteration (#8832 ) refs https://github.com/neondatabase/neon/issues/7524 Problem ------- When browsing Pageserver logs, background loop iterations that take a long time are hard to spot / easy to miss because they tend to not produce any log messages unless: - they overrun their period, but that's only one message when the iteration completes late - they do something that produces logs (e.g., create image layers) Further, a slow iteration that is still running will neither log nor bump the metrics of `warn_when_period_overrun` until _after_ it has finished. Again, that makes a still-running iteration hard to spot. Solution -------- This PR adds a wrapper around the per-tenant background loops that, while a slow iteration is ongoing, emits a log message every $period.	2024-08-29 15:06:13 +02:00
Arpad Müller	96b5c4d33d	Don't unarchive a timeline if its ancestor is archived (#8853 ) If a timeline unarchival request comes in, give an error if the parent timeline is archived. This prevents us from the situation of having an archived timeline with children that are not archived. Follow up of #8824 Part of #8088 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-08-29 12:54:02 +00:00
Christian Schwarz	c7481402a0	pageserver: default to 4MiB stack size and add env var to control it (#8862 ) # Motivation In https://github.com/neondatabase/neon/pull/8832 I get tokio runtime worker stack overflow errors in debug builds. In a similar vein, I had a tokio runtime worker stack overflow when trying to eliminate `async_trait` (https://github.com/neondatabase/neon/pull/8296). The 2MiB default is kind of arbitrary - so this PR bumps it to 4MiB. It also adds an env var to control it. # Risk Assessment With our 4 runtimes, the worst case stack memory usage is `4 (runtimes) * ($num_cpus (executor threads) + 512 (blocking pool threads)) * 4MiB`. On i3en.3xlarge, that's `8384 MiB`. On im4gn.2xlarge, that's `8320 MiB`. Before this change, it was half that. Looking at production metrics, we _do_ have the headroom to accommodate this worst case. # Alternatives The problems only occur with debug builds, so technically we could only raise the stack size for debug builds. However, it would be another configuration where `debug != release`. # Future Work If we ever enable single runtime mode in prod (=> https://github.com/neondatabase/neon/issues/7312 ) then the worst case will drop to 25% of its current value. Eliminating the use of `tokio::spawn_blocking` / `tokio::fs` in favor of `tokio-epoll-uring` (=> https://github.com/neondatabase/neon/issues/7370 ) would reduce the worst case to `4 (runtimes) * $num_cpus (executor threads) * 4 MiB`.	2024-08-29 14:02:27 +02:00
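
The worst-case stack memory figures quoted in commit c7481402a0 can be reproduced with a small calculation. This is only a sketch: the function name is hypothetical, and the vCPU counts used in the note below (12 for i3en.3xlarge, 8 for im4gn.2xlarge) are inferred from the totals in the commit message rather than stated there.

```rust
/// Worst-case stack memory in MiB, following the formula from the commit message:
/// runtimes * (executor threads + blocking pool threads) * stack size.
/// The name and signature are illustrative, not taken from the neon codebase.
fn worst_case_stack_mib(
    runtimes: u64,
    num_cpus: u64,         // one executor thread per CPU, per runtime
    blocking_threads: u64, // blocking pool threads per runtime (512 in the message)
    stack_mib: u64,        // per-thread stack size in MiB
) -> u64 {
    runtimes * (num_cpus + blocking_threads) * stack_mib
}
```

Calling `worst_case_stack_mib(4, 12, 512, 4)` yields 8384 and `worst_case_stack_mib(4, 8, 512, 4)` yields 8320, matching the quoted numbers. With the old 2 MiB stack the results halve, and with a single runtime the 12-vCPU case drops to 2096 MiB, which is the 25% mentioned under Future Work.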

6016 Commits
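
Commit 20f82f9169 in the list above describes adding an exponential backoff to the retry loop for compute notifications. The sketch below illustrates the general technique using only the standard library. It is not the storage controller's code: `backoff_delay` and `retry_with_backoff` are hypothetical names, the doubling factor and the cap are assumptions, and a blocking `thread::sleep` stands in for whatever async sleep the real loop uses.

```rust
use std::thread;
use std::time::Duration;

/// Delay before retrying after failed attempt number `attempt` (0-based):
/// `base * 2^attempt`, capped at `max`. Hypothetical helper for illustration.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    // checked_mul returns None on overflow; treat that as "use the cap".
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Calls `op` until it succeeds or `max_attempts` calls have been made,
/// sleeping with exponentially growing delays between failures.
/// `op` is always called at least once.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

Capping the delay keeps a long outage from pushing the wait between attempts arbitrarily high, while the doubling avoids hammering the notification target right after a failure.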