rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-08 06:30:37 +00:00

Author	SHA1	Message	Date
Christian Schwarz	cb4908dedb	Merge branch 'problame/2024-02-walredo-work/prespawn/switch-to-heavier-once-cell-with-rwlock' into problame/2024-02-walredo-work/prespawn/impl	2024-02-02 18:36:19 +00:00
Christian Schwarz	f8652fc738	Merge branch 'problame/2024-02-walredo-work/prespawn/heaver-once-cell-for-process-launch' into problame/2024-02-walredo-work/prespawn/switch-to-heavier-once-cell-with-rwlock	2024-02-02 18:36:17 +00:00
Christian Schwarz	bbf954d411	Merge branch 'problame/2024-02-walredo-work/prespawn/broken-tenants-no-walredo' into problame/2024-02-walredo-work/prespawn/heaver-once-cell-for-process-launch	2024-02-02 18:36:16 +00:00
Christian Schwarz	efb3e7bb15	Merge branch 'problame/2024-02-walredo-work/prespawn/split-code' into problame/2024-02-walredo-work/prespawn/broken-tenants-no-walredo	2024-02-02 18:36:15 +00:00
Christian Schwarz	01688a5ce1	Merge branch 'main' into problame/2024-02-walredo-work/prespawn/split-code	2024-02-02 18:36:14 +00:00
John Spray	2e5eab69c6	tests: remove test_gc_cutoff (#6587 ) This test became flaky when postgres retry handling was fixed to use backoff delays -- each iteration in this test's loop was taking much longer because pgbench doesn't fail until postgres has given up on retrying to the pageserver. We are just removing it, because the condition it tests is no longer risky: we reload all metadata from remote storage on restart, so crashing directly between making local changes and doing remote uploads isn't interesting any more. Closes: https://github.com/neondatabase/neon/issues/2856 Closes: https://github.com/neondatabase/neon/issues/5329	2024-02-02 18:20:18 +00:00
Joonas Koivunen	caf868e274	test: assert we eventually free space (#6536 ) in `test_statvfs_pressure_{usage,min_avail_bytes}` we now race against initial logical size calculation on-demand downloading the layers. first wait out the initial logical sizes, then change the final asserts to be "eventual", which is not great but it is faster than failing and retrying. this issue seems to happen only in debug mode tests. Fixes: #6510	2024-02-02 19:46:47 +02:00
Christian Schwarz	86bd14181e	Merge branch 'problame/2024-02-walredo-work/prespawn/switch-to-heavier-once-cell-with-rwlock' into problame/2024-02-walredo-work/prespawn/impl	2024-02-02 17:34:49 +00:00
Christian Schwarz	64b4b498a4	Revert "remove the walredo usage, that'll be in the next pr" This reverts commit `20e82629df`.	2024-02-02 17:25:25 +00:00
Christian Schwarz	20e82629df	remove the walredo usage, that'll be in the next pr	2024-02-02 17:21:59 +00:00
Christian Schwarz	6788bde87a	Merge branch 'problame/2024-02-walredo-work/prespawn/broken-tenants-no-walredo' into problame/2024-02-walredo-work/prespawn/heaver-once-cell-for-process-launch	2024-02-02 17:16:26 +00:00
Christian Schwarz	283c8abc04	Merge branch 'problame/2024-02-walredo-work/prespawn/split-code' into problame/2024-02-walredo-work/prespawn/broken-tenants-no-walredo	2024-02-02 17:16:25 +00:00
Christian Schwarz	647d409f0f	Merge branch 'main' into problame/2024-02-walredo-work/prespawn/split-code	2024-02-02 17:16:24 +00:00
Christian Schwarz	0a09cff816	heavier_once_cell: switch to tokio::sync::RwLock Using the RwLock reduces contention on the hot path.	2024-02-02 17:09:56 +00:00
John Spray	7e2436695d	storage controller: use AWS Secrets Manager for database URL, etc (#6585 ) ## Problem Passing secrets in via CLI/environment is awkward when using helm for deployment, and not ideal for security (secrets may show up in ps, /proc). We can bypass these issues by simply connecting directly to the AWS Secrets Manager service at runtime. ## Summary of changes - Add dependency on aws-sdk-secretsmanager - Update other aws dependencies to latest, to match transitive dependency versions - Add `Secrets` type in attachment service, using AWS SDK to load if secrets are not provided on the command line.	2024-02-02 16:57:11 +00:00
Christian Schwarz	c29532cded	Revert "Revert "[DO NOT MERGE] refactor(walredo): use replace RwLock with heavier_once_cell"" This reverts commit `6d94d9fb19`.	2024-02-02 16:43:14 +00:00
Christian Schwarz	1102d3f0bf	Revert "switch to tokio::RwLock" This reverts commit `e8f1af5527`.	2024-02-02 16:43:08 +00:00
Christian Schwarz	e8f1af5527	switch to tokio::RwLock	2024-02-02 16:42:54 +00:00
Christian Schwarz	6d94d9fb19	Revert "[DO NOT MERGE] refactor(walredo): use replace RwLock with heavier_once_cell" This reverts commit `2ab2608d4c`.	2024-02-02 16:15:37 +00:00
Conrad Ludgate	6506fd14c4	proxy: more refactors (#6526 ) ## Problem not really any problem, just some drive-by changes ## Summary of changes 1. move wake compute 2. move json processing 3. move handle_try_wake 4. move test backend to api provider 5. reduce wake-compute concerns 6. remove duplicate wake-compute loop	2024-02-02 16:07:35 +00:00
Christian Schwarz	84169c926a	Merge branch 'problame/2024-02-walredo-work/prespawn/broken-tenants-no-walredo' into problame/2024-02-walredo-work/prespawn/heaver-once-cell-for-process-launch	2024-02-02 15:53:57 +00:00
Christian Schwarz	acdebf2cec	Merge branch 'problame/2024-02-walredo-work/prespawn/split-code' into problame/2024-02-walredo-work/prespawn/broken-tenants-no-walredo	2024-02-02 15:53:56 +00:00
Christian Schwarz	44cb5e5be6	Merge branch 'main' into problame/2024-02-walredo-work/prespawn/split-code	2024-02-02 15:53:55 +00:00
John Spray	46fb1a90ce	pageserver: avoid calculating/sending logical sizes on shard !=0 (#6567 ) ## Problem Sharded tenants only maintain accurate relation sizes on shard 0. Therefore logical size can only be calculated on shard 0. Fortunately it is also only _needed_ on shard 0, to provide Safekeeper feedback and to send consumption metrics. Closes: #6307 ## Summary of changes - Send 0 for logical size to safekeepers on shards !=0 - Skip logical size warmup task on shards !=0 - Skip imitate_layer_accesses on shards !=0	2024-02-02 15:52:03 +00:00
Christian Schwarz	2ab2608d4c	[DO NOT MERGE] refactor(walredo): use replace RwLock with heavier_once_cell The API is nice, exactly what we want, but we would want a more optimistic underlying sync primitive.	2024-02-02 15:36:15 +00:00
Christian Schwarz	de7d366df3	wip	2024-02-02 15:25:39 +00:00
Christian Schwarz	7c1b2dc9ef	Merge branch 'problame/2024-02-walredo-work/prespawn/broken-tenants-no-walredo' into problame/2024-02-walredo-work/prespawn/impl	2024-02-02 14:56:29 +00:00
Christian Schwarz	f73aa3eb32	refactor(walredo): avoid the need for a WalRedoManager in broken tenants When we'll later introduce a global pool of pre-spawned walredo processes (https://github.com/neondatabase/neon/issues/6581), this refactoring avoids plumbing through the reference to the pool to all the places where we create a broken tenant. Builds atop the refactoring in #6583	2024-02-02 14:52:53 +00:00
Christian Schwarz	2374e1318e	Merge branch 'main' into problame/2024-02-walredo-work/prespawn/split-code	2024-02-02 14:42:30 +00:00
Christian Schwarz	8fe3c9ff55	wip	2024-02-02 14:42:00 +00:00
Christian Schwarz	8e8890530c	wip	2024-02-02 14:26:26 +00:00
John Spray	56171cbe8c	pageserver: more permissive activation timeout when testing (#6564 ) ## Problem The 5 second activation timeout is appropriate for production environments, where we want to give a prompt response to the cloud control plane, and if we fail it will retry the call. In tests however, we don't want every call to e.g. timeline create to have to come with a retry wrapper. This issue has always been there, but it is more apparent in sharding tests that concurrently attach several tenant shards. Closes: https://github.com/neondatabase/neon/issues/6563 ## Summary of changes When `testing` feature is enabled, make `ACTIVE_TENANT_TIMEOUT` 30 seconds instead of 5 seconds.	2024-02-02 15:14:42 +01:00
Arpad Müller	48b05b7c50	Add a time_travel_remote_storage http endpoint (#6533 ) Adds an endpoint to the pageserver to S3-recover an entire tenant to a specific given timestamp. Required input parameters: * `travel_to`: the target timestamp to recover the S3 state to * `done_if_after`: a timestamp that marks the beginning of the recovery process. retries of the query should keep this value constant. it must be after `travel_to`, and also after any changes we want to revert, and must represent a point in time before the endpoint is being called, all of these time points in terms of the time source used by S3. these criteria need to hold even in the face of clock differences, so I recommend waiting a specific amount of time, then taking `done_if_after`, then waiting some amount of time again, and only then issuing the request. Also important to note: the timestamps in S3 work at second accuracy, so one needs to add generous waits before and after for the process to work smoothly (at least 2-3 seconds). We ignore the added test for the mocked S3 for now due to a limitation in moto: https://github.com/getmoto/moto/issues/7300 . Part of https://github.com/neondatabase/cloud/issues/8233	2024-02-02 14:52:12 +01:00
Conrad Ludgate	0856fe6676	proxy: remove per client bytes (#5466 ) ## Problem Follow up to #5461 In my memory usage/fragmentation measurements, these metrics came up as a large source of small allocations. The replacement metric has been in use for a long time now so I think it's good to finally remove this. Per-endpoint data is still tracked elsewhere ## Summary of changes remove the per-client bytes metrics	2024-02-02 12:28:48 +00:00
Christian Schwarz	014147a644	wip	2024-02-02 11:50:43 +00:00
Christian Schwarz	aa0e9fdaef	Merge branch 'main' into problame/2024-02-walredo-work/prespawn/split-code	2024-02-02 11:50:15 +00:00
Alexander Bayandin	4133d14a77	Compute: pgbouncer 1.22.0 (#6582 ) ## Problem Update pgbouncer from 1.21 (and patches[0][1]) to 1.22 (which includes these patches) - [0] https://github.com/pgbouncer/pgbouncer/pull/972 - [1] https://github.com/pgbouncer/pgbouncer/pull/998 ## Summary of changes - Build pgbouncer 1.22.0 for neonVMs from upstream	2024-02-02 11:49:11 +00:00
Alexander Bayandin	30c9e145d7	check-macos-build: switch job to macos-14 (M1) (#6539 ) ## Problem - GitHub made available `macos-14` runners, and they run on M1 processors[0] - The price is the same as Intel-based runners — "macOS \| 3 or 4 (M1 or Intel) \| $0.08"[1], but runners on Apple Silicon should be significantly faster than their Intel counterparts. - Most developers who use macOS use Apple Silicon-based Macs nowadays. - [0] https://github.blog/changelog/2024-01-30-github-actions-introducing-the-new-m1-macos-runner-available-to-open-source/ - [1] https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#per-minute-rates ## Summary of changes - Run `check-macos-build` on `macos-14`	2024-02-02 10:51:20 +00:00
John Spray	24e916d37f	pageserver: fix a syntax error in swagger (#6566 ) A description was written as a follow-on to a section line, rather than in the proper `description:` part. This caused swagger parsers to rightly reject it.	2024-02-02 10:35:09 +00:00
Andreas Scherbaum	23f58145ed	Update wording for better readability (#6559 ) Update wording, add spaces in commandline arguments Co-authored-by: Andreas Scherbaum <andreas@neon.tech>	2024-02-02 11:22:32 +01:00
Christian Schwarz	9b8aa270b8	cleanups	2024-02-02 10:19:18 +00:00
Christian Schwarz	4571db1750	extract NeonWalRecord apply logic	2024-02-02 10:14:50 +00:00
Christian Schwarz	6fe534fea3	move protocol as child module of `process`, where it belongs	2024-02-02 10:05:50 +00:00
Christian Schwarz	8b258e20a0	move more stuff around	2024-02-02 10:03:40 +00:00
Christian Schwarz	29eec6c563	split off walredo process & protocol from walredo.rs	2024-02-02 09:59:31 +00:00
Heikki Linnakangas	350865392c	Print checkpoint key contents with "pagectl print-layer-file" (#6541 ) This was very useful in debugging the bugs fixed in #6410 and #6502. There's a lot more we could do. This only adds the printing to delta layers, not image layers, for example, and it might be useful to print details of more record types. But this is a good start.	2024-02-02 01:35:31 +02:00
Christian Schwarz	1be5e564ce	feat(walredo): use posix_spawn by moving close_fds() work to walredo C code (#6574 ) The rust stdlib uses the efficient `posix_spawn` by default. However, before this PR, pageserver used `pre_exec()` in our `close_fds()` ext trait. This PR moves the work that `close_fds()` did to the walredo C code. I verified manually using `gdb` that we're now forking out the walredo process using `posix_spawn`. refs https://github.com/neondatabase/neon/issues/6565	2024-02-01 22:38:34 +01:00
Christian Schwarz	7a70ef991f	feat(walredo): various observability improvements (#6573 ) - log when we start walredo process - include tenant shard id in walredo argv - dump some basic walredo state in tenant details api - more suitable walredo process launch histogram buckets - avoid duplicate tracing labels in walredo launch spans	2024-02-01 21:59:40 +01:00
Sasha Krassovsky	be30388901	Add retry to fetching basebackup (#6537 ) ## Problem Currently we have no retry mechanism for fetching basebackup. If there's an unstable connection, starting compute will just fail. ## Summary of changes Adds an exponential backoff with 7 retries to get the basebackup.	2024-02-01 20:50:04 +00:00
Heikki Linnakangas	3525080031	Fix pgvector 0.6.0 with Neon. (#6571 ) The previous patch was broken. rd_smgr is not open yet, need to use RelationGetSmgr() to access it.	2024-02-01 20:48:31 +00:00
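
Commit be30388901 above describes adding "an exponential backoff with 7 retries" to the basebackup fetch. The following is a minimal sketch of that general pattern, not the code from that commit. The names `backoff_delay` and `retry_with_backoff`, the base delay, the doubling factor, and the reading of "7 retries" as seven retries after the first attempt are all assumptions made for illustration.

```rust
use std::time::Duration;

/// Delay before retry number `retry` (0-based): base * 2^retry.
/// Hypothetical helper; the doubling factor is an assumption.
fn backoff_delay(base: Duration, retry: u32) -> Duration {
    base * 2u32.pow(retry)
}

/// Runs `op` once, then retries up to `max_retries` more times on error,
/// sleeping for an exponentially growing delay between attempts.
/// Returns the first success, or the last error once retries are exhausted.
fn retry_with_backoff<T, E>(
    max_retries: u32,
    base: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut retry = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of retries: give up and report the last error.
            Err(e) if retry >= max_retries => return Err(e),
            Err(_) => {
                std::thread::sleep(backoff_delay(base, retry));
                retry += 1;
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    // Hypothetical stand-in for the basebackup fetch: fails twice, then succeeds.
    let result: Result<&str, &str> =
        retry_with_backoff(7, Duration::from_millis(1), || {
            calls += 1;
            if calls < 3 {
                Err("connection reset")
            } else {
                Ok("basebackup")
            }
        });
    println!("{:?} after {} calls", result, calls);
}
```

With `max_retries = 7` the operation runs at most eight times in this sketch. A production version would likely cap the maximum delay and add jitter; neither is mentioned in the commit message, so both are left out here.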

4543 Commits