rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-05 13:10:37 +00:00

Author	SHA1	Message	Date
Christian Schwarz	b1e21c7705	Add trailing dot	2024-05-05 17:17:42 +00:00
Christian Schwarz	004af53035	git diff reduction & polish	2024-05-05 17:15:09 +00:00
Christian Schwarz	d8702dd819	Merge branch 'problame/test-suite-narrow-pageserver-config-override' into problame/remove-pageserver-config-overrides	2024-05-05 16:57:51 +00:00
Christian Schwarz	5f04224817	remove NEON_PAGESERVER_OVERRIDES env var	2024-05-05 16:53:36 +00:00
Christian Schwarz	9c547da6a6	reduce scope of this PR & fix naming	2024-05-05 16:49:20 +00:00
Christian Schwarz	6d343feef0	whitespace diff reduction	2024-05-05 16:20:58 +00:00
Christian Schwarz	f73c8c6bd6	Merge branch 'problame/test-suite-narrow-pageserver-config-override' into problame/remove-pageserver-config-overrides Conflicts: control_plane/src/pageserver.rs => pick ours	2024-05-05 16:19:11 +00:00
Christian Schwarz	51224c84c2	even more diff reduction	2024-05-05 16:16:59 +00:00
Christian Schwarz	6bccd64514	minimize diff by re-adding whitespace	2024-05-05 15:59:05 +00:00
Christian Schwarz	28c95e4207	Merge branch 'problame/test-suite-narrow-pageserver-config-override' into problame/remove-pageserver-config-overrides	2024-05-04 16:15:45 +00:00
Christian Schwarz	aacf8110a0	pretty up the inlined override code, long option --config-override	2024-05-04 16:12:50 +00:00
Christian Schwarz	70977afd07	ruff check & format	2024-05-04 15:14:34 +00:00
Christian Schwarz	7363b44b50	neon_local: remove --pageserver-config-overrides, `neon_local init` takes a toml tempfile	2024-05-04 15:13:28 +00:00
Christian Schwarz	cc64e1b17f	Merge branch 'problame/test-suite-narrow-pageserver-config-override' into problame/remove-pageserver-config-overrides	2024-05-04 13:17:25 +00:00
Christian Schwarz	25dfafc2df	undo the renaming, it's too much churn for review; will do in a separate PR	2024-05-04 13:14:41 +00:00
Christian Schwarz	d72fe6f5ee	no neon_local_overrides during start(); inline it into `PageServerNode::init`	2024-05-04 13:10:45 +00:00
Christian Schwarz	0bca1a5de3	Revert "neon_local: only set --pageserver-config-override=remote_storage during init, not start" This reverts commit `511f593360`.	2024-05-04 12:33:02 +00:00
Christian Schwarz	511f593360	neon_local: only set --pageserver-config-override=remote_storage during init, not start	2024-05-04 12:30:06 +00:00
Christian Schwarz	b96e0b2458	rely on `init` to store remote storage config in pageserver.toml This allows inlining append_pageserver_param_overrides into NeonCli.init()	2024-05-04 12:07:33 +00:00
Christian Schwarz	b4ed3b15b9	remove support for `pageserver -c/--config-override` and `neon_local --pageserver-config-override`	2024-05-04 11:50:59 +00:00
Christian Schwarz	ad185dd594	test_suite: remove usage of `--pageserver-config-override` Rewrite the pageserver.toml instead.	2024-05-04 11:50:59 +00:00
Christian Schwarz	58055c7a96	remove NEON_PAGESERVER_OVERRIDES env var (no committed code uses it)	2024-05-04 11:50:59 +00:00
Christian Schwarz	ec04f0f4d4	Merge branch 'problame/remove-pageserver-update-config-flag' into problame/test-suite-narrow-pageserver-config-override	2024-05-04 11:48:12 +00:00
Christian Schwarz	a52b563b59	fixups	2024-05-04 11:47:38 +00:00
Christian Schwarz	89afba066c	refactor(test_suite): rely less on `--pageserver-config-override` outside of `neon_local init` The `NeonCli.init()` persists the non-default pageserver config values for remote storage & `NeonEnvBuilder.pageserver_config_override` in `pageserver.toml`. We don't need to repeat them on each pageserver start after that.	2024-05-04 11:34:59 +00:00
Christian Schwarz	6bcb0959ad	ruff format	2024-05-04 10:25:53 +00:00
Christian Schwarz	8f3051b416	Merge branch 'main' into problame/remove-pageserver-update-config-flag	2024-05-04 10:22:05 +00:00
Christian Schwarz	998dc6255e	refactor(pageserver): remove --update-init flag	2024-05-04 10:20:41 +00:00
Heikki Linnakangas	4deb8dc52e	compute_ctl: Be more precise in how startup time is calculated (#7601 ) - On a non-pooled start, do not reset the 'start_time' after launching the HTTP service. In a non-pooled start, it's fair to include that in the total startup time. - When setting wait_for_spec_ms and resetting start_time, call Utc::now() only once. It's a waste of cycles to call it twice, but also, it ensures the time between setting wait_for_spec_ms and resetting start_time is included in one or the other time period. These differences should be insignificant in practice, in the microsecond range, but IMHO it seems more logical and readable this way too. Also fix and clarify some of the surrounding comments. (This caught my eye while reviewing PR #7577)	2024-05-04 08:44:18 +03:00
Em Sharnoff	64f0613edf	compute_ctl: Add support for swap resizing (#7434 ) Part of neondatabase/cloud#12047. Resolves #7239. In short, this PR: 1. Adds `ComputeSpec.swap_size_bytes: Option<u64>` 2. Adds a flag to compute_ctl: `--resize-swap-on-bind` 3. Implements running `/neonvm/bin/resize-swap` with the value from the compute spec before starting postgres, if both the value in the spec AND the flag are specified. 4. Adds `sudo` to the final image 5. Adds a file in `/etc/sudoers.d` to allow `compute_ctl` to resize swap Various bits of reasoning about design decisions in the added comments. In short: We have both a compute spec field and a flag to make rollout easier to implement. The flag will most likely be removed as part of cleanups for neondatabase/cloud#12047.	2024-05-03 12:57:45 -07:00
Christian Schwarz	1e7cd6ac9f	refactor: move `NodeMetadata` to `pageserver_api`; use it from `neon_local` (#7606 ) This is the first step towards representing all of Pageserver configuration as clean `serde::Serialize`able Rust structs in `pageserver_api`. The `neon_local` code will then use those structs instead of the crude `toml_edit` / string concatenation that it does today. refs https://github.com/neondatabase/neon/issues/7555 --------- Co-authored-by: Alex Chi Z <iskyzh@gmail.com>	2024-05-03 13:15:38 -04:00
Alex Chi Z	ef03b38e52	fix(pageserver): remove update_gc_info calls in tests (#7608 ) introduced by https://github.com/neondatabase/neon/pull/7468 conflicting with https://github.com/neondatabase/neon/pull/7584 Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-03 16:01:33 +00:00
Conrad Ludgate	9b65946566	proxy: add connect compute concurrency lock (#7607 ) ## Problem Too many connect_compute attempts can overwhelm postgres, getting the connections stuck. ## Summary of changes Limit number of connection attempts that can happen at a given time.	2024-05-03 15:45:24 +00:00
Christian Schwarz	700aa96770	Merge branch 'main' into problame/move-pageserver-config-into-api-crate	2024-05-03 15:20:24 +00:00
Christian Schwarz	4a72fe0908	add requested backward-compatibility test	2024-05-03 15:19:18 +00:00
Alex Chi Z	a3fe12b6d8	feat(pageserver): add scan interface (#7468) This pull request adds the scan interface. Scan operates on a sparse keyspace and retrieves all the key-value pairs from the keyspaces. Currently, scan only supports the metadata keyspace, and by default does not retrieve anything from the ancestor branch. This should be fixed in the future if we need to have some keyspaces that inherit from the parent. The scan interface reuses the vectored get code path by disabling the missing key errors. This pull request also changes the behavior of vectored get on aux file v1/v2 key/keyspace: if the key is not found, it is simply not included in the result, instead of throwing a missing key error. TODOs in future pull requests: limit memory consumption, ensure the search stops when all keys are covered by the image layer, remove `#[allow(dead_code)]` once the code path is used in basebackups / aux files, remove unnecessary fine-grained keyspace tracking in vectored get (or have another code path for scan) to improve performance. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-03 10:43:30 -04:00
John Spray	b5a6e68e68	storage controller: check warmth of secondary before doing proactive migration (#7583 ) ## Problem The logic in Service::optimize_all would sometimes choose to migrate a tenant to a secondary location that was only recently created, resulting in Reconciler::live_migrate hitting its 5 minute timeout warming up the location, and proceeding to attach a tenant to a location that doesn't have a warm enough local set of layer files for good performance. Closes: #7532 ## Summary of changes - Add a pageserver API for checking download progress of a secondary location - During `optimize_all`, connect to pageservers of candidate optimization secondary locations, and check they are warm. - During shard split, do heatmap uploads and start secondary downloads, so that the new shards' secondary locations start downloading ASAP, rather than waiting minutes for background downloads to kick in. I have intentionally not implemented this by continuously reading the status of locations, to avoid dealing with the scale challenge of efficiently polling & updating 10k-100k locations status. If we implement that in the future, then this code can be simplified to act based on latest state of a location rather than fetching it inline during optimize_all.	2024-05-03 14:28:23 +00:00
Christian Schwarz	ce0ddd749c	test_runner: remove unused `NeonPageserver.config_override` field (#7605 ) refs https://github.com/neondatabase/neon/issues/7555	2024-05-03 16:05:00 +02:00
Arpad Müller	426598cf76	Update rust to 1.78.0 (#7598 ) We keep the practice of keeping the compiler up to date, pointing to the latest release. This is done by many other projects in the Rust ecosystem as well. Release notes: https://blog.rust-lang.org/2024/05/02/Rust-1.78.0.html Prior update was in #7198	2024-05-03 15:59:28 +02:00
Christian Schwarz	923cdff13d	Merge branch 'main' into problame/move-pageserver-config-into-api-crate	2024-05-03 12:36:18 +00:00
Christian Schwarz	498edfc0ff	use NodeMetadata struct for writing metadata.json from neon_local	2024-05-03 12:35:41 +00:00
Christian Schwarz	d2e2a88737	move NodeMetadata type to pageserver_api::config	2024-05-03 12:35:41 +00:00
Christian Schwarz	6f720eb38f	create `config` module inside pageserver_api crate	2024-05-03 12:35:41 +00:00
John Spray	8b4dd5dc27	pageserver: jitter secondary periods (#7544) ## Problem After some time the load from heatmap uploads gets rather spiky. They're unintentionally synchronising. Chart (does this make a _boing_ sound in anyone else's head?): ![image](https://github.com/neondatabase/neon/assets/944640/18829fc8-c5b7-4739-9a9b-491b5d6fcade) ## Summary of changes - Add a helper `period_jitter` and apply a 5% jitter from downloader and heatmap_uploader when updating the next runtime at the end of an iteration. - Refactor existing places that we pick a startup interval into `period_warmup`, so that the intent is obvious.	2024-05-03 12:31:25 +00:00
Joonas Koivunen	ed9a114bde	fix: find gc cutoff points without holding Tenant::gc_cs (#7585 ) The current implementation of finding timeline gc cutoff Lsn(s) is done while holding `Tenant::gc_cs`. In recent incidents long create branch times were caused by holding the `Tenant::gc_cs` over extremely long `Timeline::find_lsn_by_timestamp`. The fix is to find the GC cutoff values before taking the `Tenant::gc_cs` lock. This change is safe to do because the GC cutoff values and the branch points have no dependencies on each other. In the case of `Timeline::find_gc_cutoff` taking a long time with this change, we should no longer see `Tenant::gc_cs` interfering with branch creation. Additionally, the `Tenant::refresh_gc_info` is now tolerant of timeline deletions (or any other failures to find the pitr_cutoff). This helps with the synthetic size calculation being constantly completed instead of having a break for a timely timeline deletion. Fixes: #7560 Fixes: #7587	2024-05-03 14:57:26 +03:00
John Spray	b7385bb016	storage_controller: fix non-timeline passthrough GETs (#7602 ) ## Problem We were matching on `/tenant/:tenant_id` and `/tenant/:tenant_id/timeline`, but not non-timeline tenant sub-paths. There aren't many: this was only noticeable when using the synthetic_size endpoint by hand. ## Summary of changes - Change the wildcard from `/tenant/:tenant_id/timeline` to `/tenant/:tenant_id/*` - Add test lines that exercise this	2024-05-03 12:52:43 +01:00
Vlad Lazar	37b1930b2f	tests: relax test download remote layers api (#7604) ## Problem This test triggers layer download failures on demand. It is possible to modify the failpoint during a `Timeline::get_vectored` right between the vectored read and its validation read. This means that one of the reads can fail while the other one succeeds and vice versa. ## Summary of changes These errors are expected, so allow them to happen.	2024-05-03 12:40:09 +01:00
Arpad Müller	d76963691f	Increase Azure parallelism limit to 100 (#7597) After #5563 has been addressed we can now set the Azure storage parallelism limit to 100 like it is for S3. Part of #5567	2024-05-03 13:23:11 +02:00
Joonas Koivunen	60f570c70d	refactor(update_gc_info): split GcInfo to compose out of GcCutoffs (#7584 ) Split `GcInfo` and replace `Timeline::update_gc_info` with a method that simply finds gc cutoffs `Timeline::find_gc_cutoffs` to be combined as `Timeline::gc_info` at the caller. This change will be followed up with a change that finds the GC cutoff values before taking the `Tenant::gc_cs` lock. Cc: #7560	2024-05-03 13:11:51 +03:00
Alex Chi Z	3582a95c87	fix(pageserver): compile warning of download_object.ctx on macos (#7596 ) fix macOS compile warning introduced in `45ec8688ea` Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-03 10:55:48 +02:00

5172 Commits