rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-15 09:22:55 +00:00

Author	SHA1	Message	Date
Conrad Ludgate	b9a4326fbd	fmt	2024-05-07 10:44:33 +01:00
Conrad Ludgate	85033e05c9	hakari	2024-05-07 08:35:18 +01:00
Conrad Ludgate	ca578449e4	simplify Cache invalidate trait, reduce EndpointCacheKey	2024-05-07 08:34:21 +01:00
Conrad Ludgate	ef3a9dfafa	proxy: moka cache	2024-05-07 07:59:23 +01:00
Christian Schwarz	ac7dc82103	use less `neon_local --pageserver-config-override` / `pageserver -c` (#7613 )	2024-05-06 22:31:26 +02:00
Anna Khanova	f1b654b77d	proxy: reduce number of concurrent connections (#7620 ) ## Problem Usually, the connection itself is quite fast (below 10ms for p999: https://neonprod.grafana.net/goto/aOyn8vYIg?orgId=1). It doesn't make much sense to wait a long time for the lock; if it takes a long time to acquire it, something has probably gone wrong. We also spawn a lot of retries, but they are not super helpful (0 means that it was connected successfully; 1, most probably, that it was a re-request of the compute node address https://neonprod.grafana.net/goto/J_8VQvLIR?orgId=1). Let's try to keep a small number of retries.	2024-05-06 19:03:25 +00:00
Sasha Krassovsky	7dd58e1449	On-demand WAL download for walsender (#6872 ) ## Problem There's allegedly a bug where if we connect a subscriber before WAL is downloaded from the safekeeper, it creates an error. ## Summary of changes Adds support for pausing safekeepers from sending WAL to computes, and then creates a compute and attaches a subscriber while it's in this paused state. Fails to reproduce the issue, but probably a good test to have --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2024-05-06 10:54:07 -07:00
Arpad Müller	f3af5f4660	Fix test_ts_of_lsn_api flakiness (#7599 ) Changes parameters to fix the flakiness of `test_ts_of_lsn_api`. Already now, the amount of flakiness of the test is pretty low. With this, it's even lower. cc #5768	2024-05-06 16:41:51 +00:00
Joonas Koivunen	a96e15cb6b	test: less flaky test_synthetic_size_while_deleting (#7622 ) #7585 introduced test case for deletions while synthetic size is being calculated. The test has a race against deletion, but we only accept one outcome. Fix it to accept 404 as well, as we cannot control from outside which outcome happens. Evidence: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7456/8970595458/index.html#/testresult/32a5b2f8c4094bdb	2024-05-06 15:52:51 +00:00
Christian Schwarz	df1def7018	refactor(pageserver): remove --update-init flag (#7612 ) We don't actually use it. refs https://github.com/neondatabase/neon/issues/7555	2024-05-06 16:40:44 +02:00
Tristan Partin	69337be5c2	Fix grammar in provider.rs error message s/temporary/temporarily --------- Co-authored-by: Barry Grenon <barry_grenon@yahoo.ca>	2024-05-06 09:14:42 -05:00
John Spray	67a2215163	pageserver: label tenant_slots metric by slot type (#7603 ) ## Problem The current `tenant_slots` metric becomes less useful once we have lots of secondaries, because we can't tell how many tenants are really attached (without doing a sum() on some other metric). ## Summary of changes - Add a `mode` label to this metric - Update the metric with `slot_added` and `slot_removed` helpers that are called at all the places we mutate the tenants map. - Add a debug assertion at shutdown that checks the metrics add up to the right number, as a cheap way of validating that we're calling the metric hooks in all the right places.	2024-05-06 14:07:15 +01:00
John Spray	3764dd2e84	pageserver: call maybe_freeze_ephemeral_layer from a dedicated task (#7594 ) ## Problem In testing of the earlier fix for OOMs under heavy write load (https://github.com/neondatabase/neon/pull/7218), we saw that the limit on ephemeral layer size wasn't being reliably enforced. That was diagnosed as being due to overwhelmed compaction loops: most tenants were waiting on the semaphore for background tasks, and thereby not running the function that proactively rolls layers frequently enough. Related: https://github.com/neondatabase/neon/issues/6939 ## Summary of changes - Create a new per-tenant background loop for "ingest housekeeping", which invokes maybe_freeze_ephemeral_layer() without taking the background task semaphore. - Downgrade to DEBUG a log line in maybe_freeze_ephemeral_layer that had been INFO, but turns out to be pretty common in the field. There's some discussion on the issue (https://github.com/neondatabase/neon/issues/6939#issuecomment-2083554275) about alternatives for calling this maybe_freeze_ephemeral_layer periodically without it getting stuck behind compaction. A whole task just for this feels like kind of a big hammer, but we may in future find that there are other pieces of lightweight housekeeping that we want to do here too. Why is it okay to call maybe_freeze_ephemeral_layer outside of the background tasks semaphore? - this is the same work we would do anyway if we receive writes from the safekeeper, just done a bit sooner. - The period of the new task is generously jittered (+/- 5%), so when the ephemeral layer size tips over the threshold, we shouldn't see an excessively aggressive thundering herd of layer freezes (and only layers larger than the mean layer size will be frozen) - All that said, this is an imperfect approach that relies on having a generous amount of RAM to dip into when we need to freeze somewhat urgently. 
It would be nice in future to also block compaction/GC when we recognize resource stress and need to do other work (like layer freezing) to reduce memory footprint.	2024-05-06 14:07:07 +01:00
Heikki Linnakangas	0115fe6cb2	Make 'neon.protocol_version = 2' the default (#7616 ) Once all the computes in production have restarted, we can remove protocol version 1 altogether. See issue #6211.	2024-05-06 14:37:55 +03:00
Arseny Sher	e6da7e29ed	Add option allowing running multiple endpoints on the same branch. This is used by safekeeper tests.	2024-05-06 11:08:51 +03:00
Arseny Sher	0353a72a00	pg_waldump segment on safekeeper in test_pg_waldump. To test it as well.	2024-05-06 07:18:38 +03:00
Arseny Sher	ce4d3da3ae	Properly initialize first WAL segment on safekeepers. Previously its segment header and page header of first record weren't initialized because compute streams data only since first record LSN. Also, fix a bug in the existing code for initialization: xlp_rem_len must not include page header. These changes make first segment pg_waldump'able.	2024-05-06 07:18:38 +03:00
Arseny Sher	5da3e2113a	Allow bad state (not active) pageserver error/warns in walcraft test. The top reason for it being flaky.	2024-05-06 06:45:27 +03:00
Heikki Linnakangas	4deb8dc52e	compute_ctl: Be more precise in how startup time is calculated (#7601 ) - On a non-pooled start, do not reset the 'start_time' after launching the HTTP service. In a non-pooled start, it's fair to include that in the total startup time. - When setting wait_for_spec_ms and resetting start_time, call Utc::now() only once. It's a waste of cycles to call it twice, but also, it ensures the time between setting wait_for_spec_ms and resetting start_time is included in one or the other time period. These differences should be insignificant in practice, in the microsecond range, but IMHO it seems more logical and readable this way too. Also fix and clarify some of the surrounding comments. (This caught my eye while reviewing PR #7577)	2024-05-04 08:44:18 +03:00
Em Sharnoff	64f0613edf	compute_ctl: Add support for swap resizing (#7434 ) Part of neondatabase/cloud#12047. Resolves #7239. In short, this PR: 1. Adds `ComputeSpec.swap_size_bytes: Option<u64>` 2. Adds a flag to compute_ctl: `--resize-swap-on-bind` 3. Implements running `/neonvm/bin/resize-swap` with the value from the compute spec before starting postgres, if both the value in the spec AND the flag are specified. 4. Adds `sudo` to the final image 5. Adds a file in `/etc/sudoers.d` to allow `compute_ctl` to resize swap Various bits of reasoning about design decisions in the added comments. In short: We have both a compute spec field and a flag to make rollout easier to implement. The flag will most likely be removed as part of cleanups for neondatabase/cloud#12047.	2024-05-03 12:57:45 -07:00
Christian Schwarz	1e7cd6ac9f	refactor: move `NodeMetadata` to `pageserver_api`; use it from `neon_local` (#7606 ) This is the first step towards representing all of Pageserver configuration as clean `serde::Serialize`able Rust structs in `pageserver_api`. The `neon_local` code will then use those structs instead of the crude `toml_edit` / string concatenation that it does today. refs https://github.com/neondatabase/neon/issues/7555 --------- Co-authored-by: Alex Chi Z <iskyzh@gmail.com>	2024-05-03 13:15:38 -04:00
Alex Chi Z	ef03b38e52	fix(pageserver): remove update_gc_info calls in tests (#7608 ) introduced by https://github.com/neondatabase/neon/pull/7468 conflicting with https://github.com/neondatabase/neon/pull/7584 Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-03 16:01:33 +00:00
Conrad Ludgate	9b65946566	proxy: add connect compute concurrency lock (#7607 ) ## Problem Too many connect_compute attempts can overwhelm postgres, getting the connections stuck. ## Summary of changes Limit number of connection attempts that can happen at a given time.	2024-05-03 15:45:24 +00:00
Alex Chi Z	a3fe12b6d8	feat(pageserver): add scan interface (#7468 ) This pull request adds the scan interface. Scan operates on a sparse keyspace and retrieves all the key-value pairs from the keyspaces. Currently, scan only supports the metadata keyspace, and by default does not retrieve anything from the ancestor branch. This should be fixed in the future if we need to have some keyspaces that inherit from the parent. The scan interface reuses the vectored get code path by disabling the missing key errors. This pull request also changes the behavior of vectored get on aux file v1/v2 key/keyspace: if the key is not found, it is simply not included in the result, instead of throwing a missing key error. TODOs in future pull requests: limit memory consumption, ensure the search stops when all keys are covered by the image layer, remove `#[allow(dead_code)]` once the code path is used in basebackups / aux files, remove unnecessary fine-grained keyspace tracking in vectored get (or have another code path for scan) to improve performance. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-03 10:43:30 -04:00
John Spray	b5a6e68e68	storage controller: check warmth of secondary before doing proactive migration (#7583 ) ## Problem The logic in Service::optimize_all would sometimes choose to migrate a tenant to a secondary location that was only recently created, resulting in Reconciler::live_migrate hitting its 5 minute timeout warming up the location, and proceeding to attach a tenant to a location that doesn't have a warm enough local set of layer files for good performance. Closes: #7532 ## Summary of changes - Add a pageserver API for checking download progress of a secondary location - During `optimize_all`, connect to pageservers of candidate optimization secondary locations, and check they are warm. - During shard split, do heatmap uploads and start secondary downloads, so that the new shards' secondary locations start downloading ASAP, rather than waiting minutes for background downloads to kick in. I have intentionally not implemented this by continuously reading the status of locations, to avoid dealing with the scale challenge of efficiently polling & updating 10k-100k locations status. If we implement that in the future, then this code can be simplified to act based on latest state of a location rather than fetching it inline during optimize_all.	2024-05-03 14:28:23 +00:00
Christian Schwarz	ce0ddd749c	test_runner: remove unused `NeonPageserver.config_override` field (#7605 ) refs https://github.com/neondatabase/neon/issues/7555	2024-05-03 16:05:00 +02:00
Arpad Müller	426598cf76	Update rust to 1.78.0 (#7598 ) We keep the practice of keeping the compiler up to date, pointing to the latest release. This is done by many other projects in the Rust ecosystem as well. Release notes: https://blog.rust-lang.org/2024/05/02/Rust-1.78.0.html Prior update was in #7198	2024-05-03 15:59:28 +02:00
John Spray	8b4dd5dc27	pageserver: jitter secondary periods (#7544 ) ## Problem After some time the load from heatmap uploads gets rather spiky. They're unintentionally synchronising. Chart (does this make a _boing_ sound in anyone else's head?): ![image](https://github.com/neondatabase/neon/assets/944640/18829fc8-c5b7-4739-9a9b-491b5d6fcade) ## Summary of changes - Add a helper `period_jitter` and apply a 5% jitter from downloader and heatmap_uploader when updating the next runtime at the end of an iteration. - Refactor existing places that we pick a startup interval into `period_warmup`, so that the intent is obvious.	2024-05-03 12:31:25 +00:00
Joonas Koivunen	ed9a114bde	fix: find gc cutoff points without holding Tenant::gc_cs (#7585 ) The current implementation of finding timeline gc cutoff Lsn(s) is done while holding `Tenant::gc_cs`. In recent incidents long create branch times were caused by holding the `Tenant::gc_cs` over extremely long `Timeline::find_lsn_by_timestamp`. The fix is to find the GC cutoff values before taking the `Tenant::gc_cs` lock. This change is safe to do because the GC cutoff values and the branch points have no dependencies on each other. In the case of `Timeline::find_gc_cutoff` taking a long time with this change, we should no longer see `Tenant::gc_cs` interfering with branch creation. Additionally, the `Tenant::refresh_gc_info` is now tolerant of timeline deletions (or any other failures to find the pitr_cutoff). This helps with the synthetic size calculation being constantly completed instead of having a break for a timely timeline deletion. Fixes: #7560 Fixes: #7587	2024-05-03 14:57:26 +03:00
John Spray	b7385bb016	storage_controller: fix non-timeline passthrough GETs (#7602 ) ## Problem We were matching on `/tenant/:tenant_id` and `/tenant/:tenant_id/timeline`, but not non-timeline tenant sub-paths. There aren't many: this was only noticeable when using the synthetic_size endpoint by hand. ## Summary of changes - Change the wildcard from `/tenant/:tenant_id/timeline` to `/tenant/:tenant_id/*` - Add test lines that exercise this	2024-05-03 12:52:43 +01:00
Vlad Lazar	37b1930b2f	tests: relax test download remote layers api (#7604 ) ## Problem This test triggers layer download failures on demand. It is possible to modify the failpoint during a `Timeline::get_vectored` right between the vectored read and its validation read. This means that one of the reads can fail while the other one succeeds and vice versa. ## Summary of changes These errors are expected, so allow them to happen.	2024-05-03 12:40:09 +01:00
Arpad Müller	d76963691f	Increase Azure parallelism limit to 100 (#7597 ) After #5563 has been addressed we can now set the Azure storage parallelism limit to 100 like it is for S3. Part of #5567	2024-05-03 13:23:11 +02:00
Joonas Koivunen	60f570c70d	refactor(update_gc_info): split GcInfo to compose out of GcCutoffs (#7584 ) Split `GcInfo` and replace `Timeline::update_gc_info` with a method that simply finds gc cutoffs `Timeline::find_gc_cutoffs` to be combined as `Timeline::gc_info` at the caller. This change will be followed up with a change that finds the GC cutoff values before taking the `Tenant::gc_cs` lock. Cc: #7560	2024-05-03 13:11:51 +03:00
Alex Chi Z	3582a95c87	fix(pageserver): compile warning of download_object.ctx on macos (#7596 ) fix macOS compile warning introduced in `45ec8688ea` Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-03 10:55:48 +02:00
Jure Bajic	00423152c6	Store operation identifier in `IdLockMap` on exclusive lock (#7397 ) ## Problem Issues around operation and tenant locks would have been hard to debug since there was little observability around them. ## Summary of changes - As suggested in the issue, a wrapper was added around `OwnedRwLockWriteGuard` called `IdentifierLock` that removes the operation currently holding the exclusive lock when it's dropped. - The value in `IdLockMap` was extended to hold a pair of locks and operations that can be accessed and locked independently. - When requesting an exclusive lock besides returning the lock on that resource, an operation is changed if the lock is acquired. Closes https://github.com/neondatabase/neon/issues/7108	2024-05-03 09:38:19 +01:00
Anna Khanova	240efb82f9	Proxy reconnect pubsub before expiration (#7562 ) ## Problem Proxy reconnects to redis only after it's already unavailable. ## Summary of changes Reconnects every 6h.	2024-05-03 10:00:29 +02:00
Arpad Müller	5f099dc760	Use streaming downloads for Azure as well (#7579 ) The main challenge was in the second commit, as `DownloadStream` requires the inner to be Sync but the stream returned by the Azure SDK wasn't Sync. This left us with three options: * Change the Azure SDK to return Sync streams. This was abandoned after we realized that we couldn't just make `TokenCredential`'s returned future Sync: it uses the `async_trait` macro and as the `TokenCredential` trait is used in dyn form, one can't use Rust's new "async fn in Trait" feature. * Change `DownloadStream` to not require `Sync`. This was abandoned after it turned into a safekeeper refactoring project. * Put the stream into a `Mutex` and make it obtain a lock on every poll. This adds some performance overhead but locks that actually don't do anything should be comparatively cheap. We went with the third option in the end as the change still represents an improvement. Follow up of #5446 , fixes #5563	2024-05-02 20:19:00 +02:00
Arpad Müller	7a49e5d5c2	Remove tenant_id from TenantLocationConfigRequest (#7469 ) Follow-up of #7055 and #7476 to remove `tenant_id` from `TenantLocationConfigRequest` completely. All components of our system should now not specify the `tenant_id`. cc https://github.com/neondatabase/cloud/pull/11791	2024-05-02 20:18:13 +02:00
Christian Schwarz	45ec8688ea	chore(pageserver): plumb through RequestContext to VirtualFile write methods (#7566 ) This PR introduces no functional changes. The read path will be done separately. refs https://github.com/neondatabase/neon/issues/6107 refs https://github.com/neondatabase/neon/issues/7386	2024-05-02 18:58:10 +02:00
Alex Chi Z	4b55dad813	vm-image: add sqlexporter for autoscaling metrics (#7514 ) As discussed in https://github.com/neondatabase/autoscaling/pull/895, we want to have a separate sql_exporter for simple metrics to avoid overloading the database because the autoscaling agent needs to scrape at a higher interval. The new exporter is exposed at port 9499. I didn't do any testing for this pull request but given it's just a configuration change I assume this works. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-02 12:43:36 -04:00
Matt Podraza	ab95942fc2	storage controller: make the initial database wait configurable (#7591 ) This allows passing a humantime string in the CLI to configure the initial wait for the database. It defaults to the previously hard-coded value of 5 seconds.	2024-05-02 15:19:51 +00:00
Alex Chi Z	f656db09a4	fix(pageserver): properly propagate missing key error for vectored get (#7569 ) Some parts of the code require the missing key error to be propagated to the code path correctly (i.e., aux key range scan). Currently, it's an anyhow error. * remove `stuck_lsn` from the missing key error. * as a result, when matching missing key, we do not distinguish the case `stuck_lsn = false/true`. * vectored get now uses the unified missing key error. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-02 09:19:45 -04:00
Anastasia Lubennikova	69bf1bae7d	Fix usage of pg_waldump --ignore option (#7578 ) Previously, the --ignore option was only used when reading from a single file. With this PR pg_waldump -i is enough to open any neon WAL segments	2024-05-02 11:52:30 +00:00
Anna Khanova	25af32e834	proxy: keep track of the number of events from redis by type. (#7582 ) ## Problem It's unclear what the distribution of messages that proxy consumes from redis is. ## Summary of changes Add counter.	2024-05-02 09:50:11 +00:00
Conrad Ludgate	cb4b4750ba	update to reqwest 0.12 (#7561 ) ## Problem #7557 ## Summary of changes	2024-05-02 11:16:04 +02:00
Sasha Krassovsky	d43d77389e	Add retry loops and bump test timeout in test_pageserver_connection_stress (#7281 )	2024-05-01 21:36:50 -07:00
Alex Chi Z	5558457c84	chore(pageserver): categorize basebackup errors (#7523 ) close https://github.com/neondatabase/neon/issues/7391 ## Summary of changes Categorize basebackup error into two types: server error and client error. This makes it easier to set up alerts. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-01 16:31:59 +00:00
Alex Chi Z	26e6ff8ba6	chore(pageserver): concise error message for layer traversal (#7565 ) Instead of showing the full path of layer traversal, we now only show tenant (in tracing context)+timeline+filename. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-01 11:44:42 -04:00
Arthur Petukhovsky	50a45e67dc	Discover safekeepers via broker request (#7279 ) We had an incident where pageserver requests timed out because pageserver couldn't fetch WAL from safekeepers. This incident was caused by a bug in safekeeper logic for timeline activation, which prevented pageserver from finding safekeepers. This bug has since been fixed, but there is still a chance of a similar bug in the future due to overall complexity. We add a new broker message to "signal interest" for timeline. This signal will be sent by pageserver's `wait_lsn`, and safekeepers will receive this signal to start broadcasting broker messages. Then every broker subscriber will be able to find the safekeepers and connect to them (to start fetching WAL). This feature is not limited to pageservers and any service that wants to download WAL from safekeepers will be able to use this discovery request. This commit changes pageserver's connection_manager (walreceiver) to send a SafekeeperDiscoveryRequest when there is no information about safekeepers present in memory. Current implementation will send these requests only if there is an active wait_lsn() call and no more often than once per 10 seconds. Add `test_broker_discovery` to test this: safekeepers started with `--disable-periodic-broker-push` will not push info to broker so that pageserver must use a discovery to start fetching WAL. Add task_stats in safekeepers broker module to log a warning if there is no message received from the broker for the last 10 seconds. Closes #5471 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-04-30 18:50:03 +00:00
Andrew Rudenko	fcbe60f436	Makefile: DISABLE_HOMEBREW variable (#7556 ) ## Problem The current Makefile assumes that homebrew is used on macos. There are other ways to install dependencies on MacOS (nix, macports, "manually"). It would be great to allow the one who wants to use other options to disable homebrew integration. ## Summary of changes It adds DISABLE_HOMEBREW variable that if set skips extra homebrew-specific configuration steps.	2024-04-30 19:44:02 +02:00

5156 Commits