
rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-21 15:10:44 +00:00

Author	SHA1	Message	Date
Christian Schwarz	c8bee86586	in some early WIP commit we had removed the loop{} inside get(); re-establish it one level down	2025-01-14 15:12:03 +01:00
Christian Schwarz	768a867dcf	doc comment fix	2025-01-14 14:54:15 +01:00
Christian Schwarz	3b65465e10	turns out with the switch to sync Mutex there's no reason for upgrade() to be async either	2025-01-14 14:53:41 +01:00
Christian Schwarz	e4ea706424	turns out PerTimelineState::shutdown() doesn't need to be async	2025-01-14 14:48:03 +01:00
Christian Schwarz	7034f54a9e	remove the earlier-commented-out assertions on arc reference counts, they were too whiteboxy to begin with	2025-01-14 14:45:25 +01:00
Christian Schwarz	d68c5ddf7e	avoid the tokio::sync::Mutex by wrapping the GateGuard into an Arc	2025-01-14 14:23:28 +01:00
Christian Schwarz	b95365b45d	Revert "experiment: what if we make Handle !Send so it can't be held across await points" This reverts commit `b44070d0c7`.	2025-01-14 13:32:12 +01:00
Christian Schwarz	b44070d0c7	experiment: what if we make Handle !Send so it can't be held across await points Result: the whole point of having a Handle at hand is to be holding a GateGuard while performing a Timeline operation. Don't do it.	2025-01-14 13:30:48 +01:00
Christian Schwarz	6b22acba9b	avoid cloning the Arc<Timeline> on every handle upgrade/downgrade, by wrapping it in yet another Arc	2025-01-14 12:28:37 +01:00
Christian Schwarz	22058d17d1	it turns out PerTimelineState need not store a Types::Timeline at all	2025-01-14 12:22:00 +01:00
Christian Schwarz	a8d096b72c	Revert "WIP experiment: avoid upgrading" This reverts commit `f6eb6fff9f`.	2025-01-14 12:18:02 +01:00
Christian Schwarz	f6eb6fff9f	WIP experiment: avoid upgrading	2025-01-14 12:14:31 +01:00
Christian Schwarz	c868ceded0	WeakHandle should store weak ref to the GateGuard	2025-01-14 12:08:53 +01:00
Christian Schwarz	e82aa9419e	convert handles to named structs	2025-01-14 12:02:25 +01:00
Christian Schwarz	62f63275b2	fix test_connection_handler_exit	2025-01-14 11:55:45 +01:00
Christian Schwarz	5b45f03aa2	tests: comment out the strong_count / weak_count assertions,	2025-01-14 11:39:19 +01:00
Christian Schwarz	3fefa5b415	fix warnings	2025-01-14 11:31:58 +01:00
Christian Schwarz	6007a94f91	it compiles	2025-01-14 11:30:43 +01:00
Christian Schwarz	9e03dda0c3	handle downgrade during batching	2025-01-13 15:49:45 +01:00
Christian Schwarz	8a0a0d06a8	renames	2025-01-13 14:38:19 +01:00
Christian Schwarz	dda31b9cb6	adjust shutdown	2025-01-13 14:38:19 +01:00
Christian Schwarz	9591d8789c	WIP	2025-01-13 14:27:32 +01:00
Christian Schwarz	ee851d1127	Merge remote-tracking branch 'origin/main' into problame/throttle-before-batching	2025-01-10 20:38:08 +01:00
Christian Schwarz	d6a2b62cfb	grand refactor of SmgrOpTimer states	2025-01-10 20:29:44 +01:00
Christian Schwarz	ad5120197c	self-review	2025-01-10 20:28:20 +01:00
Christian Schwarz	4d496a29c2	following up to the last commit, the observation points that we use to calculate the various latency metrics are different, adjust for that	2025-01-10 20:26:09 +01:00
Christian Schwarz	8793e28ccb	throttling cancel-sensitivity	2025-01-10 20:25:24 +01:00
Christian Schwarz	9b43204893	fix(page_service): Timeline::gate held open while throttling (#10314 ) When we moved throttling up from Timeline::get into page_service, we stopped being sensitive to `Timeline::cancel`, even though we're holding a Handle and thus a guard on the `Timeline::gate` open. This PR rectifies the situation. Refs - Found while investigating #10309 (hung detach because gate kept open), but not expected to be the root cause of that issue because the affected tenants are not being throttled according to their metrics.	2025-01-10 19:21:01 +00:00
Christian Schwarz	aa8da1e621	move throttle into pagestream_read_message	2025-01-10 15:35:07 +01:00
Arpad Müller	6149ac8834	Handle race between auto-offload and unarchival (#10305 ) ## Problem Auto-offloading as requested by the compaction task is racy with unarchival, in that the compaction task might attempt to offload an unarchived timeline. By that point it will already have set the timeline to the `Stopping` state however, which makes it unusable for any purpose. For example: 1. compaction task decides to offload timeline 2. timeline gets unarchived 3. `offload_timeline` gets called by compaction task * sets timeline's state to `Stopping` * realizes that the timeline can't be unarchived, errors out 6. endpoint can't be started as the timeline is `Stopping` and thus 'can't be found'. A future iteration of the compaction task can't "heal" this state either as the timeline will still not be archived, same goes for other automatic stuff. The only way to heal this is a tenant detach+attach, or alternatively a pageserver restart. Furthermore, the compaction task is especially amenable for such races as it first stores `can_offload` into a variable, figures out whether compaction is needed (which takes some time), and only then does it attempt an offload operation: the time difference between "check" and "use" is non-trivially small. To make it even worse, we start the compaction task right after attach of a tenant, and it is a common pattern by pageserver users to attach a tenant to then immediately unarchive a timeline, so that an endpoint can be started. ## Solutions not adopted The simplest solution is to move the `can_offload` check to right before attempting of the offload. But this is not a good solution, as no lock is held between that check and timeline shutdown. So races would still be possible, just become less likely. I explored using the timeline state for this, as in adding an additional enum variant. But `Timeline::set_state` is racy (#10297). 
## Adopted solution We use the lock on the timeline's upload queue as an arbiter: either unarchival gets to it first and sours the state for auto-offloading, or auto-offloading shuts it down, which stops any parallel unarchival in its tracks. The key part is not releasing the upload queue's lock between the check whether the timeline is archived or not, and shutting it down (the actual implementation only sets `shutting_down` but it has the same effect on `initialized_mut()` as a full shutdown). The rest of the patch is stuff that follows from this. We also move the part where we set the state to `Stopping` to after that arbiter has decided the fate of the timeline. For deletions, we do keep it inside `DeleteTimelineFlow::prepare` however, so that it is called with all of the timelines locks held that the function allocates (timelines lock most importantly). This is only a precautionary measure however, as I didn't want to analyze deletion related code for possible races. ## Future changes It might make sense to move `can_offload` to right before the offload attempt. Maybe some other properties might have changed as well. Although this will not be perfect either as no lock is held. I want to keep it out of this change to emphasize that this move wasn't the main reason we are race free now. Fixes #10220	2025-01-09 20:41:49 +00:00
Alex Chi Z.	640ac4fc9e	fix(pageserver): report timestamp is in the past if the key is missing (#10210 ) ## Problem If for some reason we already garbage-collected the data under an LSN but the caller uses a past LSN for the find_time_cutoff function, now we will report a missing key error and GC will never proceed. Note that a missing key error can also happen if the key is really missing (i.e., during the past offload incidents) ## Summary of changes Make sure GC proceeds by bumping the LSN. When time_cutoff=None, we will not increase the time_cutoff (it will be set to latest_gc_cutoff). If we really need to bump the GC LSN for maintenance purposes, we need a separate API to do that. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-09 14:43:20 +00:00
Konstantin Knizhnik	20c40eb733	Add response tag to getpage request in V3 protocol version (#8686 ) ## Problem We have had several serious data corruption incidents caused by mismatch of getpage requests: https://neondb.slack.com/archives/C07FJS4QF7V/p1723032720164359 We hope that the problem is fixed now. But it is better to prevent such kind of problems in future. Part of https://github.com/neondatabase/cloud/issues/16472 ## Summary of changes This PR introduces a new V3 version of the compute<->pageserver protocol, adding a tag to the getpage response. So now compute is able to check if it really gets a response to the requested page. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-01-09 13:12:04 +00:00
Vlad Lazar	f4739d49e3	pageserver: tweak interpreted ingest record metrics (#10291 ) ## Problem The filtered record metric doesn't make sense for interpreted ingest. ## Summary of changes While of dubious utility in the first place, this patch replaces them with records received and records observed metrics for interpreted ingest: * received records cause the pageserver to do _something_: write a key, value pair to storage, update some metadata or flush pending modifications * observed records are a shard 0 concept and contain only key metadata used in tracking relation sizes (received records include observed records)	2025-01-09 12:31:02 +00:00
Erik Grinaker	237dae71a1	Revert "pageserver,safekeeper: disable heap profiling (#10268 )" (#10303 ) This reverts commit `b33299dc37`. Heap profiles weren't the culprit after all. Touches #10225.	2025-01-07 22:49:00 +00:00
Alex Chi Z.	4a6556e269	fix(pageserver): ensure GC computes time cutoff using the same start time (#10193 ) ## Problem close https://github.com/neondatabase/neon/issues/10192 ## Summary of changes * `find_gc_time_cutoff` takes `now` parameter so that all branches compute the cutoff based on the same start time, avoiding races. * gc-compaction uses a single `get_gc_compaction_watermark` function to get the safe LSN to compact. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2025-01-06 19:29:18 +00:00
Erik Grinaker	a77e87a48a	pageserver: assert that uploads don't modify indexed layers (#10228 ) ## Problem It's not legal to modify layers that are referenced by the current layer index. Assert this in the upload queue, as preparation for upload queue reordering. Touches #10096. ## Summary of changes Add a debug assertion that the upload queue does not modify layers referenced by the current index. I could be convinced that this should be a plain assertion, but will be conservative for now.	2025-01-03 16:03:19 +00:00
Erik Grinaker	1393cc668b	Revert "pageserver: revert flush backpressure (#8550 ) (#10135 )" (#10270 ) This reverts commit `f3ecd5d76a`. It is [suspected](https://neondb.slack.com/archives/C033RQ5SPDH/p1735907405716759) to have caused significant read amplification in the [ingest benchmark](https://neonprod.grafana.net/d/de3mupf4g68e8e/perf-test3a-ingest-benchmark?orgId=1&from=now-30d&to=now&timezone=utc&var-new_project_endpoint_id=ep-solitary-sun-w22bmut6&var-large_tenant_endpoint_id=ep-holy-bread-w203krzs) (specifically during index creation). We will revisit an intermediate improvement here to unblock [upload parallelism](https://github.com/neondatabase/neon/issues/10096) before properly addressing [compaction backpressure](https://github.com/neondatabase/neon/issues/8390).	2025-01-03 15:38:51 +00:00
Erik Grinaker	b33299dc37	pageserver,safekeeper: disable heap profiling (#10268 ) ## Problem Since enabling continuous profiling in staging, we've seen frequent seg faults. This is suspected to be because jemalloc and pprof-rs take a stack trace at the same time, and the handlers aren't signal safe. jemalloc does this probabilistically on every allocation, regardless of whether someone is taking a heap profile, which means that any CPU profile has a chance to cause a seg fault. Touches #10225. ## Summary of changes For now, just disable heap profiles -- CPU profiles are more important, and we need to be able to take them without risking a crash.	2025-01-03 15:21:31 +00:00
John Spray	e9d30edc7f	pageserver: fix a 500 during timeline creation + shutdown (#10259 ) ## Problem The test_create_churn_during_restart test fails if timeline creation calls return 500 errors (because the API shouldn't do it), and it's sometimes failing, for example: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10256/12582034135/index.html#/testresult/3ce2e7045465012e ## Summary of changes - Avoid handling UploadQueueShutDownOrStopped case as an Other (i.e. 500)	2025-01-03 13:13:22 +00:00
Arpad Müller	1303cd5d05	Fix defusing race between Tenant::shutdown and offload_timeline (#10150 ) There is a race condition between `Tenant::shutdown`'s `defuse_for_drop` loop and `offload_timeline`, where timeline offloading can insert into a tenant that is in the process of shutting down, in fact so far progressed that the `defuse_for_drop` has already been called. This prevents warn log lines of the form: ``` offloaded timeline <hash> was dropped without having cleaned it up at the ancestor ``` The solution piggybacks on the `offloaded_timelines` lock: both the defuse loop and the offloaded timeline insertion need to acquire the lock, and we know that the defuse loop only runs after the tenant has set its `TenantState` to `Stopping`. So if we hold the `offloaded_timelines` lock, and know that the `TenantState` is not `Stopping`, then we know that the defuse loop has not run yet, and holding the lock ensures that it doesn't start running while we are inserting the offloaded timeline. Fixes #10070	2025-01-03 12:36:01 +00:00
Alex Chi Z.	9c53b41245	fix(pageserver): update remote latest_gc_cutoff after gc-compaction (#10209 ) ## Problem close https://github.com/neondatabase/neon/issues/10208 part of #9114 ## Summary of changes * Ensure remote `latest_gc_cutoff` is up-to-date before removing any files for gc-compaction. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-12-19 18:40:20 +00:00
Alex Chi Z.	b89e02f3e8	fix(pageserver): consider partial compaction layer map in layer check (#10044 ) ## Problem In https://github.com/neondatabase/neon/pull/9897 we temporarily disabled the layer valid check because the current one only considers the end result of all compaction algorithms, but partial gc-compaction would temporarily produce an "invalid" layer map. part of https://github.com/neondatabase/neon/issues/9114 ## Summary of changes Allow LSN splits to overlap in the slow path check. Currently, the valid check is only used in storage scrubber (background job) and during gc-compaction (without taking layer lock). Therefore, it's fine for such checks to be a little bit inefficient but more accurate. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-12-19 18:04:53 +00:00
Alex Chi Z.	3d1c3a80ae	feat(pageserver): add compact queue http endpoint (#10173 ) ## Problem We cannot get the size of the compaction queue and access the info. Part of #9114 ## Summary of changes * Add an API endpoint to get the compaction queue. * gc_compaction test case now waits until the compaction finishes. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-12-18 18:09:02 +00:00
Alex Chi Z.	1d12efc428	fix(pageserver): allow repartition errors during gc-compaction smoke tests (#10164 ) ## Problem part of https://github.com/neondatabase/neon/issues/9114 In https://github.com/neondatabase/neon/pull/10127 we fixed the race, but we didn't add the errors to the allowlist. ## Summary of changes * Allow repartition errors in the gc-compaction smoke test. I think it might be worth to refactor the code to allow multiple threads getting a copy of repartition status (i.e., using Rcu) in the future. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-12-18 15:37:26 +00:00
Erik Grinaker	3d30a7a934	pageserver: make `RemoteTimelineClient::schedule_index_upload` infallible (#10155 ) Remove an unnecessary `Result` and address a `FIXME`.	2024-12-16 15:54:47 +00:00
Conrad Ludgate	6565fd4056	chore: fix clippy lints 2024-12-06 (#10138 )	2024-12-16 15:33:21 +00:00
John Spray	ebcbc1a482	pageserver: tighten up code around SLRU dir key handling (#10082 ) ## Problem Changes in #9786 were functionally complete but missed some edges that made testing less robust than it should have been: - `is_key_disposable` didn't consider SLRU dir keys disposable - Timeline `init_empty` was always creating SLRU dir keys on all shards The result was that when we had a bug (https://github.com/neondatabase/neon/pull/10080), it wasn't apparent in tests, because one would only encounter the issue if running on a long-lived timeline with enough compaction to drop the initially created empty SLRU dir keys, _and_ some CLog truncation going on. Closes: https://github.com/neondatabase/cloud/issues/21516 ## Summary of changes - Update is_key_global and init_empty to handle SLRU dir keys properly -- the only functional impact is that we avoid writing some spurious keys in shards >0, but this makes testing much more robust. - Make `test_clog_truncate` explicitly use a sharded tenant The net result is that if one reverts #10080, then tests fail (i.e. this PR is a reproducer for the issue)	2024-12-16 10:06:08 +00:00
Erik Grinaker	f3ecd5d76a	pageserver: revert flush backpressure (#8550 ) (#10135 ) ## Problem In #8550, we made the flush loop wait for uploads after every layer. This was to avoid unbounded buildup of uploads, and to reduce compaction debt. However, the approach has several problems: * It prevents upload parallelism. * It prevents flush and upload pipelining. * It slows down ingestion even when there is no need to backpressure. * It does not directly backpressure WAL ingestion (only via `disk_consistent_lsn`), and will build up in-memory layers. * It does not directly backpressure based on compaction debt and read amplification. An alternative solution to these problems is proposed in #8390. In the meanwhile, we revert the change to reduce the impact on ingest throughput. This does reintroduce some risk of unbounded upload/compaction buildup. Until https://github.com/neondatabase/neon/issues/8390, this can be addressed in other ways: * Use `max_replication_apply_lag` (aka `remote_consistent_lsn`), which will more directly limit upload debt. * Shard the tenant, which will spread the flush/upload work across more Pageservers and move the bottleneck to Safekeeper. Touches #10095. ## Summary of changes Remove waiting on the upload queue in the flush loop.	2024-12-15 09:45:12 +00:00
Alex Chi Z.	7ee5dca752	fix(pageserver): race between gc-compaction and repartition (#10127 ) ## Problem close https://github.com/neondatabase/neon/issues/10124 gc-compaction split_gc_jobs is holding the repartition lock for too long. ## Summary of changes * Ensure split_gc_compaction_jobs drops the repartition lock once it finishes cloning the structures. * Update comments. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-12-13 18:22:25 +00:00
Alex Chi Z.	5ff4b991c7	feat(pageserver): gc-compaction split over LSN (#9900 ) ## Problem part of https://github.com/neondatabase/neon/issues/9114, stacked PR over https://github.com/neondatabase/neon/pull/9897, partially refactored to help with https://github.com/neondatabase/neon/issues/10031 ## Summary of changes * gc-compaction takes `above_lsn` parameter. We only compact the layers above this LSN, and all data below the LSN are treated as if they are on the ancestor branch. * refactored gc-compaction to take `GcCompactJob` that describes the rectangular range to be compacted. * Added unit test for this case. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-12-12 20:23:24 +00:00
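
Commit 6149ac8834 describes using a single lock as the arbiter between auto-offload and unarchival: the "is it archived?" check and the shutdown must happen under one lock acquisition, so whichever side takes the lock first wins. The following is a minimal sketch of that idea only, not the pageserver's actual code. `QueueState`, `Timeline`, `Outcome`, `try_offload`, and `try_unarchive` are hypothetical names chosen for illustration; only the `shutting_down` flag is named in the commit message.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the state guarded by the upload queue's lock.
#[derive(Debug)]
struct QueueState {
    archived: bool,
    shutting_down: bool,
}

/// Result of an attempted state transition (illustrative only).
#[derive(Debug, PartialEq)]
enum Outcome {
    Done,
    Refused,
}

/// Hypothetical, heavily simplified timeline: one mutex guards both flags.
struct Timeline {
    queue: Mutex<QueueState>,
}

impl Timeline {
    fn new(archived: bool) -> Self {
        Timeline {
            queue: Mutex::new(QueueState {
                archived,
                shutting_down: false,
            }),
        }
    }

    /// Auto-offload: check and shutdown happen under ONE lock acquisition,
    /// so no unarchival can slip in between the check and the "use".
    fn try_offload(&self) -> Outcome {
        let mut q = self.queue.lock().unwrap();
        if !q.archived || q.shutting_down {
            return Outcome::Refused;
        }
        q.shutting_down = true;
        Outcome::Done
    }

    /// Unarchival: refused once offloading has marked the queue as shutting down.
    fn try_unarchive(&self) -> Outcome {
        let mut q = self.queue.lock().unwrap();
        if q.shutting_down {
            return Outcome::Refused;
        }
        q.archived = false;
        Outcome::Done
    }
}
```

In this sketch, calling `try_unarchive` first makes a later `try_offload` return `Refused`, and calling `try_offload` first makes a later `try_unarchive` return `Refused`. Moving the `archived` check outside the locked region would reintroduce the check-then-use window the commit message warns about.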


2643 Commits