rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-07 06:00:38 +00:00

Author	SHA1	Message	Date
Yuchen Liang	d4ebd5ccd3	use CheapCloneForRead trait to prevent efficiency bugs Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-25 05:01:23 +00:00
Yuchen Liang	76f0e4fd1d	review: remove save_buf_for_read Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-25 04:42:02 +00:00
Yuchen Liang	28718bfadc	review: simplify FlushControl by using ZST for not(test) Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-25 04:39:13 +00:00
Yuchen Liang	e5bf2bec49	remove write_buffered; add notes for bypass-aligned-part-of-write Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-25 04:30:42 +00:00
Yuchen Liang	0f63c957a6	document and reorder flush background task invocation sequence Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-20 16:18:21 +00:00
Yuchen Liang	77801fe3be	Merge branch 'main' into yuchen/double-buffered-writer	2024-11-19 17:19:42 -05:00
Alex Chi Z.	b22a84a7bf	feat(pageserver): support key range for manual compaction trigger (#9723 ) part of https://github.com/neondatabase/neon/issues/9114, we want to be able to run partial gc-compaction in tests. In the future, we can also expand this functionality to legacy compaction, so that we can trigger compaction for a specific key range. ## Summary of changes * Support passing compaction key range through pageserver routes. * Refactor input parameters of compact related function to take the new `CompactOptions`. * Add tests for partial compaction. Note that the test may or may not trigger compaction based on GC horizon. We need to improve the test case to ensure things always get below the gc_horizon and the gc-compaction can be triggered. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-19 19:38:41 +00:00
Alex Chi Z.	5e3fbef721	fix(pageserver): queue stopped error should be ignored during create timeline (#9767 ) close https://github.com/neondatabase/neon/issues/9730 The test case tests if anything goes wrong during pageserver restart + during timeline creation not complete. Therefore, queue is stopped error is normal in this case, except that it should be categorized as a shutdown error instead of a real error. ## Summary of changes * More comments for the test case. * Queue stopped error will now be forwarded as CreateTimelineError::ShuttingDown. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-19 14:10:09 -05:00
Yuchen Liang	78a17a7051	improve FullSlice semantics Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-19 18:38:59 +00:00
Yuchen Liang	826e2395a8	add comments Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-19 16:48:29 +00:00
Yuchen Liang	9db6b1e3c8	fix clippy Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-19 14:22:28 +00:00
Yuchen Liang	5acc61bdbc	move duplex to utils; make flush behavior controllable in test Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-18 23:52:52 +00:00
Arpad Müller	4fc3af15dd	Remove at most one retain_lsn entry from (possibly offloaded) timeline's parent (#9791 ) There is a potential data corruption issue, not one I've encountered, but it's still not hard to hit with some correct looking code given our current architecture. It has to do with the timeline's memory object storage via reference counted `Arc`s, and the removal of `retain_lsn` entries at the drop of the last `Arc` reference. The corruption steps are as follows: 1. timeline gets offloaded. timeline object A doesn't get dropped though, because some long-running task accesses it 2. the same timeline gets unoffloaded again. timeline object B gets created for it, timeline object A still referenced. both point to the same timeline. 3. the task keeping the reference to timeline object A exits. destructor for object A runs, removing `retain_lsn` in the timeline's parent. 4. the timeline's parent runs gc without the `retain_lsn` of the still extant timeline's child, leading to data corruption. In general we are susceptible each time when we recreate a `Timeline` object in the same process, which happens both during a timeline offload/unoffload cycle, as well as during an ancestor detach operation. The solution this PR implements is to make the destructor for a timeline as well as an offloaded timeline remove at most one `retain_lsn`. PR #9760 has added a log line to print the refcounts at timeline offload, but this only detects one of the places where we do such a recycle operation. Plus it doesn't prevent the actual issue. I doubt that this occurs in practice. It is more a defense in depth measure. Usually I'd assume that the timeline gets dropped immediately in step 1, as there is no background tasks referencing it after its shutdown. But one never knows, and reducing the stakes of step 1 actually occurring is a really good idea, from potential data corruption to waste of CPU time. Part of #8088	2024-11-18 21:42:19 +01:00
Vlad Lazar	d7662fdc7b	feat(page_service): timeout-based batching of requests (#9321 ) ## Problem We don't take advantage of queue depth generated by the compute on the pageserver. We can process getpage requests more efficiently by batching them. ## Summary of changes Batch up incoming getpage requests that arrive within a configurable time window (`server_side_batch_timeout`). Then process the entire batch via one `get_vectored` timeline operation. By default, no merging takes place. ## Testing * Functional: https://github.com/neondatabase/neon/pull/9792 * Performance: will be done in staging/pre-prod # Refs * https://github.com/neondatabase/neon/issues/9377 * https://github.com/neondatabase/neon/issues/9376 Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-11-18 20:24:03 +00:00
Alex Chi Z.	e5c89f3da3	feat(pageserver): drop disposable keys during gc-compaction (#9765 ) close https://github.com/neondatabase/neon/issues/9552, close https://github.com/neondatabase/neon/issues/8920, part of https://github.com/neondatabase/neon/issues/9114 ## Summary of changes * Drop keys not belonging to this shard during gc-compaction to avoid constructing history that might have been truncated during shard compaction. * Run gc-compaction at the end of shard compaction test. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-18 19:27:52 +00:00
John Spray	261d065e6f	pageserver: respect no_sync in `VirtualFile` (#9772 ) ## Problem `no_sync` initially just skipped syncfs on startup (#9677). I'm also interested in flaky tests that time out during pageserver shutdown while flushing l0s, so to eliminate disk throughput as a source of issues there, ## Summary of changes - Drive-by change for test timeouts: add a couple more ::info logs during pageserver startup so it's obvious which part got stuck. - Add a SyncMode enum to configure VirtualFile and respect it in sync_all and sync_data functions - During pageserver startup, set SyncMode according to `no_sync`	2024-11-18 08:59:05 +00:00
Vlad Lazar	ac689ab014	wal_decoder: rename end_lsn to next_record_lsn (#9776 ) ## Problem It turns out that `WalStreamDecoder::poll_decode` returns the start LSN of the next record and not the end LSN of the current record. They are not always equal. For example, they're not equal when the record in question is an XLOG SWITCH record. ## Summary of changes Rename things to reflect that.	2024-11-15 21:53:11 +00:00
Yuchen Liang	990bc65a20	review: https://github.com/neondatabase/neon/pull/9693#discussion_r1840293759 Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-15 15:42:58 +00:00
Arpad Müller	7880c246f1	Correct mistakes in offloaded timeline retain_lsn management (#9760 ) PR #9308 has modified tenant activation code to take offloaded child timelines into account for populating the list of `retain_lsn` values. However, there are more places than just tenant activation where one needs to update the `retain_lsn`s. This PR fixes some bugs of the current code that could lead to corruption in the worst case: 1. Deleting of an offloaded timeline would not get its `retain_lsn` purged from its parent. With the patch we now do it, but as the parent can be offloaded as well, the situation is a bit trickier than for non-offloaded timelines which can just keep a pointer to their parent. Here we can't keep a pointer because the parent might get offloaded, then unoffloaded again, creating a dangling pointer situation. Keeping a pointer to the tenant is not good either, because we might drop the offloaded timeline in a context where a `offloaded_timelines` lock is already held: so we don't want to acquire a lock in the drop code of OffloadedTimeline. 2. Unoffloading a timeline would not get its `retain_lsn` values populated, leading to it maybe garbage collecting values that its children might need. We now call `initialize_gc_info` on the parent. 3. Offloading of a timeline would not get its `retain_lsn` values registered as offloaded at the parent. So if we drop the `Timeline` object, and its registration is removed, the parent would not have any of the child's `retain_lsn`s around. Also, before, the `Timeline` object would delete anything related to its timeline ID, now it only deletes `retain_lsn`s that have `MaybeOffloaded::No` set. Incorporates Chi's reproducer from #9753. cc https://github.com/neondatabase/cloud/issues/20199 The `test_timeline_retain_lsn` test is extended: 1. it gains a new dimension, duplicating each mode, to either have the "main" branch be the direct parent of the timeline we archive, or the "test_archived_parent" branch intermediary, creating a three timeline structure. This doesn't test anything fixed by this PR in particular, just explores the vast space of possible configurations a little bit more. 2. it gains two new modes, `offload-parent`, which tests the second point, and `offload-no-restart` which tests the third point. It's easy to verify the test actually is "sharp" by removing one of the respective `self.initialize_gc_info()`, `gc_info.insert_child()` or `ancestor_children.push()`. Part of #8088 --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Alex Chi Z <chi@neon.tech>	2024-11-15 14:22:29 +01:00
John Spray	93939f123f	tests: add test_timeline_archival_chaos (#9609 ) ## Problem - We lack test coverage of cases where multiple timelines fight for updates to the same manifest (https://github.com/neondatabase/neon/pull/9557), and in timeline archival changes while dual-attached (https://github.com/neondatabase/neon/pull/9555) ## Summary of changes - Add a chaos test for timeline creation->archival->offload->deletion	2024-11-14 17:31:35 +00:00
Konstantin Knizhnik	f70611c8df	Correctly truncate VM (#9342 ) ## Problem https://github.com/neondatabase/neon/issues/9240 ## Summary of changes Correctly truncate VM page instead just replacing it with zero page. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-11-14 17:19:13 +02:00
John Spray	b4e00b8b22	pageserver: refuse to load tenants with suspiciously old indices in old generations (#9719 ) ## Problem Historically, if a control component passed a pageserver "generation: 1" this could be a quick way to corrupt a tenant by loading a historic index. Follows https://github.com/neondatabase/neon/pull/9383 Closes #6951 ## Summary of changes - Introduce a Fatal variant to DownloadError, to enable index downloads to signal when they have encountered a scary enough situation that we shouldn't proceed to load the tenant. - Handle this variant by putting the tenant into a broken state (no matter which timeline within the tenant reported it) - Add a test for this case In the event that this behavior fires when we don't want it to, we have ways to intervene: - "Touch" an affected index to update its mtime (download+upload S3 object) - If this behavior is triggered, it indicates we're attaching in some old generation, so we should be able to fix that by manually bumping generation numbers in the storage controller database (this should never happen, but it's an option if it does)	2024-11-13 18:07:39 +00:00
Yuchen Liang	6844b5f460	add comments; make read buffering works with write_buffered (owned version) Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-12 16:52:46 +00:00
Yuchen Liang	ffd88ede38	fix clippy Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-12 15:59:24 +00:00
Alex Chi Z.	cef165818c	test(pageserver): add gc-compaction tests with delta will_init (#9724 ) I had an impression that gc-compaction didn't test the case where the first record of the key history is will_init because of there are some code path that will panic in this case. Luckily it got fixed in https://github.com/neondatabase/neon/pull/9026 so we can now implement such tests. Part of https://github.com/neondatabase/neon/issues/9114 ## Summary of changes * Randomly changed some images into will_init neon wal record * Split `test_simple_bottom_most_compaction_deltas` into two test cases, one of them has the bottom layer as delta layer with will_init flags, while the other is the original one with image layers. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-12 10:37:31 -05:00
Yuchen Liang	d6d8a16dbc	Merge branch 'main' into yuchen/double-buffered-writer	2024-11-11 20:27:09 -05:00
Yuchen Liang	20e6a0c8a2	use open_with_options_v2 (O_DIRECT) for ephemeral file Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-12 01:26:29 +00:00
Yuchen Liang	ce7cd36100	add IoBufAligned marker Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-12 01:17:52 +00:00
Yuchen Liang	b0d7fc7564	fix IoBufferMut::extend_from_slice Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-11 23:49:52 +00:00
Yuchen Liang	7b34e73c15	fix tests Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-11 22:35:40 +00:00
Yuchen Liang	e5bb85d407	fix clippy Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-11 21:33:33 +00:00
Alex Chi Z.	5a138d08a3	feat(pageserver): support partial gc-compaction for delta layers (#9611 ) The final patch for partial compaction, part of https://github.com/neondatabase/neon/issues/9114, close https://github.com/neondatabase/neon/issues/8921 (note that we didn't implement parallel compaction or compaction scheduler for partial compaction -- currently this needs to be scheduled by using a Python script to split the keyspace, and in the future, automatically split based on the key partitioning when the pageserver wants to trigger a gc-compaction) ## Summary of changes * Update the layer selection algorithm to use the same selection as full compaction (everything intersect/below gc horizon) * Update the layer selection algorithm to also generate a list of delta layers that need to be rewritten * Add the logic to rewrite delta layers and add them back to the layer map * Update test case to do partial compaction on deltas --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-11 20:30:32 +00:00
Tristan Partin	2d9652c434	Clean up C.UTF-8 locale changes Removes some unnecessary initdb arguments, and fixes Neon for MacOS since it doesn't seem to ship a C.UTF-8 locale. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-11 13:53:12 -06:00
Alex Chi Z.	48c06d9f7b	fix(pageserver): increase frozen layer warning threshold; ignore in tests (#9705 ) Perf benchmarks produce a lot of layers. ## Summary of changes Bumping the threshold and ignore the warning. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-11 09:13:46 -05:00
Vlad Lazar	ceaa80ffeb	storcon: add peer token for peer to peer communication (#9695 ) ## Problem We wish to stop using admin tokens in the infra repo, but step down requests use the admin token. ## Summary of Changes Introduce a new "ControllerPeer" scope and use it for step-down requests.	2024-11-11 09:58:41 +00:00
Yuchen Liang	e0848c28d9	make InMemory read aware of mutable & maybe_flushed Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-11 03:22:01 +00:00
Yuchen Liang	bdffc352e7	use background flush for write path; read path broken Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-10 20:01:52 +00:00
Yuchen Liang	45998046f3	make flush handle & task generic Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-10 19:07:45 +00:00
Yuchen Liang	26c8b50451	implement non-generic flush handle & bg task Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-09 18:41:41 +00:00
Alex Chi Z.	af8238ae52	fix(pageserver): drain upload queue before offloading timeline (#9682 ) It is possible at the point we shutdown the timeline, there are still layer files we did not upload. ## Summary of changes * If the queue is not empty, avoid offloading. * Shutdown the timeline gracefully using the flush mode to ensure all local files are uploaded before deleting the timeline directory. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-08 14:28:55 -05:00
Alex Chi Z.	ecca62a45d	feat(pageserver): more log lines around frozen layers (#9697 ) We saw pageserver OOMs https://github.com/neondatabase/cloud/issues/19715 for tenants doing large writes. Add log lines around in-memory layers to hopefully collect some info during my on-call shift next week. ## Summary of changes * Estimate in-memory size of an in-mem layer. * Print frozen layer number if there are too many layers accumulated in memory. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-08 18:44:00 +00:00
Tristan Partin	34a4eb6f2a	Switch compute-related locales to C.UTF-8 by default Right now, our environments create databases with the C locale, which is really unfortunate for users who have data stored in other languages that they want to analyze. For instance, show_trgm on Hebrew text currently doesn't work in staging or production. I don't envision this being the final solution. I think this is just a way to set a known value so the pageserver doesn't use its parent environment. The final solution to me is exposing initdb parameters to users in the console. Then they could use a different locale or encoding if they so chose. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-08 12:19:18 -06:00
Alex Chi Z.	f561cbe1c7	fix(pageserver): drain upload queue before detaching ancestor (#9651 ) In INC-317 https://neondb.slack.com/archives/C033RQ5SPDH/p1730815677932209, we saw an interesting series of operations that would remove valid layer files existing in the layer map. * Timeline A starts compaction and generates an image layer Z but not uploading it yet. * Timeline B/C starts ancestor detaching (which should not affect timeline A) * The tenant gets restarted as part of the ancestor detaching process, without increasing the generation number. * Timeline A reloads, discovering the layer Z is a future layer, and schedules a deletion into the deletion queue. This means that the file will be deleted any time in the future. * Timeline A starts compaction and generates layer Z again, adding it to the layer map. Note that because we don't bump generation number during ancestor detach, it has the same filename + generation number as the original Z. * Timeline A deletes layer Z from s3 + disk, and now we have a dangling reference in the layer map, blocking all compaction/logical_size_calculation process. ## Summary of changes * We wait until all layers to be uploaded before shutting down the tenants in `Flush` mode. * Ancestor detach restarts now use this mode. * Ancestor detach also waits for remote queue completion before starting the detaching process. * The patch ensures that we don't have any future image layer (or something similar) after restart, but not fixing the underlying problem around generation numbers. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-08 10:35:27 -05:00
Yuchen Liang	f0efc908d7	use Arc around W: OwnedAsyncWriter Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-08 15:19:36 +00:00
Yuchen Liang	224cbb4025	change OwnedAsyncWriter trait to use write_all_at Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-08 15:09:33 +00:00
John Spray	aa9112efce	pageserver: add `no_sync` for use in regression tests (1/2) (#9677 ) ## Problem In test environments, the `syncfs` that the pageserver does on startup can take a long time, as other tests running concurrently might have many gigabytes of dirty pages. ## Summary of changes - Add a `no_sync` option to the pageserver's config. - Skip syncfs on startup if this is set - A subsequent PR (https://github.com/neondatabase/neon/pull/9678) will enable this by default in tests. We need to wait until after the next release to avoid breaking compat tests, which would fail if we set no_sync & use an old pageserver binary. Q: Why is this a different mechanism than safekeeper, which as a --no-sync CLI? A: Because the way we manage pageservers in neon_local depends on the pageserver.toml containing the full configuration, whereas safekeepers have a config file which is neon-local-specific and can drive a CLI flag. Q: Why is the option no_sync rather than sync? A: For boolean configs with a dangerous value, it's preferable to make "false" the safe option, so that any downstream future config tooling that might have a "booleans are false by default" behavior (e.g. golang structs) is safe by default. Q: Why only skip the syncfs, and not all fsyncs? A: Skipping all fsyncs would require more code changes, and the most acute problem isn't fsyncs themselves (these just slow down a running test), it's the syncfs (which makes a pageserver startup slow as a result of _other_ tests)	2024-11-08 10:16:04 +00:00
Yuchen Liang	dd1c45e896	eliminate size_tracking_writer Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-07 20:44:21 +00:00
Arpad Müller	75aa19aa2d	Don't attach is_archived to debug output (#9679 ) We are in branches where we know its value already.	2024-11-07 16:13:50 +00:00
Alex Chi Z.	a8d9939ea9	fix(pageserver): reduce aux compaction threshold (#9647 ) ref https://github.com/neondatabase/neon/issues/9441 The metrics from LR publisher testing project: ~300KB aux key deltas per 256MB files. Therefore, I think we can do compaction more aggressively as these deltas are small and compaction can reduce layer download latency. We also have a read path perf fix https://github.com/neondatabase/neon/pull/9631 but I'd still combine the read path fix with the reduce of the compaction threshold. ## Summary of changes * reduce metadata compaction threshold * use num of L1 delta layers as an indicator for metadata compaction * dump more logs Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-07 10:38:15 -05:00
Arpad Müller	011c0a175f	Support copying layers in detach_ancestor from before shard splits (#9669 ) We need to use the shard associated with the layer file, not the shard associated with our current tenant shard ID. Due to shard splits, the shard IDs can refer to older files. close https://github.com/neondatabase/neon/issues/9667	2024-11-07 01:53:58 +01:00

2486 Commits