rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-25 00:50:36 +00:00

Author	SHA1	Message	Date
Vlad Lazar	9957c6a9a0	pageserver: drop the layer map lock after planning reads (#7215 ) ## Problem The vectored read path holds the layer map lock while visiting a timeline. ## Summary of changes * Rework the fringe order to hold `Layer` on `Arc<InMemoryLayer>` handles instead of descriptions that are resolved by the layer map at the time of read. Note that previously `get_values_reconstruct_data` was implemented for the layer description which already knew the lsn range for the read. Now it is implemented on the new `ReadableLayer` handle and needs to get the lsn range as an argument. * Drop the layer map lock after updating the fringe. Related https://github.com/neondatabase/neon/issues/6833	2024-04-02 17:16:15 +01:00
Vlad Lazar	871977f14c	pageserver: fix early bail out in vectored get (#7038 ) ## Problem When vectored get encountered a portion of the key range that could not be mapped to any layer in the current timeline it would incorrectly bail out of the current timeline. This is incorrect since we may have had layers queued for a visit in the fringe. ## Summary of changes * Add a repro unit test * Remove the early bail out path * Simplify range search return value	2024-03-07 16:02:20 +00:00
Vlad Lazar	5d6083bfc6	pageserver: add vectored get implementation (#6576 ) This PR introduces a new vectored implementation of the read path. The search is basically a DFS if you squint at it long enough. LayerFringe tracks the next layers to visit and acts as our stack. Vertices are tuples of (layer, keyspace, lsn range). Continuously pop the top of the stack (most recent layer) and do all the reads for one layer at once. The search maintains a fringe (`LayerFringe`) which tracks all the layers that intersect the current keyspace being searched. Continuously pop the top of the fringe (layer with highest LSN) and get all the data required from the layer in one go. Said search is done on one timeline at a time. If data is still required for some keys, then search the ancestor timeline. Apart from the high level layer traversal, vectored variants have been introduced for grabbing data from each layer type. They still suffer from read amplification issues and that will be addressed in a different PR. You might notice that in some places we duplicate the code for the existing read path. All of that code will be removed when we switch the non-vectored read path to proxy into the vectored read path. In the meantime, we'll have to contend with the extra cruft for the sake of testing and gentle releasing.	2024-02-21 09:49:46 +00:00
Vlad Lazar	0c7b89235c	pageserver: add range layer map search implementation (#6469 ) ## Problem There's no efficient way of querying the layer map for a range. ## Summary of changes Introduce a range query for the layer map (`LayerMap::range_search`). There's two broad steps to it: 1. Find all coverage changes for layers that intersect the queried range (see `LayerCoverage::range_overlaps`). The slightly tricky part is dealing with the start of the range. We can either be aligned with a layer or not and we need to treat these cases differently. 2. Iterate over the coverage changes and collect the result. For this we use a two pointer approach: the trailing pointer tracks the start of the current range (current location in the key space) and the forward pointer tracks the next coverage change. Plugging the range search into the read path is deferred to a future PR. ## Performance I adapted the layer map benchmarks on a local branch. Range searches are between 2x and 2.5x slower than point searches. That's in line with what I expected since we query thelayer map twice. Since `Timeline::get` will proxy to `Timeline::get_vectored` we can special case the one element layer map range search at that point.	2024-01-29 09:47:12 +00:00
Vlad Lazar	37638fce79	pageserver: introduce vectored Timeline::get interface (#6372 ) 1. Introduce a naive `Timeline::get_vectored` implementation The return type is intended to be flexible enough for various types of callers. We return the pages in a map keyed by `Key` such that the caller doesn't have to map back to the key if it needs to know it. Some callers can ignore errors for specific pages, so we return a separate `Result<Bytes, PageReconstructError>` for each page and an overarching `GetVectoredError` for API misuse. The overhead of the mapping will be small and bounded since we enforce a maximum key count for the operation. 2. Use the `get_vectored` API for SLRU segment reconstruction and image layer creation.	2024-01-23 14:23:53 +00:00
Joonas Koivunen	c508d3b5fa	reimpl Layer, remove remote layer, trait Layer, trait PersistentLayer (#4938 ) Implement a new `struct Layer` abstraction which manages downloadness internally, requiring no LayerMap locking or rewriting to download or evict providing a property "you have a layer, you can read it". The new `struct Layer` provides ability to keep the file resident via a RAII structure for new layers which still need to be uploaded. Previous solution solved this `RemoteTimelineClient::wait_completion` which lead to bugs like #5639. Evicting or the final local deletion after garbage collection is done using Arc'd value `Drop`. With a single `struct Layer` the closed open ended `trait Layer`, `trait PersistentLayer` and `struct RemoteLayer` are removed following noting that compaction could be simplified by simply not using any of the traits in between: #4839. The new `struct Layer` is a preliminary to remove `Timeline::layer_removal_cs` documented in #4745. Preliminaries: #4936, #4937, #5013, #5014, #5022, #5033, #5044, #5058, #5059, #5061, #5074, #5103, epic #5172, #5645, #5649. Related split off: #5057, #5134.	2023-10-26 12:36:38 +03:00
Joonas Koivunen	533a92636c	refactor: pre-cleanup Layer, PersistentLayer and impls (#5059 ) Remove pub but dead code, move trait methods as inherent methods, remove unnecessary. Split off from #4938.	2023-08-22 21:14:28 +03:00
Joonas Koivunen	cbd04f5140	remove_remote_layer: uninteresting refactorings (#4936 ) In the quest to solve #4745 by moving the download/evictedness to be internally mutable factor of a Layer and get rid of `trait PersistentLayer` at least for prod usage, `layer_removal_cs`, we present some misc cleanups. --------- Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>	2023-08-09 14:35:56 +03:00
arpad-m	e5b7ddfeee	Preparatory pageserver async conversions (#4773 ) In #4743, we'd like to convert the read path to use `async` rust. In preparation of that, this PR switches some functions that are calling lower level functions like `BlockReader::read_blk`, `BlockCursor::read_blob`, etc into `async`. The PR does not switch all functions however, and only focuses on the ones which are easy to switch. This leaves around some async functions that are (currently) unnecessarily `async`, but on the other hand it makes future changes smaller in diff. Part of #4743 (but does not completely address it).	2023-07-24 14:01:54 +02:00
arpad-m	982fce1e72	Fix rustdoc warnings and test cargo doc in CI (#4711 ) ## Problem `cargo +nightly doc` is giving a lot of warnings: broken links, naked URLs, etc. ## Summary of changes * update the `proc-macro2` dependency so that it can compile on latest Rust nightly, see https://github.com/dtolnay/proc-macro2/pull/391 and https://github.com/dtolnay/proc-macro2/issues/398 * allow the `private_intra_doc_links` lint, as linking to something that's private is always more useful than just mentioning it without a link: if the link breaks in the future, at least there is a warning due to that. Also, one might enable [`--document-private-items`](https://doc.rust-lang.org/cargo/commands/cargo-doc.html#documentation-options) in the future and make these links work in general. * fix all the remaining warnings given by `cargo +nightly doc` * make it possible to run `cargo doc` on stable Rust by updating `opentelemetry` and associated crates to version 0.19, pulling in a fix that previously broke `cargo doc` on stable: https://github.com/open-telemetry/opentelemetry-rust/pull/904 * Add `cargo doc` to CI to ensure that it won't get broken in the future. Fixes #2557 ## Future work * Potentially, it might make sense, for development purposes, to publish the generated rustdocs somewhere, like for example [how the rust compiler does it](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/index.html). I will file an issue for discussion.	2023-07-15 05:11:25 +03:00
Alex Chi Z	c76b74c50d	semantic layer map operations (#4618 ) ## Problem ref https://github.com/neondatabase/neon/issues/4373 ## Summary of changes A step towards immutable layer map. I decided to finish the refactor with this new approach and apply https://github.com/neondatabase/neon/pull/4455 on this patch later. In this PR, we moved all modifications of the layer map to one place with semantic operations like `initialize_local_layers`, `finish_compact_l0`, `finish_gc_timeline`, etc, which is now part of `LayerManager`. This makes it easier to build new features upon this PR: * For immutable storage state refactor, we can simply replace the layer map with `ArcSwap<LayerMap>` and remove the `layers` lock. Moving towards it requires us to put all layer map changes in a single place as in https://github.com/neondatabase/neon/pull/4455. * For manifest, we can write to manifest in each of the semantic functions. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-07-13 17:35:27 +03:00
Alex Chi Z	08bfe1c826	remove `LayerDescriptor` and use `LayerObject` for tests (#4637 ) ## Problem part of https://github.com/neondatabase/neon/pull/4340 ## Summary of changes Remove LayerDescriptor and remove `todo!`. At the same time, this PR adds `AsLayerDesc` trait for all persistent layers and changed `LayerFileManager` to have a generic type. For tests, we are now using `LayerObject`, which is a wrapper around `PersistentLayerDesc`. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-10 19:40:37 +03:00
arpad-m	4f280c2953	Small pageserver cleanups (#4657 ) ## Problem I was reading the code of the page server today and found these minor things that I thought could be cleaned up. ## Summary of changes * remove a redundant indentation layer and continue in the flushing loop * use the builtin `PartialEq` check instead of hand-rolling a `range_eq` function * Add a missing `>` to a prominent doc comment	2023-07-07 16:53:14 +02:00
Alex Chi Z	7e20b49da4	refactor: use LayerDesc in LayerMap (part 2) (#4437 ) ## Problem part of https://github.com/neondatabase/neon/issues/4392, continuation of https://github.com/neondatabase/neon/pull/4408 ## Summary of changes This PR removes all layer objects from LayerMap and moves it to the timeline struct. In timeline struct, LayerFileManager maps a layer descriptor to a layer object, and it is stored in the same RwLock as LayerMap to avoid behavior difference. Key changes: * LayerMap now does not have generic, and only stores descriptors. * In Timeline, we add a new struct called layer mapping. * Currently, layer mapping is stored in the same lock with layer map. Every time we retrieve data from the layer map, we will need to map the descriptor to the actual object. * Replace_historic is moved to layer mapping's replace, and the return value behavior is different from before. I'm a little bit unsure about this part and it would be good to have some comments on that. * Some test cases are rewritten to adapt to the new interface, and we can decide whether to remove it in the future because it does not make much sense now. * LayerDescriptor is moved to `tests` module and should only be intended for unit testing / benchmarks. * Because we now have a usage pattern like "take the guard of lock, then get the reference of two fields", we want to avoid dropping the incorrect object when we intend to unlock the lock guard. Therefore, a new set of helper function `drop_r/wlock` is added. This can be removed in the future when we finish the refactor. TODOs after this PR: fully remove RemoteLayer, and move LayerMapping to a separate LayerCache. all refactor PRs: ``` #4437 --- #4479 ------------ #4510 (refactor done at this point) \-- #4455 -- #4502 --/ ``` --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-06-29 15:06:07 -04:00
Alex Chi Z	2e687bca5b	refactor: use LayerDesc in layer map (part 1) (#4408 ) ## Problem part of https://github.com/neondatabase/neon/issues/4392 ## Summary of changes This PR adds a new HashMap that maps persistent layer desc to the layer object inside LayerMap. Originally I directly went towards adding such layer cache in Timeline, but the changes are too many and cannot be reviewed as a reasonably-sized PR. Therefore, we take this intermediate step to change part of the codebase to use persistent layer desc, and come up with other PRs to move this hash map of layer desc to the timeline struct. Also, file_size is now part of the layer desc. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com> Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>	2023-06-07 18:28:18 +03:00
Joonas Koivunen	ec53c5ca2e	revert: "Add check for duplicates of generated image layers" (#4104 ) This reverts commit `732acc5`. Reverted PR: #3869 As noted in PR #4094, we do in fact try to insert duplicates to the layer map, if L0->L1 compaction is interrupted. We do not have a proper fix for that right now, and we are in a hurry to make a release to production, so revert the changes related to this to the state that we have in production currently. We know that we have a bug here, but better to live with the bug that we've had in production for a long time, than rush a fix to production without testing it in staging first. Cc: #4094, #4088	2023-04-28 17:20:18 +03:00
Joonas Koivunen	850f6b1cb9	refactor: drop pageserver_ondisk_layers (#4071 ) I didn't get through #3775 fast enough so we wanted to remove this metric. Fixes #3705.	2023-04-26 11:49:29 +03:00
Konstantin Knizhnik	732acc54c1	Add check for duplicates of generated image layers (#3869 ) ## Describe your changes ## Issue ticket number and link #3673 ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-04-13 10:19:34 +03:00
Joonas Koivunen	d6bb8caad4	refactor: correct return value for not found L0's on LayerMap::replace (#3805 ) in prev implementation, an `ok_or_else(...)?` is used to cause a "precondition error" on LayerMap::replace, however we only see this particular error if an L0 for which replace fails is not in the layermap because it is not in `l0_delta_layers`. changes or fixes this to be Replacement::NotFound instead, making it more clear that an error would only be raised for actual preconditions, like trying to replace layer with completly unrelated layer.	2023-03-14 13:18:26 +02:00
Joonas Koivunen	8e6b27bf7c	fix: avoid busy loop on replacement failure (#3613 ) Add an AtomicBool per RemoteLayer, use it to mark together with closed semaphore that remotelayer is unusable until restart or ignore+load. https://github.com/neondatabase/neon/issues/3533#issuecomment-1431481554	2023-02-17 14:15:29 +02:00
Joonas Koivunen	f07d6433b6	fix: one leftover Arc::ptr_eq (#3573 ) @knizhnik noticed that one instance of `Arc::<dyn PersistentLayer>::ptr_eq` was missed in #3558. Now all `ptr_eq` which remain are in comments.	2023-02-09 13:02:07 +02:00
Joonas Koivunen	a6dffb6ef9	fix: stop using Arc::ptr_eq with dyn Trait (#3558 ) This changes the way we compare `Arc<dyn PersistentLayer>` in Timeline's `LayerMap` not to use `Arc::ptr_eq` which has been witnessed in development of #3557 to yield wrong results. It gives wrong results because it compares fat pointers, which are `(object, vtable)` tuples for `dyn Trait` and there are no guarantees that the `vtable`s are unique. As in there were multiple vtables for `RemoteLayer` which is why the comparison failed in #3557. This is a known issue in rust, clippy warns against it and rust std might be moving to the solution which has been reproduced on this PR: compare only object pointers by "casting out" the vtable pointer.	2023-02-08 12:25:25 +00:00
Joonas Koivunen	f2d89761c2	feat: LayerMap::replace (#3513 ) Cc: #3486 Adds a method to replace a particular layer from the LayerMap for the purposes of remote layer download and layer eviction. In those use cases read lock on layer map needs to be released after initial search, but other operations could modify layermap before replacing thread gets to run. Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>	2023-02-03 15:33:46 +02:00
Christian Schwarz	f1aece1ba0	add RequestContext plumbing for layer access stats In preparation for #3496 plumb through RequestContext to the data access methods of `PersistentLayer`. This is PR https://github.com/neondatabase/neon/pull/3504	2023-02-01 15:29:01 +02:00
bojanserafimov	a3d7ad2d52	Implement layer map using immutable BST (#2998 )	2023-01-20 16:10:12 -05:00
bojanserafimov	a9bd05760f	Improve layer map docstrings (#3382 )	2023-01-19 10:29:15 -05:00
Christian Schwarz	3526323bc4	prepare Timeline::get_reconstruct_data for becoming async (#3271 ) This patch restructures the code so that PR https://github.com/neondatabase/neon/pull/3228 can seamlessly replace the return PageReconstructResult::NeedsDownload with a download_remote_layer().await. Background: PR https://github.com/neondatabase/neon/pull/3228 will turn get_reconstruct_data() async and do the on-demand download right in place, instead of returning a PageReconstructResult::NeedsDownload. Current rustc requires that the layers lock guard be not in scope across an await point. For on-demand download inside get_reconstruct_data(), we need to do download_remote_layer().await. Supersedes https://github.com/neondatabase/neon/pull/3260 See my comment there: https://github.com/neondatabase/neon/pull/3260#issuecomment-1370752407 Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-01-06 19:42:25 +02:00
Konstantin Knizhnik	1137b58b4d	Fix LayerMap::search to not return delta layer preceeding image layer (#3197 ) While @bojanserafimov is still working on best replacement of R-Tree in layer_map.rs there is obvious pitfall in the current `search` method implementation: is returns delta layer even if there is image layer if greater LSN. I think that it should be fixed.	2022-12-26 18:21:41 +02:00
Kirill Bulatov	b77c33ee06	Move tenant-related modules below `tenant` module (#3190 ) No real code changes besides moving code around and adjusting the imports.	2022-12-23 15:40:37 +02:00
Christian Schwarz	a3f0111726	LayerMap::search is actually infallible Found this while investigating failure modes of on-demand download. I think it's a nice cleanup.	2022-12-21 12:14:55 +01:00
Christian Schwarz	22ae67af8d	refactor: use new type LayerFileName when referring to layer file names in PathBuf/RemotePath (#3026 ) refactor: use new type LayerFileName when referring to layer file names in PathBuf/RemotePath Before this patch, we would sometimes carry around plain file names in `Path` types and/or awkwardly "rebase" paths to have a unified representation of the layer file name between local and remote. This patch introduces a new type `LayerFileName` which replaces the use of `Path` / `PathBuf` / `RemotePath` in the `storage_sync2` APIs. Instead of holding a string, it contains the parsed representation of the image and delta file name. When we need the file name, e.g., to construct a local path or remote object key, we construct the name ad-hoc. `LayerFileName` is also serde {Dese,Se}rializable, and in an initial version of this patch, it was supposed to be used directly inside `IndexPart`, replacing `RemotePath`. However, commit `3122f3282f` Ignore backup files (ones with .n.old suffix) in download_missing fixed handling of `.old` backup file names in IndexPart, and we need to carry that behavior forward. The solution is to remove `.old` backup files names during deserialization. When we re-serialize the IndexPart, the `*.old` file will be gone. This leaks the `.old` file in the remote storage, but makes it safe to clean it up later. There is additional churn by a preliminary refactoring that got squashed into this change: split off LayerMap's needs from trait Layer into super trait That refactoring renames `Layer` to `PersistentLayer` and splits off a subset of the functions into a super-trait called `Layer`. The upser trait implements just the functions needed by `LayerMap`, whereas `PersisentLayer` adds the context of the pageserver. The naming is imperfect as some functions that reside in `PersistentLayer` have nothing persistence-specific to it. But it's a step in the right direction.	2022-12-13 01:27:59 +02:00
Heikki Linnakangas	2418e72649	Speed up layer_map::search, by remembering the "envelope" for each layer. Lookups in the R-tree call the "envelope" function for every comparison, and our envelope function isn't very cheap, so that overhead adds up. Create the envelope once, when the layer is inserted into the tree, and store it along with the layer. That uses some more memory per layer, but that's not very significant. Speeds up the search operation 2x	2022-10-18 15:00:10 +03:00
Konstantin Knizhnik	f3073a4db9	R-Tree layer map (#2317 ) Replace the layer array and linear search with R-tree So far, the in-memory layer map that holds information about layer files that exist, has used a simple Vec, in no particular order, to hold information about all the layers. That obviously doesn't scale very well; with thousands of layer files the linear search was consuming a lot of CPU. Replace it with a two-dimensional R-tree, with Key and LSN ranges as the dimensions. For the R-tree, use the 'rstar' crate. To be able to use that, we convert the Keys and LSNs into 256-bit integers. 64 bits would be enough to represent LSNs, and 128 bits would be enough to represent Keys. However, we use 256 bits, because rstar internally performs multiplication to calculate the area of rectangles, and the result of multiplying two 128 bit integers doesn't necessarily fit in 128 bits, causing integer overflow and, if overflow-checks are enabled, panic. To avoid that, we use 256 bit integers. Add a performance test that creates a lot of layer files, to demonstrate the benefit.	2022-09-22 08:35:06 +03:00
Kirill Bulatov	b8eb908a3d	Rename old project name references	2022-09-14 08:14:05 +03:00
Kirill Bulatov	1a8c8b04d7	Merge Repository and Tenant entities, rework tenant background jobs	2022-09-13 15:39:39 +03:00

35 Commits