rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-21 07:00:38 +00:00

Author	SHA1	Message	Date
John Spray	18159b7695	deletion queue: expose errors from push/flush	2023-08-22 10:01:10 +01:00
John Spray	c1bc9c0f70	Various test fixes + tweaks to flushing	2023-08-18 12:44:35 +01:00
John Spray	d330eac4bc	clippy	2023-08-18 12:44:35 +01:00
John Spray	3ebceeda71	pageserver: refactor timeline args into TimelineResources This sidesteps clippy complaining about function arg counts, and will enable introducing more shared structures in future without the noise of adding extra args to all the functions involved in timeline setup.	2023-08-18 12:44:35 +01:00
John Spray	31729d6f4d	pageserver: refactor tenant args into a structure This way, when we add some new shared structure that the tenants need a reference to, we do not have to add it individually as an extra argument to the various functions.	2023-08-18 12:44:35 +01:00
John Spray	7e0e3517c1	clippy	2023-08-18 12:44:35 +01:00
John Spray	c36cba28d6	pageserver: generalize flush API	2023-08-18 12:44:35 +01:00
John Spray	8eaa4015de	deletion queue: versions in keys	2023-08-18 12:44:35 +01:00
John Spray	10e927ee3e	Add encoding versions to deletion queue structs	2023-08-18 12:44:35 +01:00
John Spray	bb3a59f275	clippy	2023-08-18 12:44:35 +01:00
John Spray	a0ed43cc12	deletion queue: add DeletionHeader for sequence numbers	2023-08-18 12:44:35 +01:00
John Spray	99dc5a5c27	Deletion queue: implement recovery on startup	2023-08-18 12:44:35 +01:00
John Spray	54db1f5d8a	remote_storage: add a helper for downloading full objects This is only for use with small objects that we will deserialize in a non-streaming way. Also add a strip_prefix method to RemotePath.	2023-08-18 12:44:35 +01:00
John Spray	404b25e45f	Remove vestigial remote_timeline_client deletion paths	2023-08-18 12:44:35 +01:00
John Spray	3edd7ece40	deletion queue: improve frontend retry	2023-08-18 12:44:35 +01:00
John Spray	504fe9c2b0	pageserver: send timeline deletions through the deletion queue	2023-08-18 12:44:35 +01:00
John Spray	10df237a81	deletion queue: add push for generic objects (layers and garbage)	2023-08-18 12:44:35 +01:00
John Spray	d40f8475a5	Error metric and retries	2023-08-18 12:44:35 +01:00
John Spray	164f916a40	Spawn deletion workers with info spans	2023-08-18 12:44:35 +01:00
John Spray	4ebc29768c	Add failpoint for deletion execution	2023-08-18 12:44:35 +01:00
John Spray	bae62916dc	pageserver/http: add /v1/deletion_queue/flush_execute This is principally for testing, but might be useful in the field if we want to e.g. flush a deletion queue before running an external scrub tool	2023-08-18 12:44:35 +01:00
John Spray	54ec7919b8	pageserver: add deletion queue submitted/executed metrics	2023-08-18 12:44:35 +01:00
John Spray	e0bed0732c	Tweak deletion queue constants	2023-08-18 12:44:35 +01:00
John Spray	9e92121cc3	pageserver: flush deletion queue on clean shutdown	2023-08-18 12:44:35 +01:00
John Spray	50a9508f4f	clippy	2023-08-18 12:44:35 +01:00
John Spray	f61402be24	pageserver: testing for deletion queue	2023-08-18 12:44:35 +01:00
John Spray	975e4f2235	Refactor deletion worker construction	2023-08-18 12:44:35 +01:00
John Spray	537eca489e	Implement flush_execute() in deletion queue	2023-08-18 12:44:35 +01:00
John Spray	de4882886e	pageserver: implement batching in deletion queue	2023-08-18 12:44:35 +01:00
John Spray	6982288426	pageserver: implement frontend of deletion queue	2023-08-18 12:44:35 +01:00
John Spray	e2c793c897	Use deletion queue in schedule_layer_file_deletion	2023-08-18 12:44:33 +01:00
John Spray	0fdc492aa4	Add MockDeletionQueue for unit tests	2023-08-18 11:25:40 +01:00
John Spray	787b099541	wire deletion queue into timeline	2023-08-18 11:25:40 +01:00
John Spray	3af693749d	pageserver: wire deletion queue through to Tenant	2023-08-18 11:25:40 +01:00
John Spray	6f9ae6bb5f	pageserver: instantiate deletion queue at process scope	2023-08-18 11:25:40 +01:00
John Spray	16d77dcb73	Initial stub implementation of deletion queue	2023-08-18 11:25:40 +01:00
Joonas Koivunen	67af24191e	test: cleanup remote_timeline_client tests (#5013 ) I will have to change these as I change remote_timeline_client api in #4938. So a bit of cleanup, handle my comments which were just resolved during initial review. Cleanup: - use unwrap in tests instead of mixed `?` and `unwrap` - use `Handle` instead of `&'static Reactor` to make the RemoteTimelineClient more natural - use arrays in tests - use plain `#[tokio::test]`	2023-08-17 19:27:30 +03:00
Joonas Koivunen	6af5f9bfe0	fix: format context (#5022 ) We return an error with unformatted `{timeline_id}`.	2023-08-17 14:30:25 +00:00
Dmitry Rodionov	d8b0a298b7	Do not attach deleted tenants (#5008 ) Rather temporary solution before proper: https://github.com/neondatabase/neon/issues/5006 It requires more plumbing so let's not attach deleted tenants first and then implement resume. Additionally fix `assert_prefix_empty`. It had a buggy prefix calculation, and since we always asserted for absence of stuff it worked. Here I started to assert for presence of stuff too and it failed. Added more "presence" asserts to other places to be confident that it works. Resolves [#5016](https://github.com/neondatabase/neon/issues/5016)	2023-08-17 13:46:49 +03:00
Christian Schwarz	957af049c2	ephemeral file: refactor write_blob impl to concentrate mutable state (#5004 ) Before this patch, we had the `off` and `blknum` as function-wide mutable state. Now it's contained in the `Writer` struct. The use of `push_bytes` instead of index-based filling of the buffer also makes it easier to reason about what's going on. This is prep for https://github.com/neondatabase/neon/pull/4994	2023-08-17 13:07:25 +03:00
Joonas Koivunen	d3612ce266	delta_layer: Restore generic from last week (#5014 ) Restores #4937 work relating to the ability to use `ResidentDeltaLayer` (which is an Arc wrapper) in #4938 for the ValueRef's by removing the borrow from `ValueRef` and providing it from an upper layer. This should not have any functional changes, most importantly, the `main` will continue to use the borrowed `DeltaLayerInner`. It might be that I can change #4938 to be like this. If that is so, I'll gladly rip out the `Ref` and move the borrow back. But I'll first want to look at the current test failures.	2023-08-17 11:47:31 +03:00
Christian Schwarz	994411f5c2	page cache: newtype the blob_io and ephemeral_file file ids (#5005 ) This makes it more explicit that these are different u64-sized namespaces. Re-using one in place of the other would be catastrophic. Prep for https://github.com/neondatabase/neon/pull/4994 which will eliminate the ephemeral_file::FileId and move the blob_io::FileId into page_cache. It makes sense to have this preliminary commit though, to minimize amount of new concept in #4994 and other preliminaries that depend on that work.	2023-08-16 18:33:47 +02:00
Arpad Müller	0bdbc39cb1	Compaction: unify key and value reference vecs (#4888 ) ## Problem PR #4839 has already reduced the number of b-tree traversals and vec creations from 3 to 2, but as pointed out in https://github.com/neondatabase/neon/pull/4839#discussion_r1279167815 , we would ideally just traverse the b-tree once during compaction. After #4836, the two vecs created are one for the list of keys, lsns and sizes, and one for the list of `(key, lsn, value reference)`. However, they are not equal, as pointed out in https://github.com/neondatabase/neon/pull/4839#issuecomment-1660418012 and the following comment: the key vec creation combines multiple entries for which the lsn is changing but the key stays the same into one, with the size being the sum of the sub-sizes. In SQL, this would correspond to something like `SELECT key, lsn, SUM(size) FROM b_tree GROUP BY key;` and `SELECT key, lsn, val_ref FROM b_tree;`. Therefore, the join operation is non-trivial. ## Summary of changes This PR merges the two lists of keys and value references into one. It's not a trivial change and affects the size pattern of the resulting files, which is why this is in a separate PR from #4839 . The key vec is used in compaction for determining when to start a new layer file. The loop uses various thresholds to come to this conclusion, but the grouping via the key has led to the behaviour that regardless of the threshold, it only starts a new file when either a new key is encountered, or a new delta file. The new code now does the combination after the merging and sorting of the various keys from the delta files. This mostly does the same as the old code, except for a detail: with the grouping done on a per-delta-layer basis, the sorted and merged vec would still have multiple entries for multiple delta files, but now, we don't have an easy way to tell when a new input delta layer file is encountered, so we cannot create multiple entries on that basis easily. To prevent possibly infinite growth, our new grouping code compares the combined size with the threshold, and if it is exceeded, it cuts a new entry so that the downstream code can cut a new output file. Here, we perform a tradeoff however, as if the threshold is too small, we risk putting entries for the same key into multiple layer files, but if the threshold is too big, we can in some instances exceed the target size. Currently, we set the threshold to the target size, so in theory we would stay below or roughly at double the `target_file_size`. We also fix the way the size was calculated for the last key. The calculation was wrong and accounted for the old layer's btree, even though we already account for the overhead of the in-construction btree. Builds on top of #4839 .	2023-08-16 18:27:18 +03:00
Dmitry Rodionov	96b84ace89	Correctly remove orphaned objects in RemoteTimelineClient::delete_all (#5000 ) Previously list_prefixes was incorrectly used for that purpose. Change to use list_files. Add a test. Some drive-by refactorings on the Python side move helpers out of a specific test file to make them widely accessible. Resolves https://github.com/neondatabase/neon/issues/4499	2023-08-16 17:31:16 +03:00
Christian Schwarz	368b783ada	ephemeral_file: remove FileExt impl (was only used by tests) (#5003 ) Extracted from https://github.com/neondatabase/neon/pull/4994	2023-08-16 15:41:25 +02:00
Dmitry Rodionov	52c2c69351	fsync directory before mark file removal (#4986 ) ## Problem Deletions can possibly be reordered. Use fsync to avoid the case when the mark file doesn't exist but other tenant/timeline files do. See added comments. resolves #4987	2023-08-15 19:24:23 +03:00
Arpad Müller	baf395983f	Turn BlockLease associated type into an enum (#4982 ) ## Problem The `BlockReader` trait is not ready to be asyncified, as associated types are not supported by asyncification strategies like via the `async_trait` macro, or via adopting enums. ## Summary of changes Remove the `BlockLease` associated type from the `BlockReader` trait and turn it into an enum instead, bearing the same name. The enum has two variants, one of which is gated by `#[cfg(test)]`. Therefore, outside of test settings, the enum has zero overhead over just having the `PageReadGuard`. Using the enum allows us to impl `BlockReader` without needing the page cache. Part of https://github.com/neondatabase/neon/issues/4743	2023-08-14 18:48:09 +02:00
Arpad Müller	ce7efbe48a	Turn BlockCursor::{read_blob,read_blob_into_buf} async fn (#4905 ) ## Problem The `BlockCursor::read_blob` and `BlockCursor::read_blob_into_buf` functions are calling `read_blk` internally, so if we want to make that function async fn, they need to be async themselves. ## Summary of changes * We first turn `ValueRef::load` into an async fn. * Then, we switch the `RwLock` implementation in `InMemoryLayer` to use the one from `tokio`. * Last, we convert the `read_blob` and `read_blob_into_buf` functions into async fn. In three instances we use `Handle::block_on`: * one use is in compaction code, which currently isn't async. We put the entire loop into an `async` block to prevent the potentially hot loop from doing cross-thread operations. * one use is in dumping code for `DeltaLayer`. The "proper" way to address this would be to enable the visit function to take async closures, but then we'd need to be generic over async vs non async, which [isn't supported by rust right now](https://blog.rust-lang.org/inside-rust/2022/07/27/keyword-generics.html). The other alternative would be to do a first pass where we cache the data into memory, and only then to dump it. * the third use is in writing code, inside a loop that copies from one file to another. It is synchronous and we'd like to keep it that way (for now?). Part of #4743	2023-08-14 17:20:37 +02:00
Dmitry Rodionov	4626d89eda	Harden retries on tenant/timeline deletion path. (#4973 ) Originated from test failure where we got SlowDown error from s3. The patch generalizes `download_retry` to not be download specific. Resulting `retry` function is moved to utils crate. `download_retries` is now a thin wrapper around this `retry` function. To ensure that all needed retries are in place test code now uses `test_remote_failures=1` setting. Ref https://neondb.slack.com/archives/C059ZC138NR/p1691743624353009	2023-08-14 17:16:49 +03:00
John Spray	d3a97fdf88	pageserver: avoid incrementing access time when reading layers for compaction (#4971 ) ## Problem Currently, image generation reads delta layers before writing out subsequent image layers, which updates the access time of the delta layers and effectively puts them at the back of the queue for eviction. This is the opposite of what we want, because after a delta layer is covered by a later image layer, it's likely that subsequent reads of latest data will hit the image rather than the delta layer, so the delta layer should be quite a good candidate for eviction. ## Summary of changes `RequestContext` gets a new `ATimeBehavior` field, and a `RequestContextBuilder` helper so that we can optionally add the new field without growing `RequestContext::new` every time we add something like this. Request context is passed into the `record_access` function, and the access time is not updated if `ATimeBehavior::Skip` is set. The compaction background task constructs its request context with this skip policy. Closes: https://github.com/neondatabase/neon/issues/4969	2023-08-14 10:18:22 +01:00
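
The access-time change described in d3a97fdf88 can be illustrated with a minimal Rust sketch. It is not the pageserver's actual code: only the names `RequestContext`, `RequestContextBuilder`, `ATimeBehavior`, `ATimeBehavior::Skip` and `record_access` come from the commit message. The `Update` variant, the `LayerAccessStats` struct, the field names and the `u64` logical clock are assumptions made for illustration.

```rust
/// Whether a read should refresh a layer's access time.
/// `Skip` is named in the commit message; `Update` is an assumed default.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ATimeBehavior {
    Update,
    Skip,
}

/// Simplified stand-in for the request context (hypothetical layout).
#[derive(Clone, Copy, Debug)]
pub struct RequestContext {
    atime_behavior: ATimeBehavior,
}

/// Builder so that new optional fields do not grow a `new` signature.
pub struct RequestContextBuilder {
    inner: RequestContext,
}

impl RequestContextBuilder {
    /// Start from defaults: ordinary reads update the access time.
    pub fn new() -> Self {
        Self {
            inner: RequestContext {
                atime_behavior: ATimeBehavior::Update,
            },
        }
    }

    /// Override the access-time policy for this context.
    pub fn atime_behavior(mut self, behavior: ATimeBehavior) -> Self {
        self.inner.atime_behavior = behavior;
        self
    }

    pub fn build(self) -> RequestContext {
        self.inner
    }
}

/// Hypothetical per-layer access statistics, using a logical clock.
#[derive(Debug, Default)]
pub struct LayerAccessStats {
    last_access: Option<u64>,
}

impl LayerAccessStats {
    /// Record a read at time `now`, unless the context asks to skip it.
    pub fn record_access(&mut self, now: u64, ctx: &RequestContext) {
        if ctx.atime_behavior == ATimeBehavior::Skip {
            return;
        }
        self.last_access = Some(now);
    }

    pub fn last_access(&self) -> Option<u64> {
        self.last_access
    }
}
```

In this sketch a compaction task would build its context with `RequestContextBuilder::new().atime_behavior(ATimeBehavior::Skip).build()`, so its reads leave `last_access` untouched and the layer keeps its place in an eviction order based on access time.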

1503 Commits