
rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-14 17:02:56 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	eefb1d46f4	Replace Timeline::checkpoint with Timeline::freeze_and_flush The new Timeline::freeze_and_flush function is equivalent to calling Timeline::checkpoint(CheckpointConfig::Flush). There was only one non-test caller that used CheckpointConfig::Forced, so replace that with a call to the new Timeline::freeze_and_flush, followed by an explicit call to Timeline::compact. That caller handles the mgmt API's 'checkpoint' endpoint. Perhaps we should split that into separate 'flush' and 'compact' endpoints too, but I didn't go that far yet.	2022-12-20 13:45:47 +02:00
Heikki Linnakangas	7b0d28bbdc	Update outdated comment on Tenant::gc_iteration. Commit `6dec85b19d` removed the `checkpoint_before_gc` argument, but failed to update the comment. Remove its description, and while we're at it, try to explain better how the `horizon` and `pitr` arguments are used.	2022-12-20 12:52:26 +02:00
Heikki Linnakangas	6ac9ecb074	Remove a few unnecessary checkpoint calls from unit tests. The `make_some_layers` function performs a checkpoint already.	2022-12-20 12:52:26 +02:00
Heikki Linnakangas	39f58038d1	Don't upload index file in compaction, if there was nothing to do. (#3149) This splits the storage_sync2::schedule_index_file into two (public) functions: 1. `schedule_index_upload_for_metadata_update`, for when the metadata (e.g. disk_consistent_lsn or last_gc_cutoff) has changed, and 2. `schedule_index_upload_for_file_changes`, for when layer file uploads or deletions have been scheduled. We now keep track of whether there have been any uploads or deletions since the last index-file upload, and skip the upload in `schedule_index_upload_for_file_changes` if there haven't been any changes. That allows us to call the function liberally in timeline.rs, whenever layer file uploads or deletions might've been scheduled, without starting a lot of unnecessary index file uploads. GC was covered earlier by commit `c262390214`, but that missed that we have the same problem with compaction.	2022-12-19 23:58:24 +02:00
Christian Schwarz	7db018e147	[4/4] the fix: do not leak spawn_blocking() tasks from logical size calculation code - Refactor logical_size_calculation_task, moving the pieces that are specific to try_spawn_size_init_task into that function. This allows us to spawn additional size calculation tasks that are not init size calculation tasks. - As part of this refactoring, stop logging cancellations as errors. They are part of regular operations. Logging them as errors was inadvertently introduced in earlier commit 427c1b2e9661161439e65aabc173d695cfc03ab4 initial logical size calculation: if it fails, retry on next call - Change tenant size model request code to spawn task_mgr tasks using the refactored logical_size_calculation_task function. Using a task_mgr task ensures that the calculation cannot outlive the timeline. - There are presumably still some subtle race conditions if a size request comes in at exactly the same time as a detach / delete request. - But that's the concern of a different area of the code (e.g., tenant_mgr) and requires holistic solutions, such as the proposed TenantGuard. - Make size calculation cancellable using CancellationToken. This is more of a cherry on top. NB: the test code doesn't use this because we _must_ return from the failpoint, because the failpoint lib doesn't allow to just continue execution in combination with executing the closure. This commit fixes the tests introduced earlier in this patch series.	2022-12-19 16:14:58 +01:00
Christian Schwarz	38ebd6e7a0	[3/4] make initial size estimation task sensitive to task_mgr shutdown requests This exacerbates the problem pointed out in the previous commit. Why? Because with this patch, deleting a timeline also exposes the issue. Extend the test to expose the problem.	2022-12-19 16:14:58 +01:00
Christian Schwarz	40a3d50883	[2/4] add test to show that tenant detach makes us leak running size calculation task	2022-12-19 16:14:58 +01:00
Christian Schwarz	ee2b5dc9ac	[1/4] initial logical size calculation: if it fails, retry on next call Before this patch, if the task fails, we would not reset self.initial_size_computation_started. So, if it fails, we will return the approximate value forever. In practice, it probably never failed because the local filesystem is quite reliable. But with on-demand download, the logical size calculation may need to download layers, which is more likely to fail at times. There will be internal retries with a timeout, but eventually, the downloads will give up. We want to retry in those cases. While we're at it, also change the handling of the timeline state watch so that we treat it as an error. Most likely, we'll not be called again, but if we are, retrying is the right thing.	2022-12-19 16:14:58 +01:00
Christian Schwarz	c785a516aa	remove TimelineInfo.{Remote,Local} along with their types follow-up of https://github.com/neondatabase/neon/pull/2615 which is neon.git: `538876650a` must be deployed after cloud.git change https://github.com/neondatabase/cloud/issues/3232 fixes https://github.com/neondatabase/neon/issues/3041	2022-12-19 14:37:40 +01:00
Heikki Linnakangas	e23d5da51c	Tidy up and add comments to the pageserver startup code. To make it more readable.	2022-12-19 14:03:22 +02:00
Dmitry Ivanov	61194ab2f4	Update rust-postgres everywhere I've rebased[1] Neon's fork of rust-postgres to incorporate latest upstream changes (including dependabot's fixes), so we need to advance revs here as well. [1] https://github.com/neondatabase/rust-postgres/commits/neon	2022-12-17 00:26:10 +03:00
Dmitry Ivanov	83baf49487	[proxy] Forward compute connection params to client This fixes all kinds of problems related to missing params, like broken timestamps (due to `integer_datetimes`). This solution is not ideal, but it will help. Meanwhile, I'm going to dedicate some time to improving connection machinery. Note that this does not fix problems with passing certain parameters in a reverse direction, i.e. from client to compute. This is a separate matter and will be dealt with in an upcoming PR.	2022-12-16 21:37:50 +03:00
Joonas Koivunen	c86c0c08ef	task_mgr: use CancellationToken instead of shutdown_rx (#3124) this should help us in the future to have more freedom with spawning tasks and cancelling things, most importantly blocking tasks (assuming the CancellationToken::is_cancelled is performant enough). CancellationToken allows creation of hierarchical cancellations, which would also simplify the task_mgr shutdown operation, rendering it unnecessary.	2022-12-16 17:19:47 +02:00
Arseny Sher	e14bbb889a	Enable broker client keepalives. (#3127) Should fix stale connections. ref https://github.com/neondatabase/neon/issues/3108	2022-12-16 11:55:12 +02:00
Heikki Linnakangas	c262390214	Don't upload index file when GC doesn't remove anything. I saw an excessive number of index file upload operations in production, even when nothing on the timeline changes. It was because our GC schedules index file upload if the GC cutoff LSN is advanced, even if the GC had nothing else to do. The GC cutoff LSN marches steadily forwards, even when there is no user activity on the timeline, when the cutoff is determined by the time-based PITR interval setting. To dial that down, only schedule index file upload when GC is about to actually remove something.	2022-12-16 11:05:55 +02:00
Heikki Linnakangas	6dec85b19d	Redefine the timeline_gc API to not perform a forced compaction Previously, the /v1/tenant/:tenant_id/timeline/:timeline_id/do_gc API call performed a flush and compaction on the timeline before GC. Change it not to do that, and change all the tests that used that API to perform compaction explicitly. The compaction happens at a slightly different point now. Previously, the code performed the `refresh_gc_info_internal` step first, and only then did compaction on all the timelines. I don't think that was what was originally intended here. Presumably the idea with compaction was to make some old layer files available for GC. But if we're going to flush the current in-memory layer to disk, surely you would want to include the newly-written layer in the compaction too. I guess this didn't make any difference to the tests in practice, but in any case, the tests now perform the flush and compaction before any of the GC steps. Some of the tests might not need the compaction at all, but I didn't try hard to determine which ones might need it. I left it out from a few tests that intentionally tested calling do_gc with an invalid tenant or timeline ID, though.	2022-12-16 11:05:55 +02:00
Christian Schwarz	10cd64cf8d	make TaskHandle::next_task_event cancellation-safe If we get cancelled before jh.await returns we've take()n the join handle but drop the result on the floor. Fix it by setting self.join_handle = None after the .await fixes https://github.com/neondatabase/neon/issues/3104	2022-12-15 10:26:17 +01:00
Christian Schwarz	bf3ac2be2d	add remote_physical_size metric We do the accounting exclusively after updating remote IndexPart successfully. This is cleaner & more robust than doing it upon completion of individual layer file uploads / deletions since we can use .set() instead of add()/sub(). NB: Originally, this work was intended to be part of #3013 but it turns out that it's completely orthogonal. So, spin it out into this PR for easier review. Since this change is additive, it won't break anything.	2022-12-15 09:48:35 +01:00
Christian Schwarz	4132ae9dfe	always remove RemoteTimelineClient's metrics when dropping it	2022-12-14 19:25:29 +01:00
Dmitry Rodionov	df09d0375b	ignore metadata_backup files in index_part	2022-12-14 19:00:19 +03:00
Shany Pozin	ada5b7158f	Fix Issue #3014 (#3059) * TenantConfigRequest now supports tenant_id as hex string input instead of bytes array * Config file is truncated in each creation/update	2022-12-14 14:09:16 +02:00
Christian Schwarz	0c915dcb1d	Timeline::download_missing: fix handling of mismatched layer size Before this patch, when we decide to rename a layer file to backup because of layer file size mismatch, we would not remove the layer from the layer map, but remove the on-disk file. Because we re-download the file immediately after, we simply end up with two layer objects in memory that reference the same file in the layer map. So, GetPage() would work fine until one of the layers gets delete()'d. The other layer's delete() would then fail. Future work: prevent insertion of the same layer at LayerMap level so that we notice such bugs sooner.	2022-12-13 15:53:08 +01:00
Kirill Bulatov	02c1c351dc	Create initial timeline without remote storage (#3077) Removes the race during pageserver initial timeline creation that led to partial layer uploads. This race is only reproducible in test code, we do not create initial timelines in cloud (yet, at least), but still nice to remove the non-deterministic behavior.	2022-12-13 15:42:59 +02:00
Christian Schwarz	22ae67af8d	refactor: use new type LayerFileName when referring to layer file names in PathBuf/RemotePath (#3026) Before this patch, we would sometimes carry around plain file names in `Path` types and/or awkwardly "rebase" paths to have a unified representation of the layer file name between local and remote. This patch introduces a new type `LayerFileName` which replaces the use of `Path` / `PathBuf` / `RemotePath` in the `storage_sync2` APIs. Instead of holding a string, it contains the parsed representation of the image and delta file name. When we need the file name, e.g., to construct a local path or remote object key, we construct the name ad-hoc. `LayerFileName` is also serde {Dese,Se}rializable, and in an initial version of this patch, it was supposed to be used directly inside `IndexPart`, replacing `RemotePath`. However, commit `3122f3282f` Ignore backup files (ones with .n.old suffix) in download_missing fixed handling of `.old` backup file names in IndexPart, and we need to carry that behavior forward. The solution is to remove `.old` backup file names during deserialization. When we re-serialize the IndexPart, the `*.old` file will be gone. This leaks the `.old` file in the remote storage, but makes it safe to clean it up later. There is additional churn by a preliminary refactoring that got squashed into this change: split off LayerMap's needs from trait Layer into super trait That refactoring renames `Layer` to `PersistentLayer` and splits off a subset of the functions into a super-trait called `Layer`. The super trait implements just the functions needed by `LayerMap`, whereas `PersistentLayer` adds the context of the pageserver. The naming is imperfect as some functions that reside in `PersistentLayer` have nothing persistence-specific to it. But it's a step in the right direction.	2022-12-13 01:27:59 +02:00
Arseny Sher	32662ff1c4	Replace etcd with storage_broker. This is the replacement itself, the binary landed earlier. See docs/storage_broker.md. ref https://github.com/neondatabase/neon/pull/2466 https://github.com/neondatabase/neon/issues/2394	2022-12-12 13:30:16 +03:00
Kirill Bulatov	861dc8e64e	Remove redundant once_cell usages	2022-12-09 22:14:32 +02:00
Dmitry Rodionov	3122f3282f	Ignore backup files (ones with .n.old suffix) in download_missing This is rather a hack to resolve an immediate issue: https://github.com/neondatabase/neon/issues/3024 Properly cleaning this file from index part requires changes to initialization of remote queue. Because we need to clean it up earlier than we start walking around files. With on-demand there will be no walk around layer files because download_missing is no longer needed, so I believe it will be natural to unify this with load_layer_map	2022-12-09 12:07:50 +03:00
Konstantin Knizhnik	e1ef62f086	Print more information about context of failed walredo requests (#3003)	2022-12-08 09:12:38 +02:00
Kirill Bulatov	b50e0793cf	Rework remote_storage interface (#2993) Changes: * Remove `RemoteObjectId` concept from remote_storage. Operate directly on /-separated names instead. These names are now represented by struct `RemotePath` which was renamed from struct `RelativePath` * Require remote storage to operate on relative paths for its contents, thus simplifying the way to derive them in pageserver and safekeeper * Make `IndexPart` to use `String` instead of `RelativePath` for its entries, since those are just the layer names	2022-12-07 23:11:02 +02:00
Christian Schwarz	ac0c167a85	improve pidfile handling This patch centralizes the logic of creating & reading pid files into the new pid_file module and improves upon / makes explicit a few race conditions that existed with the previous code. Starting Processes / Creating Pidfiles ====================================== Before this patch, we had three places that had very similar-looking match lock_file::create_lock_file { ... } blocks. After this change, they can use a straightforward call provided by the pid_file: pid_file::claim_pid_file_for_pid() Stopping Processes / Reading Pidfiles ===================================== The new pid_file module provides a function to read a pidfile, called read_pidfile(), that returns a pub enum PidFileRead { NotExist, NotHeldByAnyProcess(PidFileGuard), LockedByOtherProcess(Pid), } If we get back NotExist, there is nothing to kill. If we get back NotHeldByAnyProcess, the pid file is stale and we must ignore its contents. If it's LockedByOtherProcess, it's either another pidfile reader or, more likely, the daemon that is still running. In this case, we can read the pid in the pidfile and kill it. There's still a small window where this is racy, but it's not a regression compared to what we have before. The NotHeldByAnyProcess is an improvement over what we had before this patch. Before, we would blindly read the pidfile contents and kill, even if no other process held the flock. If the pidfile was stale (NotHeldByAnyProcess), then that kill would either result in ESRCH or hit some other unrelated process on the system. This patch avoids the latter case by grabbing an exclusive flock before reading the pidfile, and returning the flock to the caller in the form of a guard object, to avoid concurrent reads / kills. It's hopefully irrelevant in practice, but it's a little robustness that we get for free here. 
Maintain flock on Pidfile of ETCD / any InitialPidFile::Create() ================================================================ Pageserver and safekeeper create their pidfiles themselves. But for etcd, neon_local creates the pidfile (InitialPidFile::Create()). Before this change, we would unlock the etcd pidfile as soon as `neon_local start` exits, simply because no-one else kept the FD open. During `neon_local stop`, that results in a stale pid file, aka, NotHeldByAnyProcess, and it would henceforth not trust that the PID stored in the file is still valid. With this patch, we make the etcd process inherit the pidfile FD, thereby keeping the flock held until it exits.	2022-12-07 18:24:12 +01:00
Heikki Linnakangas	a46a81b5cb	Fix updating "trace_read_requests" with /v1/tenant/config mgmt API. The new "trace_read_requests" option was missing from the parse_toml_tenant_conf function that reads the config file. Because of that, the option was ignored, which caused the test_read_trace.py test to fail. It used to work before commit `9a6c0be823`, because the TenantConfigOpt struct was constructed directly in tenant_create_handler, but now it is saved and read back from disk even for a newly created tenant. The abovementioned bug was fixed in commit `09393279c6` already, which added the missing code to parse_toml_tenant_conf() to parse the new "trace_read_requests" option. This commit fixes one more function that was missed earlier, and adds more detail to the error message if parsing the config file fails.	2022-12-07 15:03:39 +02:00
Heikki Linnakangas	b513619503	Remove obsolete 'awaits_download' field. It used to be a separate piece of state, but after `9a6c0be823` it's just an alias for the Tenant being in Attaching state. It was only used in one assertion in a test, but that check doesn't make sense anymore, so just remove it. Fixes https://github.com/neondatabase/neon/issues/2930	2022-12-07 13:13:54 +02:00
Kirill Bulatov	09393279c6	Fix tenant config parsing	2022-12-06 23:52:16 +02:00
Kliment Serafimov	8f2b3cbded	Sentry integration for storage. (#2926) Added basic instrumentation to integrate sentry with the proxy, pageserver, and safekeeper processes. Currently in sentry there are three projects, one for each process. Sentry url is sent to all three processes separately via cli args.	2022-12-06 18:57:54 +00:00
Christian Schwarz	4530544bb8	draw_timeline_dirs: accept paths as input	2022-12-06 18:17:48 +01:00
Dmitry Rodionov	98ff0396f8	tone down error log for successful process termination	2022-12-06 18:44:07 +03:00
Kirill Bulatov	d6bfe955c6	Add commands to unload and load the tenant in memory (#2977) Closes https://github.com/neondatabase/neon/issues/2537 Follow-up of https://github.com/neondatabase/neon/pull/2950 With the new model that prevents attaching without the remote storage, it has started to be even more odd to add attach-with-files functionality (in addition to the issues raised previously). Adds two separate commands: * `POST {tenant_id}/ignore` that places a mark file to skip such tenant on every start and removes it from memory * `POST {tenant_id}/schedule_load` that tries to load a tenant from local FS similar to what pageserver does now on startup, but without directory removals	2022-12-06 15:30:02 +00:00
Alexander Bayandin	61825dfb57	Update chrono to 0.4.23; use only clock feature from it	2022-12-06 15:45:58 +01:00
Kirill Bulatov	c0480facc1	Rename RelativePath to RemotePath Improve rustdocs a bit	2022-12-05 22:52:42 +02:00
Kirill Bulatov	b38473d367	Remove RelativePath conversions Function was unused, but publicly exported from the module lib, so not reported by rustc as unused	2022-12-05 22:52:42 +02:00
Kirill Bulatov	38af453553	Use async RwLock around tenants (#3009) A step towards more async code in our repo, to help avoid most of the odd blocking calls, that might deadlock, as mentioned in https://github.com/neondatabase/neon/issues/2975	2022-12-05 22:48:45 +02:00
Shany Pozin	79fdd3d51b	Fix #2907: Change missing_layers property to optional in the IndexPart struct (#3005) Move missing_layers property to Option<HashSet<RelativePath>> This will allow the safe removal of it once the upgrade of all page servers is done with this new code	2022-12-05 13:56:04 +02:00
Kirill Bulatov	4f443c339d	Tone down retry error logs (#2999) Closes https://github.com/neondatabase/neon/issues/2990	2022-12-03 15:30:55 +00:00
Alexander Bayandin	788823ebe3	Fix named_arguments_used_positionally warnings (#2987) ``` warning: named argument `file` is not used by name --> pageserver/src/tenant/timeline.rs:1078:54 \| 1078 \| trace!("downloading image file: {}", file = path.display()); \| -- ^^^^ this named argument is referred to by position in formatting string \| \| \| this formatting argument uses named argument `file` by position \| = note: `#[warn(named_arguments_used_positionally)]` on by default help: use the named argument by name to avoid ambiguity \| 1078 \| trace!("downloading image file: {file}", file = path.display()); \| ++++ ``` Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2022-12-02 17:59:26 +00:00
bojanserafimov	fe280f70aa	Add synthetic layer map bench (#2979)	2022-12-01 13:29:21 -05:00
bojanserafimov	b9544adcb4	Add layer map search benchmark (#2957)	2022-11-30 13:48:07 -05:00
Heikki Linnakangas	33834c01ec	Rename Paused states to Stopping. I'm not a fan of "Paused", for two reasons: - Paused implies that the tenant/timeline has no activity on it. That's not true; the tenant/timeline can still have active tasks working on it. - Paused implies that it can be resumed later. It can not. A tenant or timeline in this state cannot be switched back to Active state anymore. A completely new Tenant or Timeline struct can be constructed for the same tenant or timeline later, e.g. if you detach and later re-attach the same tenant, but that's a different thing. Stopping describes the state better. I also considered "ShuttingDown", but Stopping is simpler as it's a single word.	2022-11-30 01:10:16 +02:00
Heikki Linnakangas	9a6c0be823	storage_sync2 The code in this change was extracted from PR #2595, i.e., Heikki’s draft PR for on-demand download. High-Level Changes - storage_sync module rewrite - Changes to Tenant Loading - Changes to Timeline States - Crash-safe & Resumable Tenant Attach There are several follow-up work items planned. Refer to the Epic issue on GitHub: https://github.com/neondatabase/neon/issues/2029 Metadata: closes https://github.com/neondatabase/neon/pull/2785 unsquashed history of this patch: archive/pr-2785-storage-sync2/pre-squash Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech> =============================================================================== storage_sync module rewrite =========================== The storage_sync code is rewritten. New module name is storage_sync2, mostly to make a more reasonable git diff. The updated block comment in storage_sync2.rs describes the changes quite well, so, we will not reproduce that comment here. TL;DR: - Global sync queue and RemoteIndex are replaced with per-timeline `RemoteTimelineClient` structure that contains a queue for UploadOperations to ensure proper ordering and necessary metadata. - Before deleting local layer files, wait for ongoing UploadOps to finish (wait_completion()). - Download operations are not queued and executed immediately. Changes to Tenant Loading ========================= Initial sync part was rewritten as well and represents the other major change that serves as a foundation for on-demand downloads. Routines for attaching and loading shifted directly to Tenant struct and now are asynchronous and spawned into the background. Since this patch doesn’t introduce on-demand download of layers we fully synchronize with the remote during pageserver startup. See details in `Timeline::reconcile_with_remote` and `Timeline::download_missing`. 
Changes to Tenant States ======================== The “Active” state has lost its “background_jobs_running: bool” member. That variable indicated whether the GC & Compaction background loops are spawned or not. With this patch, they are now always spawned. Unit tests (#[test]) use the TenantConf::{gc_period,compaction_period} to disable their effect (`15db566`). This patch introduces a new tenant state, “Attaching”. A tenant that is being attached starts in this state and transitions to “Active” once it finishes download. The `GET /tenant` endpoint returns `TenantInfo::has_in_progress_downloads`. We derive the value for that field from the tenant state now, to remain backwards-compatible with cloud.git. We will remove that field when we switch to on-demand downloads. Changes to Timeline States ========================== The TimelineInfo::awaits_download field is now equivalent to the tenant being in Attaching state. Previously, download progress was tracked per timeline. With this change, it’s only tracked per tenant. When on-demand downloads arrive, the field will be completely obsolete. Deprecation is tracked in issue #2930. Crash-safe & Resumable Tenant Attach ==================================== Previously, the attach operation was not persistent. I.e., when tenant attach was interrupted by a crash, the pageserver would not continue attaching after pageserver restart. In fact, the half-finished tenant directory on disk would simply be skipped by tenant_mgr because it lacked the metadata file (it’s written last). This patch introduces an “attaching” marker file that is present inside the tenant directory while the tenant is attaching. During pageserver startup, tenant_mgr will resume attach if that file is present. If not, it assumes that the local tenant state is consistent and tries to load the tenant. If that fails, the tenant transitions into Broken state.	2022-11-29 18:55:20 +01:00
Heikki Linnakangas	fbd5f65938	Misc cosmetic fixes in comments, messages. Most of these were extracted from PR #2785.	2022-11-29 14:10:45 +02:00
Heikki Linnakangas	1f1324ebed	Require tenant to be active when calculating tenant size. It's not clear if the calculation would work or make sense, if the tenant is only partially loaded. Let's play it safe, and require it to be Active.	2022-11-29 14:10:45 +02:00
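
Commit `39f58038d1` describes skipping the index-file upload when no layer file uploads or deletions have been scheduled since the last one. A minimal sketch of that bookkeeping follows, assuming a hypothetical `UploadQueue` type with simple counters. Only the two `schedule_index_upload_for_*` names come from the commit message; the struct, its fields, the `bool` return value, and the other methods are illustrative assumptions, not the actual storage_sync2 code.

```rust
/// Hypothetical, simplified stand-in for a per-timeline upload queue.
/// This is not the real storage_sync2 data structure.
#[derive(Debug, Default)]
pub struct UploadQueue {
    /// Layer file uploads or deletions scheduled since the last index upload.
    file_changes_since_last_index_upload: u32,
    /// Number of index uploads scheduled so far (for illustration only).
    index_uploads_scheduled: u32,
}

impl UploadQueue {
    /// Record that a layer file upload was scheduled.
    pub fn schedule_layer_file_upload(&mut self) {
        self.file_changes_since_last_index_upload += 1;
    }

    /// Record that a layer file deletion was scheduled.
    pub fn schedule_layer_file_deletion(&mut self) {
        self.file_changes_since_last_index_upload += 1;
    }

    /// Metadata changed (e.g. disk_consistent_lsn): always schedule an upload.
    pub fn schedule_index_upload_for_metadata_update(&mut self) {
        self.schedule_index_upload();
    }

    /// Schedule an index upload only if file changes are pending.
    /// Returns true if an upload was scheduled (an assumption of this sketch).
    pub fn schedule_index_upload_for_file_changes(&mut self) -> bool {
        if self.file_changes_since_last_index_upload == 0 {
            return false; // nothing changed, skip the upload
        }
        self.schedule_index_upload();
        true
    }

    /// Number of index uploads scheduled so far.
    pub fn index_uploads_scheduled(&self) -> u32 {
        self.index_uploads_scheduled
    }

    /// Count the upload and reset the pending-change counter.
    fn schedule_index_upload(&mut self) {
        self.index_uploads_scheduled += 1;
        self.file_changes_since_last_index_upload = 0;
    }
}
```

With this shape, a caller can invoke `schedule_index_upload_for_file_changes` after every compaction or GC pass, and the call is a no-op when nothing was scheduled in between.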


1054 Commits
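
Commit `ac0c167a85` explains how a caller should react to each `PidFileRead` variant when stopping a process. The decision table can be sketched as below, under simplifying assumptions: the guard object carried by `NotHeldByAnyProcess` is omitted, the pid is a plain `i32` rather than a `Pid` type, and `StopAction` and `decide_stop_action` are hypothetical names introduced here for illustration. No flock is taken in this sketch.

```rust
/// Simplified mirror of the enum named in the commit message.
/// The real variants carry a PidFileGuard and a Pid; both are simplified here.
#[derive(Debug, PartialEq, Eq)]
pub enum PidFileRead {
    NotExist,
    NotHeldByAnyProcess,
    LockedByOtherProcess(i32),
}

/// Hypothetical result type describing what a "stop" command should do.
#[derive(Debug, PartialEq, Eq)]
pub enum StopAction {
    /// No pidfile: there is nothing to kill.
    NothingToKill,
    /// Nobody holds the lock: the pidfile is stale, ignore its contents.
    IgnoreStalePidFile,
    /// Another process holds the lock: most likely the daemon, signal it.
    Kill(i32),
}

/// Map the outcome of reading a pidfile to the action described in the commit.
pub fn decide_stop_action(read: &PidFileRead) -> StopAction {
    match read {
        PidFileRead::NotExist => StopAction::NothingToKill,
        PidFileRead::NotHeldByAnyProcess => StopAction::IgnoreStalePidFile,
        PidFileRead::LockedByOtherProcess(pid) => StopAction::Kill(*pid),
    }
}
```

The point of the stale case is that the pid stored in an unlocked file may have been reused by an unrelated process, so the sketch never returns `Kill` for it.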