Commit Graph

1067 Commits

Author SHA1 Message Date
Heikki Linnakangas
db1ea8b430 WIP: get rid of layer_removal_cs 2022-11-28 11:47:46 +02:00
Heikki Linnakangas
2dad3a6445 Make schedule_layer_file_deletion atomic.
Make sure that we don't mutate the upload queue, until we're sure that
we will complete all the changse. Otherwise, we could bail out after
removing some files from upload_queue.latest_files, but not all.

This is purely theoretical: there were only two ? calls in the
function, and neither of them should actually return an error:

1. `RelativePath::from_local_path` only returns error if the argument
  path is not in the base directory. I.e. in this case, if the argument
  path was outside the timeline directory. Shouldn't happen.

2. latest_metadata.to_bytes() only returns an error if the serde
   serialization fails. It really shouldn't fail.

I considered turning those into panics instead, but as long as the
function as whole can return an Err, the callers need to be prepared
for that anyway, so there's little difference. Nevertheless, I
refactored the code a little to make it atomic even one of those
can't-happen errors happen after all. And I used a closure to make it
harder to introduce new dangerous ?-operators in the future.
2022-11-25 22:58:05 +02:00
Christian Schwarz
d36fd4f141 merge_local_remote_metadata: bail out in case where local has metadata file but remote is newer than local 2022-11-25 15:45:21 -05:00
Heikki Linnakangas
a8a6603e66 rustfmt 2022-11-25 22:20:23 +02:00
Christian Schwarz
87c82f5151 storage_sync/delete.rs: convert FIXME into follow-up https://github.com/neondatabase/neon/issues/2934 2022-11-25 14:57:05 -05:00
Christian Schwarz
c9a2b1fd10 startup: transition tenant into Broken state if config load fails (instead of skipping over it and not adding it to the tenants map) 2022-11-25 14:55:51 -05:00
Heikki Linnakangas
fdfa86b5b0 Put back check that a timeline must have ancestor or some layer files. 2022-11-25 21:49:12 +02:00
Heikki Linnakangas
3eb85957df Little cleanup around measure_remote_op calls
- Previously, the functions in download.rs did the measurement themselves,
  whereas for functions in delete.rs and upload.rs, it was the caller's
  responsibility. Move the measure_remote_op calls from download.rs to
  the callers, for consistency.
- Remove pointless async blocks in upload.rs and delete.rs. They would've
  been useful for inserting the measure_remote_op calls, but since the
  caller's are responsible for measure_remote_op now, they're not neeed.
- tiny cosmetic cleanup around imports
2022-11-25 21:05:06 +02:00
Christian Schwarz
6b4a28bf7f remove TODO on missing tests that will be covered later by https://github.com/neondatabase/neon/pull/2928 2022-11-25 13:27:07 -05:00
Christian Schwarz
22d6c1dda6 Merge remote-tracking branch 'origin/dkr/on-demand-split/per-tenant-remote-sync' into problame/for-dkr/on-demand-split/per-tenant-remote-sync 2022-11-25 13:22:31 -05:00
Christian Schwarz
abfac6ef2a Merge remote-tracking branch 'origin/main' into HEAD
Conflicts:
	libs/pageserver_api/src/models.rs
	pageserver/src/lib.rs
	pageserver/src/tenant_mgr.rs

There was a merge conflict following attach_tenant() where
I didn't understand why Git called out a conflict.
I went through the changes in `origin/main` since the last
merge done by Heikki, couldn't find anything that would
conflict there.

Original git diff right after after `git merge` follows:

   diff --cc libs/pageserver_api/src/models.rs
   index 750585b58,aefd79336..000000000
   --- a/libs/pageserver_api/src/models.rs
   +++ b/libs/pageserver_api/src/models.rs
   @@@ -15,17 -15,13 +15,27 @@@ use bytes::{BufMut, Bytes, BytesMut}
     /// A state of a tenant in pageserver's memory.
     #[derive(Debug, Clone, Copy, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
     pub enum TenantState {
   ++<<<<<<< HEAD
    +    // This tenant is being loaded from local disk
    +    Loading,
    +    // This tenant is being downloaded from cloud storage.
    +    Attaching,
    +    /// Tenant is fully operational
    +    Active,
    +    /// A tenant is recognized by pageserver, but it is being detached or the system is being
    +    /// shut down.
    +    Paused,
    +    /// A tenant is recognized by the pageserver, but can no longer used for any operations,
    +    /// because it failed to get activated.
   ++=======
   +     /// Tenant is fully operational, its background jobs might be running or not.
   +     Active { background_jobs_running: bool },
   +     /// A tenant is recognized by pageserver, but it is being detached or the
   +     /// system is being shut down.
   +     Paused,
   +     /// A tenant is recognized by the pageserver, but can no longer be used for
   +     /// any operations, because it failed to be activated.
   ++>>>>>>> origin/main
         Broken,
     }

   diff --cc pageserver/src/lib.rs
   index 2d5b66f57,e3112223e..000000000
   --- a/pageserver/src/lib.rs
   +++ b/pageserver/src/lib.rs
   @@@ -22,7 -23,11 +23,13 @@@ pub mod walreceiver
     pub mod walrecord;
     pub mod walredo;

   ++<<<<<<< HEAD
   ++=======
   + use std::collections::HashMap;
   + use std::path::Path;
   +
   ++>>>>>>> origin/main
     use tracing::info;
    -use utils:🆔:{TenantId, TimelineId};

     use crate::task_mgr::TaskKind;

   @@@ -103,14 -108,51 +110,64 @@@ fn exponential_backoff_duration_seconds
         }
     }

   ++<<<<<<< HEAD
    +/// A suffix to be used during file sync from the remote storage,
    +/// to ensure that we do not leave corrupted files that pretend to be layers.
    +const TEMP_FILE_SUFFIX: &str = "___temp";
   ++=======
   + /// A newtype to store arbitrary data grouped by tenant and timeline ids.
   + /// One could use [`utils:🆔:TenantTimelineId`] for grouping, but that would
   + /// not include the cases where a certain tenant has zero timelines.
   + /// This is sometimes important: a tenant could be registered during initial load from FS,
   + /// even if he has no timelines on disk.
   + #[derive(Debug)]
   + pub struct TenantTimelineValues<T>(HashMap<TenantId, HashMap<TimelineId, T>>);
   +
   + impl<T> TenantTimelineValues<T> {
   +     fn new() -> Self {
   +         Self(HashMap::new())
   +     }
   + }
   +
   + /// The name of the metadata file pageserver creates per timeline.
   + /// Full path: `tenants/<tenant_id>/timelines/<timeline_id>/metadata`.
   + pub const METADATA_FILE_NAME: &str = "metadata";
   +
   + /// Per-tenant configuration file.
   + /// Full path: `tenants/<tenant_id>/config`.
   + pub const TENANT_CONFIG_NAME: &str = "config";
   +
   + /// A suffix used for various temporary files. Any temporary files found in the
   + /// data directory at pageserver startup can be automatically removed.
   + pub const TEMP_FILE_SUFFIX: &str = "___temp";
   +
   + /// A marker file to mark that a timeline directory was not fully initialized.
   + /// If a timeline directory with this marker is encountered at pageserver startup,
   + /// the timeline directory and the marker file are both removed.
   + /// Full path: `tenants/<tenant_id>/timelines/<timeline_id>___uninit`.
   + pub const TIMELINE_UNINIT_MARK_SUFFIX: &str = "___uninit";
   +
   + pub fn is_temporary(path: &Path) -> bool {
   +     match path.file_name() {
   +         Some(name) => name.to_string_lossy().ends_with(TEMP_FILE_SUFFIX),
   +         None => false,
   +     }
   + }
   +
   + pub fn is_uninit_mark(path: &Path) -> bool {
   +     match path.file_name() {
   +         Some(name) => name
   +             .to_string_lossy()
   +             .ends_with(TIMELINE_UNINIT_MARK_SUFFIX),
   +         None => false,
   ++    }
   ++}
   ++>>>>>>> origin/main
    +
    +pub fn is_temporary(path: &std::path::Path) -> bool {
    +    match path.file_name() {
    +        Some(name) => name.to_string_lossy().ends_with(TEMP_FILE_SUFFIX),
    +        None => false,
         }
     }

   diff --cc pageserver/src/tenant_mgr.rs
   index 73593bc48,061d7fa19..000000000
   --- a/pageserver/src/tenant_mgr.rs
   +++ b/pageserver/src/tenant_mgr.rs
   @@@ -13,11 -13,18 +13,22 @@@ use tracing::*
     use remote_storage::GenericRemoteStorage;

     use crate::config::PageServerConf;
   ++<<<<<<< HEAD
   ++=======
   + use crate::http::models::TenantInfo;
   + use crate::storage_sync::index::{LayerFileMetadata, RemoteIndex, RemoteTimelineIndex};
   + use crate::storage_sync::{self, LocalTimelineInitStatus, SyncStartupData, TimelineLocalFiles};
   ++>>>>>>> origin/main
     use crate::task_mgr::{self, TaskKind};
    -use crate::tenant::{
    -    ephemeral_file::is_ephemeral_file, metadata::TimelineMetadata, Tenant, TenantState,
    -};
    +use crate::tenant::{Tenant, TenantState};
     use crate::tenant_config::TenantConfOpt;
   ++<<<<<<< HEAD
   ++=======
   + use crate::walredo::PostgresRedoManager;
   + use crate::{is_temporary, is_uninit_mark, METADATA_FILE_NAME, TEMP_FILE_SUFFIX};
   ++>>>>>>> origin/main

    -use utils::crashsafe::{self, path_with_suffix_extension};
    +use utils::fs_ext::PathExt;
     use utils:🆔:{TenantId, TimelineId};

     mod tenants_state {
   @@@ -341,87 -521,334 +352,247 @@@ pub fn list_tenants() -> Vec<(TenantId
             .collect()
     }

    -#[derive(Debug)]
    -pub enum TenantAttachData {
    -    Ready(HashMap<TimelineId, TimelineLocalFiles>),
    -    Broken(anyhow::Error),
    -}
    -/// Attempts to collect information about all tenant and timelines, existing on the local FS.
    -/// If finds any, deletes all temporary files and directories, created before. Also removes empty directories,
    -/// that may appear due to such removals.
    -/// Does not fail on particular timeline or tenant collection errors, rather logging them and ignoring the entities.
    -fn local_tenant_timeline_files(
    -    config: &'static PageServerConf,
    -) -> anyhow::Result<HashMap<TenantId, TenantAttachData>> {
    -    let _entered = info_span!("local_tenant_timeline_files").entered();
    -
    -    let mut local_tenant_timeline_files = HashMap::new();
    -    let tenants_dir = config.tenants_path();
    -    for tenants_dir_entry in fs::read_dir(&tenants_dir)
    -        .with_context(|| format!("Failed to list tenants dir {}", tenants_dir.display()))?
    -    {
    -        match &tenants_dir_entry {
    -            Ok(tenants_dir_entry) => {
    -                let tenant_dir_path = tenants_dir_entry.path();
    -                if is_temporary(&tenant_dir_path) {
    -                    info!(
    -                        "Found temporary tenant directory, removing: {}",
    -                        tenant_dir_path.display()
    -                    );
    -                    if let Err(e) = fs::remove_dir_all(&tenant_dir_path) {
    -                        error!(
    -                            "Failed to remove temporary directory '{}': {:?}",
    -                            tenant_dir_path.display(),
    -                            e
    -                        );
    -                    }
    -                } else {
    -                    match collect_timelines_for_tenant(config, &tenant_dir_path) {
    -                        Ok((tenant_id, TenantAttachData::Broken(e))) => {
    -                            local_tenant_timeline_files.entry(tenant_id).or_insert(TenantAttachData::Broken(e));
    -                        },
    -                        Ok((tenant_id, TenantAttachData::Ready(collected_files))) => {
    -                            if collected_files.is_empty() {
    -                                match remove_if_empty(&tenant_dir_path) {
    -                                    Ok(true) => info!("Removed empty tenant directory {}", tenant_dir_path.display()),
    -                                    Ok(false) => {
    -                                        // insert empty timeline entry: it has some non-temporary files inside that we cannot remove
    -                                        // so make obvious for HTTP API callers, that something exists there and try to load the tenant
    -                                        let _ = local_tenant_timeline_files.entry(tenant_id).or_insert_with(|| TenantAttachData::Ready(HashMap::new()));
    -                                    },
    -                                    Err(e) => error!("Failed to remove empty tenant directory: {e:?}"),
    -                                }
    -                            } else {
    -                                match local_tenant_timeline_files.entry(tenant_id) {
    -                                    hash_map::Entry::Vacant(entry) => {
    -                                        entry.insert(TenantAttachData::Ready(collected_files));
    -                                    }
    -                                    hash_map::Entry::Occupied(entry) =>{
    -                                        if let TenantAttachData::Ready(old_timelines) = entry.into_mut() {
    -                                            old_timelines.extend(collected_files);
    -                                        }
    -                                    },
    -                                }
    -                            }
    -                        },
    -                        Err(e) => error!(
    -                            "Failed to collect tenant files from dir '{}' for entry {:?}, reason: {:#}",
    -                            tenants_dir.display(),
    -                            tenants_dir_entry,
    -                            e
    -                        ),
    -                    }
    +/// Execute Attach mgmt API command.
    +///
    +/// Downloading all the tenant data is performed in the background, this merely
    +/// spawns the background task and returns quickly.
    +pub async fn attach_tenant(
    +    conf: &'static PageServerConf,
    +    tenant_id: TenantId,
    +    remote_storage: &GenericRemoteStorage,
    +) -> anyhow::Result<()> {
    +    match tenants_state::write_tenants().entry(tenant_id) {
    +        hash_map::Entry::Occupied(e) => {
    +            // Cannot attach a tenant that already exists. The error message depends on
    +            // the state it's in.
    +            match e.get().current_state() {
    +                TenantState::Attaching => {
    +                    anyhow::bail!("tenant {tenant_id} attach is already in progress")
                     }
   ++<<<<<<< HEAD
    +                current_state => {
    +                    anyhow::bail!("tenant already exists, current state: {current_state:?}")
   ++=======
   +             }
   +             Err(e) => error!(
   +                 "Failed to list tenants dir entry {:?} in directory {}, reason: {:?}",
   +                 tenants_dir_entry,
   +                 tenants_dir.display(),
   +                 e
   +             ),
   +         }
   +     }
   +
   +     info!(
   +         "Collected files for {} tenants",
   +         local_tenant_timeline_files.len(),
   +     );
   +     Ok(local_tenant_timeline_files)
   + }
   +
   + fn remove_if_empty(tenant_dir_path: &Path) -> anyhow::Result<bool> {
   +     let directory_is_empty = tenant_dir_path
   +         .read_dir()
   +         .with_context(|| {
   +             format!(
   +                 "Failed to read directory '{}' contents",
   +                 tenant_dir_path.display()
   +             )
   +         })?
   +         .next()
   +         .is_none();
   +
   +     if directory_is_empty {
   +         fs::remove_dir_all(&tenant_dir_path).with_context(|| {
   +             format!(
   +                 "Failed to remove empty directory '{}'",
   +                 tenant_dir_path.display(),
   +             )
   +         })?;
   +
   +         Ok(true)
   +     } else {
   +         Ok(false)
   +     }
   + }
   +
   + fn collect_timelines_for_tenant(
   +     config: &'static PageServerConf,
   +     tenant_path: &Path,
   + ) -> anyhow::Result<(TenantId, TenantAttachData)> {
   +     let tenant_id = tenant_path
   +         .file_name()
   +         .and_then(OsStr::to_str)
   +         .unwrap_or_default()
   +         .parse::<TenantId>()
   +         .context("Could not parse tenant id out of the tenant dir name")?;
   +     let timelines_dir = config.timelines_path(&tenant_id);
   +
   +     if !timelines_dir.as_path().is_dir() {
   +         return Ok((
   +             tenant_id,
   +             TenantAttachData::Broken(anyhow::anyhow!(
   +                 "Tenant {} has no timelines directory at {}",
   +                 tenant_id,
   +                 timelines_dir.display()
   +             )),
   +         ));
   +     }
   +
   +     let mut tenant_timelines = HashMap::new();
   +     for timelines_dir_entry in fs::read_dir(&timelines_dir)
   +         .with_context(|| format!("Failed to list timelines dir entry for tenant {tenant_id}"))?
   +     {
   +         match timelines_dir_entry {
   +             Ok(timelines_dir_entry) => {
   +                 let timeline_dir = timelines_dir_entry.path();
   +                 if is_temporary(&timeline_dir) {
   +                     info!(
   +                         "Found temporary timeline directory, removing: {}",
   +                         timeline_dir.display()
   +                     );
   +                     if let Err(e) = fs::remove_dir_all(&timeline_dir) {
   +                         error!(
   +                             "Failed to remove temporary directory '{}': {:?}",
   +                             timeline_dir.display(),
   +                             e
   +                         );
   +                     }
   +                 } else if is_uninit_mark(&timeline_dir) {
   +                     let timeline_uninit_mark_file = &timeline_dir;
   +                     info!(
   +                         "Found an uninit mark file {}, removing the timeline and its uninit mark",
   +                         timeline_uninit_mark_file.display()
   +                     );
   +                     let timeline_id = timeline_uninit_mark_file
   +                         .file_stem()
   +                         .and_then(OsStr::to_str)
   +                         .unwrap_or_default()
   +                         .parse::<TimelineId>()
   +                         .with_context(|| {
   +                             format!(
   +                                 "Could not parse timeline id out of the timeline uninit mark name {}",
   +                                 timeline_uninit_mark_file.display()
   +                             )
   +                         })?;
   +                     let timeline_dir = config.timeline_path(&timeline_id, &tenant_id);
   +                     if let Err(e) =
   +                         remove_timeline_and_uninit_mark(&timeline_dir, timeline_uninit_mark_file)
   +                     {
   +                         error!("Failed to clean up uninit marked timeline: {e:?}");
   +                     }
   +                 } else {
   +                     let timeline_id = timeline_dir
   +                         .file_name()
   +                         .and_then(OsStr::to_str)
   +                         .unwrap_or_default()
   +                         .parse::<TimelineId>()
   +                         .with_context(|| {
   +                             format!(
   +                                 "Could not parse timeline id out of the timeline dir name {}",
   +                                 timeline_dir.display()
   +                             )
   +                         })?;
   +                     let timeline_uninit_mark_file =
   +                         config.timeline_uninit_mark_file_path(tenant_id, timeline_id);
   +                     if timeline_uninit_mark_file.exists() {
   +                         info!("Found an uninit mark file for timeline {tenant_id}/{timeline_id}, removing the timeline and its uninit mark");
   +                         if let Err(e) = remove_timeline_and_uninit_mark(
   +                             &timeline_dir,
   +                             &timeline_uninit_mark_file,
   +                         ) {
   +                             error!("Failed to clean up uninit marked timeline: {e:?}");
   +                         }
   +                     } else {
   +                         match collect_timeline_files(&timeline_dir) {
   +                             Ok((metadata, timeline_files)) => {
   +                                 tenant_timelines.insert(
   +                                     timeline_id,
   +                                     TimelineLocalFiles::collected(metadata, timeline_files),
   +                                 );
   +                             }
   +                             Err(e) => {
   +                                 error!(
   +                                     "Failed to process timeline dir contents at '{}', reason: {:?}",
   +                                     timeline_dir.display(),
   +                                     e
   +                                 );
   +                                 match remove_if_empty(&timeline_dir) {
   +                                     Ok(true) => info!(
   +                                         "Removed empty timeline directory {}",
   +                                         timeline_dir.display()
   +                                     ),
   +                                     Ok(false) => (),
   +                                     Err(e) => {
   +                                         error!("Failed to remove empty timeline directory: {e:?}")
   +                                     }
   +                                 }
   +                             }
   +                         }
   +                     }
   ++>>>>>>> origin/main
                     }
                 }
    -            Err(e) => {
    -                error!("Failed to list timelines for entry tenant {tenant_id}, reason: {e:?}")
    -            }
    +        }
    +        hash_map::Entry::Vacant(v) => {
    +            let tenant = Tenant::spawn_attach(conf, tenant_id, remote_storage)?;
    +            v.insert(tenant);
    +            Ok(())
             }
         }
    -
    -    if tenant_timelines.is_empty() {
    -        // this is normal, we've removed all broken, empty and temporary timeline dirs
    -        // but should allow the tenant to stay functional and allow creating new timelines
    -        // on a restart, we require tenants to have the timelines dir, so leave it on disk
    -        debug!("Tenant {tenant_id} has no timelines loaded");
    -    }
    -
    -    Ok((tenant_id, TenantAttachData::Ready(tenant_timelines)))
     }

    -fn remove_timeline_and_uninit_mark(timeline_dir: &Path, uninit_mark: &Path) -> anyhow::Result<()> {
    -    fs::remove_dir_all(&timeline_dir)
    -        .or_else(|e| {
    -            if e.kind() == std::io::ErrorKind::NotFound {
    -                // we can leave the uninit mark without a timeline dir,
    -                // just remove the mark then
    -                Ok(())
    -            } else {
    -                Err(e)
    -            }
    -        })
    -        .with_context(|| {
    -            format!(
    -                "Failed to remove unit marked timeline directory {}",
    -                timeline_dir.display()
    -            )
    -        })?;
    -    fs::remove_file(&uninit_mark).with_context(|| {
    -        format!(
    -            "Failed to remove timeline uninit mark file {}",
    -            uninit_mark.display()
    -        )
    -    })?;
    +#[cfg(feature = "testing")]
    +use {
    +    crate::repository::GcResult, pageserver_api::models::TimelineGcRequest,
    +    utils::http::error::ApiError,
    +};

    -    Ok(())
    -}
    +#[cfg(feature = "testing")]
    +pub fn immediate_gc(
    +    tenant_id: TenantId,
    +    timeline_id: TimelineId,
    +    gc_req: TimelineGcRequest,
    +) -> Result<tokio::sync::oneshot::Receiver<Result<GcResult, anyhow::Error>>, ApiError> {
    +    let guard = tenants_state::read_tenants();

    -// discover timeline files and extract timeline metadata
    -//  NOTE: ephemeral files are excluded from the list
    -fn collect_timeline_files(
    -    timeline_dir: &Path,
    -) -> anyhow::Result<(TimelineMetadata, HashMap<PathBuf, LayerFileMetadata>)> {
    -    let mut timeline_files = HashMap::new();
    -    let mut timeline_metadata_path = None;
    -
    -    let timeline_dir_entries =
    -        fs::read_dir(&timeline_dir).context("Failed to list timeline dir contents")?;
    -    for entry in timeline_dir_entries {
    -        let entry_path = entry.context("Failed to list timeline dir entry")?.path();
    -        let metadata = entry_path.metadata()?;
    -
    -        if metadata.is_file() {
    -            if entry_path.file_name().and_then(OsStr::to_str) == Some(METADATA_FILE_NAME) {
    -                timeline_metadata_path = Some(entry_path);
    -            } else if is_ephemeral_file(&entry_path.file_name().unwrap().to_string_lossy()) {
    -                debug!("skipping ephemeral file {}", entry_path.display());
    -                continue;
    -            } else if is_temporary(&entry_path) {
    -                info!("removing temp timeline file at {}", entry_path.display());
    -                fs::remove_file(&entry_path).with_context(|| {
    -                    format!(
    -                        "failed to remove temp download file at {}",
    -                        entry_path.display()
    -                    )
    -                })?;
    -            } else {
    -                let layer_metadata = LayerFileMetadata::new(metadata.len());
    -                timeline_files.insert(entry_path, layer_metadata);
    +    let tenant = guard
    +        .get(&tenant_id)
    +        .map(Arc::clone)
    +        .with_context(|| format!("Tenant {tenant_id} not found"))
    +        .map_err(ApiError::NotFound)?;
    +
    +    let gc_horizon = gc_req.gc_horizon.unwrap_or_else(|| tenant.get_gc_horizon());
    +    // Use tenant's pitr setting
    +    let pitr = tenant.get_pitr_interval();
    +
    +    // Run in task_mgr to avoid race with detach operation
    +    let (task_done, wait_task_done) = tokio::sync::oneshot::channel();
    +    task_mgr::spawn(
    +        &tokio::runtime::Handle::current(),
    +        TaskKind::GarbageCollector,
    +        Some(tenant_id),
    +        Some(timeline_id),
    +        &format!("timeline_gc_handler garbage collection run for tenant {tenant_id} timeline {timeline_id}"),
    +        false,
    +        async move {
    +            fail::fail_point!("immediate_gc_task_pre");
    +            let result = tenant
    +                .gc_iteration(Some(timeline_id), gc_horizon, pitr, true)
    +                .instrument(info_span!("manual_gc", tenant = %tenant_id, timeline = %timeline_id))
    +                .await;
    +                // FIXME: `gc_iteration` can return an error for multiple reasons; we should handle it
    +                // better once the types support it.
    +            match task_done.send(result) {
    +                Ok(_) => (),
    +                Err(result) => error!("failed to send gc result: {result:?}"),
                 }
    +            Ok(())
             }
    -    }
    -
    -    // FIXME (rodionov) if attach call succeeded, and then pageserver is restarted before download is completed
    -    //   then attach is lost. There would be no retries for that,
    -    //   initial collect will fail because there is no metadata.
    -    //   We either need to start download if we see empty dir after restart or attach caller should
    -    //   be aware of that and retry attach if awaits_download for timeline switched from true to false
    -    //   but timelinne didn't appear locally.
    -    //   Check what happens with remote index in that case.
    -    let timeline_metadata_path = match timeline_metadata_path {
    -        Some(path) => path,
    -        None => anyhow::bail!("No metadata file found in the timeline directory"),
    -    };
    -    let metadata = TimelineMetadata::from_bytes(
    -        &fs::read(&timeline_metadata_path).context("Failed to read timeline metadata file")?,
    -    )
    -    .context("Failed to parse timeline metadata file bytes")?;
    -
    -    anyhow::ensure!(
    -        metadata.ancestor_timeline().is_some() || !timeline_files.is_empty(),
    -        "Timeline has no ancestor and no layer files"
         );

    -    Ok((metadata, timeline_files))
    +    // drop the guard until after we've spawned the task so that timeline shutdown will wait for the task
    +    drop(guard);
    +
    +    Ok(wait_task_done)
     }
   diff --git a/vendor/postgres-v14 b/vendor/postgres-v14
   index da50d99db..360ff1c63 160000
   --- a/vendor/postgres-v14
   +++ b/vendor/postgres-v14
   @@ -1 +1 @@
   -Subproject commit da50d99db54848f7a3e910f920aaad7dc6915d36
   +Subproject commit 360ff1c637a57d351a7a5a391d8e8afd8fde8c3a
   diff --git a/vendor/postgres-v15 b/vendor/postgres-v15
   index 780c3f8e3..d31b3f7c6 160000
   --- a/vendor/postgres-v15
   +++ b/vendor/postgres-v15
   @@ -1 +1 @@
   -Subproject commit 780c3f8e3524c2e32a2e28884c7b647fcebf71d7
   +Subproject commit d31b3f7c6d108e52c8bb11e812ce4e266501ea3d
2022-11-25 12:19:53 -05:00
Heikki Linnakangas
e51f2be3d0 Fix physical size tracking when files are downloaded.
Includes a test.
2022-11-25 19:07:53 +02:00
Christian Schwarz
4e4bc8fbde Tenant::load remove obsolete FIXME, add .with_context(): see https://github.com/neondatabase/neon/pull/2785#discussion_r1030581703 2022-11-25 11:43:50 -05:00
Christian Schwarz
dd2a77c2ef Abort uploads if the tenant/timeline is requested to shut down
- Introduce another UploadQueue::Stopped enum variant to indicate the state
  where the UploadQueue is shut down.
- In perform_upload_task, wait concurrently for tenant/timeline
  shutdown. If we are requested to shut down, the first in-progress tasks
  that notices the shutdown request transitions the queue from
  UploadQueue::Initialized to UploadQueue::Stopped state.
  This involves dropping all the queued ops that are not yet
  in progress, which conveniently unblocks wait_completion() calls
  that are waiting for their barrier to be executed.
  They will receive an Err(), and do something sensible.

Right now, wait_completion() is only used by tests, but I
suspect that we should be using it in wherever we
delete layer files, e.g., GC and compaction, as explained
in the storage_sync.rs block comment section "Consistency".

This change also fixes
test_timeline_deletion_with_files_stuck_in_upload_queue
which I added in the previous commit.
Before, timeline delete would wait until all in-progress
tasks and queued tasks were done.
If, like in the test, a task was stuck due to upload error,
timeline deletion would wait forever. Now it gets an error.

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2022-11-25 16:32:31 +02:00
Egor Suvorov
ae53dc3326 Add authentication between Safekeeper and Pageserver/Compute
* Fix https://github.com/neondatabase/neon/issues/1854
* Never log Safekeeper::conninfo in walproposer as it now contains a secret token
* control_panel, test_runner: generate and pass JWT tokens for Safekeeper to compute and pageserver
* Compute: load JWT token for Safekepeer from the environment variable. Do not reuse the token from
  pageserver_connstring because it's embedded in there weirdly.
* Pageserver: load JWT token for Safekeeper from the environment variable.
* Rewrite docs/authentication.md
2022-11-25 04:17:42 +03:00
Egor Suvorov
1ca76776d0 pageserver: require management permissions on HTTP /status 2022-11-25 04:17:42 +03:00
Egor Suvorov
2ce5d8137d Separate permission checks for Pageserver and Safekeeper
There will be different scopes for those two, so authorization code should be different.

The `check_permission` function is now not in the shared library. Its implementation
is very similar to the one which will be added for Safekeeper. In fact, we may reuse
the same existing root-like 'PageServerApi' scope, but I would prefer to have separate
root-like scopes for services.

Also, generate_management_token in tests is generate_pageserver_token now.
2022-11-25 04:17:42 +03:00
Christian Schwarz
206b5d2ada remove obsolete TODO (see github PR discussion https://github.com/neondatabase/neon/pull/2785#discussion_r1030547179 ) 2022-11-24 13:20:38 -05:00
Christian Schwarz
77fea61fcc address FIXME in tenant_attach regarding tenant config 2022-11-24 13:20:38 -05:00
Christian Schwarz
e3f4c0e4ac remove FIXME addressed by 'On tenant load, start WAL receivers only after all timelines have been loaded.' 2022-11-24 13:20:38 -05:00
Christian Schwarz
2302ecda04 don't run background loops in unit tests 2022-11-24 13:20:38 -05:00
Heikki Linnakangas
5257bbe2b9 Remove information-free ".context" messages
We capture stack traces of all errors, so these don't really add any
value. As a thought experiment, if we had to add a line like this,
with the function name in it, every time we use the ?-operator, we're
doing something wrong.

test_tenants.py::test_tenant_creation_fails creates a failpoint and
checks that the error returned by the pageserver contains the
failpoint name, and that was failing because it wasn't on the first
line of the error. We should probably improve our error-scraping logic
in the tests to not rely so heavily on string matching, but that's a
different topic.

FWIW, these are also pretty unlikely to fail in practice.
2022-11-24 19:20:17 +01:00
Heikki Linnakangas
6b61ed5fab Fix clippy warning and some typos 2022-11-24 19:20:17 +01:00
Christian Schwarz
f28bf70596 tenant creation: re-use load_local_tenant() 2022-11-24 19:20:17 +01:00
Heikki Linnakangas
dc9c33139b On tenant load, start WAL receivers only after all timelines have been loaded.
And similarly on attach. This way, if the tenant load/attach fails
halfway through, we don't have any leftover WAL receivers still
running on the broken tenant.
2022-11-24 19:20:17 +01:00
Heikki Linnakangas
0260ee23b9 Start background loops on create_tenant correctly.
This was caught by the test_gc_cutoff test.
2022-11-24 19:20:17 +01:00
Christian Schwarz
1022b4b98f use utils::failpoint_sleep_millis_async!("attach-before-activate")
The detach_while_attaching test still passes, presumably because
the two request execute on different OS threads.
2022-11-24 19:20:17 +01:00
Heikki Linnakangas
791eebefe2 Silence clippy warning 2022-11-24 19:20:17 +01:00
Heikki Linnakangas
50b686c3e4 rustfmt 2022-11-24 19:20:17 +01:00
Heikki Linnakangas
39b10696e9 Fix unit tests.
`activate` is now more strict and errors out if the tenant is already
Active.
2022-11-24 19:20:17 +01:00
Heikki Linnakangas
264b0ada9f Handle concurrent detach and attach more gracefully.
If tenant detach is requested while the tenant is still in Attaching
state, we set the state to Paused, but when the attach completed, it
changed it to Active again, and worse, it started the background jobs.
To fix, rewrite the set_state() function so that when you activate a
tenant that is already in Paused state, it stays in Paused state and
we don't start the background loops.
2022-11-24 19:20:17 +01:00
Heikki Linnakangas
78338f7b94 Remove background_jobs_enabled, move code from tenant_mgr.rs to tenant.rs 2022-11-24 19:20:17 +01:00
Heikki Linnakangas
0d533ce840 Test detach while attach is still in progress 2022-11-24 19:20:17 +01:00
Christian Schwarz
978f1879b9 fix typo in storage_sync.rs module comment
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2022-11-24 19:17:41 +01:00
Egor Suvorov
b6989e8928 pageserver: make wal_source_connstring: String a 'wal_source_connconf: PgConnectionConfig` 2022-11-24 14:02:23 +03:00
Heikki Linnakangas
99d9c23df5 Gather path-related consts and functions to one place.
Feels more organized this way.
2022-11-24 12:26:15 +02:00
Konstantin Knizhnik
a6e4a3c3ef Implement corrent truncation of FSM/VM forks on arbitrary position (#2609)
refer #2601

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2022-11-23 18:46:07 +02:00
Christian Schwarz
c863d679f8 storage_sync: update module doc comment to reflect our changes 2022-11-23 11:46:03 -05:00
Alexey Kondratov
4bf3087aed [pageserver] list latest_gc_cutoff_lsn in the OpenAPI spec (#2894)
It seems that it's present in the API response for quite a while. It's
just not listed in the spec, fix it.
2022-11-22 21:10:49 +01:00
Christian Schwarz
c61731a31f metric pageserver_remote_upload_queue_unfinished_tasks: add labels for file & op kind, and check for them in the test
Fails right now because turns out we don't actually generate layer
removal tasks with the current test code. That will be the next commit.
2022-11-22 12:13:38 -05:00
Heikki Linnakangas
86e483f87b Fix tenant size modeling code to include WAL at end of branch
Imagine that you have a tenant with a single branch like this:

---------------==========>
               ^
	    gc horizon
where:

----  is the portion of the branch that is older than retention period
====  is the portion of the branch that is newer than retention period.

Before this commit, the sizing model included the logical size at the
GC horizon, but not the WAL after that. In particular, that meant that
on a newly created tenant with just one timeline, where the retention
period covered the whole history of the timeline, i.e. gc_cutoff was 0,
the calculated tenant size was always zero.

We now include the WAL after the GC horizon in the size. So in the
above example, the calculated tenant size would be the logical size
of the database the GC horizon, plus all the WAL after it (marked with
===).

This adds a new `insert_point` function to the sizing model, alongside
`modify_branch`, and changes the code in size.rs to use the new
function. The new function takes an absolute lsn and logical size as
argument, so we no longer need to calculate the difference to the
previous point. Also, the end-size is now optional, because we now
need to add a point to represent the end of each branch to the model,
but we don't want to or need to calculate the logical size at that
point.
2022-11-22 17:11:27 +02:00
Heikki Linnakangas
d1b92e976a When a new layer file is created in compaction, also upload it.
I can't believe this was missing..
2022-11-22 16:35:31 +02:00
Heikki Linnakangas
d6c1a9aa18 Fix formatting 2022-11-22 15:02:09 +02:00
Heikki Linnakangas
3c4680b718 Turn upload_queue_items_metric into a regular function 2022-11-22 14:45:22 +02:00
Christian Schwarz
6b7cbec9b3 WIP: add test for storage_sync upload retries 2022-11-22 05:49:46 -05:00
Christian Schwarz
6de9a33c05 metric REMOTE_UPLOAD_QUEUE_UNFINISHED_TASKS 2022-11-22 05:49:46 -05:00
Christian Schwarz
77f883c95c prometheus metrics for storage_sync2
Extracted from https://github.com/neondatabase/neon/pull/2595

Use pub(crate) to highlight unused metrics.
2022-11-22 05:49:25 -05:00
Alexander Stanovoy
a5b898a31c Fix the order of checks in LSN (#2882)
We should check if LSN is in the lower range because it's constant and
only after wait for LSN to arrive if needed.
2022-11-22 02:28:41 +02:00
Heikki Linnakangas
e6557f4f91 Silence test failures, where we operate on a tenant before it's loaded
Saw a failure like this, from 'test_tenants_attached_after_download' and
'test_tenant_redownloads_truncated_file_on_startup':

> test_runner/fixtures/neon_fixtures.py:1064: in verbose_error
>     res.raise_for_status()
> /github/home/.cache/pypoetry/virtualenvs/neon-_pxWMzVK-py3.9/lib/python3.9/site-packages/requests/models.py:1021: in raise_for_status
>     raise HTTPError(http_error_msg, response=self)
> E   requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://localhost:18150/v1/tenant/2334c9c113a82b5dd1651a0a23c53448/timeline
>
> The above exception was the direct cause of the following exception:
> test_runner/regress/test_tenants_with_remote_storage.py:185: in test_tenants_attached_after_download
>     restored_timelines = client.timeline_list(tenant_id)
> test_runner/fixtures/neon_fixtures.py:1148: in timeline_list
>     self.verbose_error(res)
> test_runner/fixtures/neon_fixtures.py:1070: in verbose_error
>     raise PageserverApiException(msg) from e
> E   fixtures.neon_fixtures.PageserverApiException: NotFound: Tenant 2334c9c113a82b5dd1651a0a23c53448 is not active. Current state: Loading

These tests starts the pageserver, wait until
assert_no_in_progress_downloads_for_tenant says that
has_downloads_in_progress is false, and then call timeline_list on the
tenant. But has_downloads_in_progress was only returned as true when
the tenant was being attached, not when it was being loaded at
pageserver startup. Change tenant_status API endpoint
(/v1/tenant/:tenant_id) so that it returns
has_downloads_in_progress=true also for tenants that are still in
Loading state.
2022-11-21 20:50:42 +01:00
Heikki Linnakangas
e8db20eb26 On connection from compute, wait for tenant to become Active.
If a connection from compute arrives while a tenant is still in
Loading state, wait for it to become Active instead of throwing an
error to the client. This should fix the errors from test_gc_cutoff
test that repeatedly restarts the pageserver and immediately tries to
connect to it.
2022-11-21 20:50:42 +01:00