Conflicts:
libs/pageserver_api/src/models.rs
pageserver/src/lib.rs
pageserver/src/tenant_mgr.rs
There was a merge conflict following attach_tenant(), and
I didn't understand why Git reported a conflict.
I went through the changes in `origin/main` since the last
merge done by Heikki, but couldn't find anything that would
conflict there.
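For the record, `git log --merge` and `git diff --cc` are the usual tools for narrowing such a conflict down. A self-contained sketch in a throwaway repo (file, branch, and commit names are made up for the demo):

```shell
# Throwaway repo with a deliberate conflict, to demo the investigation commands.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
echo 'enum TenantState { Active }' > models.rs
git add models.rs
git commit -qm 'base'
git checkout -qb feature
echo 'enum TenantState { Loading, Active }' > models.rs
git commit -qam 'feature: add Loading'
git checkout -q -
echo 'enum TenantState { Active, Paused }' > models.rs
git commit -qam 'main: add Paused'
git merge feature || true               # leaves the conflict in place
git merge-base HEAD feature             # the commit both sides diverged from
git log --merge --oneline -- models.rs  # only commits that touched the conflicted file
git diff --cc models.rs                 # combined diff with both sides' changes
```

`git log --merge` only works while the merge is unresolved, but it is usually the quickest way to see which commits on each side actually caused the conflict.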
Original git diff right after `git merge` follows:
diff --cc libs/pageserver_api/src/models.rs
index 750585b58,aefd79336..000000000
--- a/libs/pageserver_api/src/models.rs
+++ b/libs/pageserver_api/src/models.rs
@@@ -15,17 -15,13 +15,27 @@@ use bytes::{BufMut, Bytes, BytesMut}
/// A state of a tenant in pageserver's memory.
#[derive(Debug, Clone, Copy, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
pub enum TenantState {
++<<<<<<< HEAD
+ // This tenant is being loaded from local disk
+ Loading,
+ // This tenant is being downloaded from cloud storage.
+ Attaching,
+ /// Tenant is fully operational
+ Active,
+ /// A tenant is recognized by pageserver, but it is being detached or the system is being
+ /// shut down.
+ Paused,
+ /// A tenant is recognized by the pageserver, but can no longer used for any operations,
+ /// because it failed to get activated.
++=======
+ /// Tenant is fully operational, its background jobs might be running or not.
+ Active { background_jobs_running: bool },
+ /// A tenant is recognized by pageserver, but it is being detached or the
+ /// system is being shut down.
+ Paused,
+ /// A tenant is recognized by the pageserver, but can no longer be used for
+ /// any operations, because it failed to be activated.
++>>>>>>> origin/main
Broken,
}
diff --cc pageserver/src/lib.rs
index 2d5b66f57,e3112223e..000000000
--- a/pageserver/src/lib.rs
+++ b/pageserver/src/lib.rs
@@@ -22,7 -23,11 +23,13 @@@ pub mod walreceiver
pub mod walrecord;
pub mod walredo;
++<<<<<<< HEAD
++=======
+ use std::collections::HashMap;
+ use std::path::Path;
+
++>>>>>>> origin/main
use tracing::info;
-use utils::id::{TenantId, TimelineId};
use crate::task_mgr::TaskKind;
@@@ -103,14 -108,51 +110,64 @@@ fn exponential_backoff_duration_seconds
}
}
++<<<<<<< HEAD
+/// A suffix to be used during file sync from the remote storage,
+/// to ensure that we do not leave corrupted files that pretend to be layers.
+const TEMP_FILE_SUFFIX: &str = "___temp";
++=======
+ /// A newtype to store arbitrary data grouped by tenant and timeline ids.
+ /// One could use [`utils::id::TenantTimelineId`] for grouping, but that would
+ /// not include the cases where a certain tenant has zero timelines.
+ /// This is sometimes important: a tenant could be registered during initial load from FS,
+ /// even if he has no timelines on disk.
+ #[derive(Debug)]
+ pub struct TenantTimelineValues<T>(HashMap<TenantId, HashMap<TimelineId, T>>);
+
+ impl<T> TenantTimelineValues<T> {
+ fn new() -> Self {
+ Self(HashMap::new())
+ }
+ }
+
+ /// The name of the metadata file pageserver creates per timeline.
+ /// Full path: `tenants/<tenant_id>/timelines/<timeline_id>/metadata`.
+ pub const METADATA_FILE_NAME: &str = "metadata";
+
+ /// Per-tenant configuration file.
+ /// Full path: `tenants/<tenant_id>/config`.
+ pub const TENANT_CONFIG_NAME: &str = "config";
+
+ /// A suffix used for various temporary files. Any temporary files found in the
+ /// data directory at pageserver startup can be automatically removed.
+ pub const TEMP_FILE_SUFFIX: &str = "___temp";
+
+ /// A marker file to mark that a timeline directory was not fully initialized.
+ /// If a timeline directory with this marker is encountered at pageserver startup,
+ /// the timeline directory and the marker file are both removed.
+ /// Full path: `tenants/<tenant_id>/timelines/<timeline_id>___uninit`.
+ pub const TIMELINE_UNINIT_MARK_SUFFIX: &str = "___uninit";
+
+ pub fn is_temporary(path: &Path) -> bool {
+ match path.file_name() {
+ Some(name) => name.to_string_lossy().ends_with(TEMP_FILE_SUFFIX),
+ None => false,
+ }
+ }
+
+ pub fn is_uninit_mark(path: &Path) -> bool {
+ match path.file_name() {
+ Some(name) => name
+ .to_string_lossy()
+ .ends_with(TIMELINE_UNINIT_MARK_SUFFIX),
+ None => false,
++ }
++}
++>>>>>>> origin/main
+
+pub fn is_temporary(path: &std::path::Path) -> bool {
+ match path.file_name() {
+ Some(name) => name.to_string_lossy().ends_with(TEMP_FILE_SUFFIX),
+ None => false,
}
}
diff --cc pageserver/src/tenant_mgr.rs
index 73593bc48,061d7fa19..000000000
--- a/pageserver/src/tenant_mgr.rs
+++ b/pageserver/src/tenant_mgr.rs
@@@ -13,11 -13,18 +13,22 @@@ use tracing::*
use remote_storage::GenericRemoteStorage;
use crate::config::PageServerConf;
++<<<<<<< HEAD
++=======
+ use crate::http::models::TenantInfo;
+ use crate::storage_sync::index::{LayerFileMetadata, RemoteIndex, RemoteTimelineIndex};
+ use crate::storage_sync::{self, LocalTimelineInitStatus, SyncStartupData, TimelineLocalFiles};
++>>>>>>> origin/main
use crate::task_mgr::{self, TaskKind};
-use crate::tenant::{
- ephemeral_file::is_ephemeral_file, metadata::TimelineMetadata, Tenant, TenantState,
-};
+use crate::tenant::{Tenant, TenantState};
use crate::tenant_config::TenantConfOpt;
++<<<<<<< HEAD
++=======
+ use crate::walredo::PostgresRedoManager;
+ use crate::{is_temporary, is_uninit_mark, METADATA_FILE_NAME, TEMP_FILE_SUFFIX};
++>>>>>>> origin/main
-use utils::crashsafe::{self, path_with_suffix_extension};
+use utils::fs_ext::PathExt;
use utils::id::{TenantId, TimelineId};
mod tenants_state {
@@@ -341,87 -521,334 +352,247 @@@ pub fn list_tenants() -> Vec<(TenantId
.collect()
}
-#[derive(Debug)]
-pub enum TenantAttachData {
- Ready(HashMap<TimelineId, TimelineLocalFiles>),
- Broken(anyhow::Error),
-}
-/// Attempts to collect information about all tenant and timelines, existing on the local FS.
-/// If finds any, deletes all temporary files and directories, created before. Also removes empty directories,
-/// that may appear due to such removals.
-/// Does not fail on particular timeline or tenant collection errors, rather logging them and ignoring the entities.
-fn local_tenant_timeline_files(
- config: &'static PageServerConf,
-) -> anyhow::Result<HashMap<TenantId, TenantAttachData>> {
- let _entered = info_span!("local_tenant_timeline_files").entered();
-
- let mut local_tenant_timeline_files = HashMap::new();
- let tenants_dir = config.tenants_path();
- for tenants_dir_entry in fs::read_dir(&tenants_dir)
- .with_context(|| format!("Failed to list tenants dir {}", tenants_dir.display()))?
- {
- match &tenants_dir_entry {
- Ok(tenants_dir_entry) => {
- let tenant_dir_path = tenants_dir_entry.path();
- if is_temporary(&tenant_dir_path) {
- info!(
- "Found temporary tenant directory, removing: {}",
- tenant_dir_path.display()
- );
- if let Err(e) = fs::remove_dir_all(&tenant_dir_path) {
- error!(
- "Failed to remove temporary directory '{}': {:?}",
- tenant_dir_path.display(),
- e
- );
- }
- } else {
- match collect_timelines_for_tenant(config, &tenant_dir_path) {
- Ok((tenant_id, TenantAttachData::Broken(e))) => {
- local_tenant_timeline_files.entry(tenant_id).or_insert(TenantAttachData::Broken(e));
- },
- Ok((tenant_id, TenantAttachData::Ready(collected_files))) => {
- if collected_files.is_empty() {
- match remove_if_empty(&tenant_dir_path) {
- Ok(true) => info!("Removed empty tenant directory {}", tenant_dir_path.display()),
- Ok(false) => {
- // insert empty timeline entry: it has some non-temporary files inside that we cannot remove
- // so make obvious for HTTP API callers, that something exists there and try to load the tenant
- let _ = local_tenant_timeline_files.entry(tenant_id).or_insert_with(|| TenantAttachData::Ready(HashMap::new()));
- },
- Err(e) => error!("Failed to remove empty tenant directory: {e:?}"),
- }
- } else {
- match local_tenant_timeline_files.entry(tenant_id) {
- hash_map::Entry::Vacant(entry) => {
- entry.insert(TenantAttachData::Ready(collected_files));
- }
- hash_map::Entry::Occupied(entry) =>{
- if let TenantAttachData::Ready(old_timelines) = entry.into_mut() {
- old_timelines.extend(collected_files);
- }
- },
- }
- }
- },
- Err(e) => error!(
- "Failed to collect tenant files from dir '{}' for entry {:?}, reason: {:#}",
- tenants_dir.display(),
- tenants_dir_entry,
- e
- ),
- }
+/// Execute Attach mgmt API command.
+///
+/// Downloading all the tenant data is performed in the background, this merely
+/// spawns the background task and returns quickly.
+pub async fn attach_tenant(
+ conf: &'static PageServerConf,
+ tenant_id: TenantId,
+ remote_storage: &GenericRemoteStorage,
+) -> anyhow::Result<()> {
+ match tenants_state::write_tenants().entry(tenant_id) {
+ hash_map::Entry::Occupied(e) => {
+ // Cannot attach a tenant that already exists. The error message depends on
+ // the state it's in.
+ match e.get().current_state() {
+ TenantState::Attaching => {
+ anyhow::bail!("tenant {tenant_id} attach is already in progress")
}
++<<<<<<< HEAD
+ current_state => {
+ anyhow::bail!("tenant already exists, current state: {current_state:?}")
++=======
+ }
+ Err(e) => error!(
+ "Failed to list tenants dir entry {:?} in directory {}, reason: {:?}",
+ tenants_dir_entry,
+ tenants_dir.display(),
+ e
+ ),
+ }
+ }
+
+ info!(
+ "Collected files for {} tenants",
+ local_tenant_timeline_files.len(),
+ );
+ Ok(local_tenant_timeline_files)
+ }
+
+ fn remove_if_empty(tenant_dir_path: &Path) -> anyhow::Result<bool> {
+ let directory_is_empty = tenant_dir_path
+ .read_dir()
+ .with_context(|| {
+ format!(
+ "Failed to read directory '{}' contents",
+ tenant_dir_path.display()
+ )
+ })?
+ .next()
+ .is_none();
+
+ if directory_is_empty {
+ fs::remove_dir_all(&tenant_dir_path).with_context(|| {
+ format!(
+ "Failed to remove empty directory '{}'",
+ tenant_dir_path.display(),
+ )
+ })?;
+
+ Ok(true)
+ } else {
+ Ok(false)
+ }
+ }
+
+ fn collect_timelines_for_tenant(
+ config: &'static PageServerConf,
+ tenant_path: &Path,
+ ) -> anyhow::Result<(TenantId, TenantAttachData)> {
+ let tenant_id = tenant_path
+ .file_name()
+ .and_then(OsStr::to_str)
+ .unwrap_or_default()
+ .parse::<TenantId>()
+ .context("Could not parse tenant id out of the tenant dir name")?;
+ let timelines_dir = config.timelines_path(&tenant_id);
+
+ if !timelines_dir.as_path().is_dir() {
+ return Ok((
+ tenant_id,
+ TenantAttachData::Broken(anyhow::anyhow!(
+ "Tenant {} has no timelines directory at {}",
+ tenant_id,
+ timelines_dir.display()
+ )),
+ ));
+ }
+
+ let mut tenant_timelines = HashMap::new();
+ for timelines_dir_entry in fs::read_dir(&timelines_dir)
+ .with_context(|| format!("Failed to list timelines dir entry for tenant {tenant_id}"))?
+ {
+ match timelines_dir_entry {
+ Ok(timelines_dir_entry) => {
+ let timeline_dir = timelines_dir_entry.path();
+ if is_temporary(&timeline_dir) {
+ info!(
+ "Found temporary timeline directory, removing: {}",
+ timeline_dir.display()
+ );
+ if let Err(e) = fs::remove_dir_all(&timeline_dir) {
+ error!(
+ "Failed to remove temporary directory '{}': {:?}",
+ timeline_dir.display(),
+ e
+ );
+ }
+ } else if is_uninit_mark(&timeline_dir) {
+ let timeline_uninit_mark_file = &timeline_dir;
+ info!(
+ "Found an uninit mark file {}, removing the timeline and its uninit mark",
+ timeline_uninit_mark_file.display()
+ );
+ let timeline_id = timeline_uninit_mark_file
+ .file_stem()
+ .and_then(OsStr::to_str)
+ .unwrap_or_default()
+ .parse::<TimelineId>()
+ .with_context(|| {
+ format!(
+ "Could not parse timeline id out of the timeline uninit mark name {}",
+ timeline_uninit_mark_file.display()
+ )
+ })?;
+ let timeline_dir = config.timeline_path(&timeline_id, &tenant_id);
+ if let Err(e) =
+ remove_timeline_and_uninit_mark(&timeline_dir, timeline_uninit_mark_file)
+ {
+ error!("Failed to clean up uninit marked timeline: {e:?}");
+ }
+ } else {
+ let timeline_id = timeline_dir
+ .file_name()
+ .and_then(OsStr::to_str)
+ .unwrap_or_default()
+ .parse::<TimelineId>()
+ .with_context(|| {
+ format!(
+ "Could not parse timeline id out of the timeline dir name {}",
+ timeline_dir.display()
+ )
+ })?;
+ let timeline_uninit_mark_file =
+ config.timeline_uninit_mark_file_path(tenant_id, timeline_id);
+ if timeline_uninit_mark_file.exists() {
+ info!("Found an uninit mark file for timeline {tenant_id}/{timeline_id}, removing the timeline and its uninit mark");
+ if let Err(e) = remove_timeline_and_uninit_mark(
+ &timeline_dir,
+ &timeline_uninit_mark_file,
+ ) {
+ error!("Failed to clean up uninit marked timeline: {e:?}");
+ }
+ } else {
+ match collect_timeline_files(&timeline_dir) {
+ Ok((metadata, timeline_files)) => {
+ tenant_timelines.insert(
+ timeline_id,
+ TimelineLocalFiles::collected(metadata, timeline_files),
+ );
+ }
+ Err(e) => {
+ error!(
+ "Failed to process timeline dir contents at '{}', reason: {:?}",
+ timeline_dir.display(),
+ e
+ );
+ match remove_if_empty(&timeline_dir) {
+ Ok(true) => info!(
+ "Removed empty timeline directory {}",
+ timeline_dir.display()
+ ),
+ Ok(false) => (),
+ Err(e) => {
+ error!("Failed to remove empty timeline directory: {e:?}")
+ }
+ }
+ }
+ }
+ }
++>>>>>>> origin/main
}
}
- Err(e) => {
- error!("Failed to list timelines for entry tenant {tenant_id}, reason: {e:?}")
- }
+ }
+ hash_map::Entry::Vacant(v) => {
+ let tenant = Tenant::spawn_attach(conf, tenant_id, remote_storage)?;
+ v.insert(tenant);
+ Ok(())
}
}
-
- if tenant_timelines.is_empty() {
- // this is normal, we've removed all broken, empty and temporary timeline dirs
- // but should allow the tenant to stay functional and allow creating new timelines
- // on a restart, we require tenants to have the timelines dir, so leave it on disk
- debug!("Tenant {tenant_id} has no timelines loaded");
- }
-
- Ok((tenant_id, TenantAttachData::Ready(tenant_timelines)))
}
-fn remove_timeline_and_uninit_mark(timeline_dir: &Path, uninit_mark: &Path) -> anyhow::Result<()> {
- fs::remove_dir_all(&timeline_dir)
- .or_else(|e| {
- if e.kind() == std::io::ErrorKind::NotFound {
- // we can leave the uninit mark without a timeline dir,
- // just remove the mark then
- Ok(())
- } else {
- Err(e)
- }
- })
- .with_context(|| {
- format!(
- "Failed to remove unit marked timeline directory {}",
- timeline_dir.display()
- )
- })?;
- fs::remove_file(&uninit_mark).with_context(|| {
- format!(
- "Failed to remove timeline uninit mark file {}",
- uninit_mark.display()
- )
- })?;
+#[cfg(feature = "testing")]
+use {
+ crate::repository::GcResult, pageserver_api::models::TimelineGcRequest,
+ utils::http::error::ApiError,
+};
- Ok(())
-}
+#[cfg(feature = "testing")]
+pub fn immediate_gc(
+ tenant_id: TenantId,
+ timeline_id: TimelineId,
+ gc_req: TimelineGcRequest,
+) -> Result<tokio::sync::oneshot::Receiver<Result<GcResult, anyhow::Error>>, ApiError> {
+ let guard = tenants_state::read_tenants();
-// discover timeline files and extract timeline metadata
-// NOTE: ephemeral files are excluded from the list
-fn collect_timeline_files(
- timeline_dir: &Path,
-) -> anyhow::Result<(TimelineMetadata, HashMap<PathBuf, LayerFileMetadata>)> {
- let mut timeline_files = HashMap::new();
- let mut timeline_metadata_path = None;
-
- let timeline_dir_entries =
- fs::read_dir(&timeline_dir).context("Failed to list timeline dir contents")?;
- for entry in timeline_dir_entries {
- let entry_path = entry.context("Failed to list timeline dir entry")?.path();
- let metadata = entry_path.metadata()?;
-
- if metadata.is_file() {
- if entry_path.file_name().and_then(OsStr::to_str) == Some(METADATA_FILE_NAME) {
- timeline_metadata_path = Some(entry_path);
- } else if is_ephemeral_file(&entry_path.file_name().unwrap().to_string_lossy()) {
- debug!("skipping ephemeral file {}", entry_path.display());
- continue;
- } else if is_temporary(&entry_path) {
- info!("removing temp timeline file at {}", entry_path.display());
- fs::remove_file(&entry_path).with_context(|| {
- format!(
- "failed to remove temp download file at {}",
- entry_path.display()
- )
- })?;
- } else {
- let layer_metadata = LayerFileMetadata::new(metadata.len());
- timeline_files.insert(entry_path, layer_metadata);
+ let tenant = guard
+ .get(&tenant_id)
+ .map(Arc::clone)
+ .with_context(|| format!("Tenant {tenant_id} not found"))
+ .map_err(ApiError::NotFound)?;
+
+ let gc_horizon = gc_req.gc_horizon.unwrap_or_else(|| tenant.get_gc_horizon());
+ // Use tenant's pitr setting
+ let pitr = tenant.get_pitr_interval();
+
+ // Run in task_mgr to avoid race with detach operation
+ let (task_done, wait_task_done) = tokio::sync::oneshot::channel();
+ task_mgr::spawn(
+ &tokio::runtime::Handle::current(),
+ TaskKind::GarbageCollector,
+ Some(tenant_id),
+ Some(timeline_id),
+ &format!("timeline_gc_handler garbage collection run for tenant {tenant_id} timeline {timeline_id}"),
+ false,
+ async move {
+ fail::fail_point!("immediate_gc_task_pre");
+ let result = tenant
+ .gc_iteration(Some(timeline_id), gc_horizon, pitr, true)
+ .instrument(info_span!("manual_gc", tenant = %tenant_id, timeline = %timeline_id))
+ .await;
+ // FIXME: `gc_iteration` can return an error for multiple reasons; we should handle it
+ // better once the types support it.
+ match task_done.send(result) {
+ Ok(_) => (),
+ Err(result) => error!("failed to send gc result: {result:?}"),
}
+ Ok(())
}
- }
-
- // FIXME (rodionov) if attach call succeeded, and then pageserver is restarted before download is completed
- // then attach is lost. There would be no retries for that,
- // initial collect will fail because there is no metadata.
- // We either need to start download if we see empty dir after restart or attach caller should
- // be aware of that and retry attach if awaits_download for timeline switched from true to false
- // but timelinne didn't appear locally.
- // Check what happens with remote index in that case.
- let timeline_metadata_path = match timeline_metadata_path {
- Some(path) => path,
- None => anyhow::bail!("No metadata file found in the timeline directory"),
- };
- let metadata = TimelineMetadata::from_bytes(
- &fs::read(&timeline_metadata_path).context("Failed to read timeline metadata file")?,
- )
- .context("Failed to parse timeline metadata file bytes")?;
-
- anyhow::ensure!(
- metadata.ancestor_timeline().is_some() || !timeline_files.is_empty(),
- "Timeline has no ancestor and no layer files"
);
- Ok((metadata, timeline_files))
+ // drop the guard until after we've spawned the task so that timeline shutdown will wait for the task
+ drop(guard);
+
+ Ok(wait_task_done)
}
diff --git a/vendor/postgres-v14 b/vendor/postgres-v14
index da50d99db..360ff1c63 160000
--- a/vendor/postgres-v14
+++ b/vendor/postgres-v14
@@ -1 +1 @@
-Subproject commit da50d99db54848f7a3e910f920aaad7dc6915d36
+Subproject commit 360ff1c637a57d351a7a5a391d8e8afd8fde8c3a
diff --git a/vendor/postgres-v15 b/vendor/postgres-v15
index 780c3f8e3..d31b3f7c6 160000
--- a/vendor/postgres-v15
+++ b/vendor/postgres-v15
@@ -1 +1 @@
-Subproject commit 780c3f8e3524c2e32a2e28884c7b647fcebf71d7
+Subproject commit d31b3f7c6d108e52c8bb11e812ce4e266501ea3d
* Fix https://github.com/neondatabase/neon/issues/1854
* Never log Safekeeper::conninfo in walproposer as it now contains a secret token
* control_panel, test_runner: generate and pass JWT tokens for Safekeeper to compute and pageserver
* Compute: load JWT token for Safekeeper from the environment variable. Do not reuse the token from
pageserver_connstring because it's embedded in there weirdly.
* Pageserver: load JWT token for Safekeeper from the environment variable.
* Rewrite docs/authentication.md
There will be different scopes for those two, so authorization code should be different.
The `check_permission` function is now not in the shared library. Its implementation
is very similar to the one which will be added for Safekeeper. In fact, we may reuse
the same existing root-like 'PageServerApi' scope, but I would prefer to have separate
root-like scopes for services.
Also, generate_management_token in tests is generate_pageserver_token now.
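A minimal sketch of what separate root-like scopes per service could look like. The type and function names here are illustrative, not the actual implementation:

```rust
// Illustrative sketch only: each service gets its own root-like scope and its
// own check_permission, instead of sharing one in a common library.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scope {
    Tenant,        // token valid for a single tenant
    PageServerApi, // root-like scope for pageserver management APIs
    SafekeeperApi, // a distinct root-like scope for safekeeper APIs
}

fn check_permission(
    claimed: Scope,
    tenant_matches: bool,
    service_root: Scope,
) -> Result<(), String> {
    if claimed == service_root {
        return Ok(()); // root-like token for this particular service
    }
    match claimed {
        Scope::Tenant if tenant_matches => Ok(()),
        other => Err(format!("insufficient scope: {other:?}")),
    }
}
```

With separate root scopes, a token minted for the safekeeper cannot be replayed against the pageserver's management API, and vice versa.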
And similarly on attach. This way, if the tenant load/attach fails
halfway through, we don't have any leftover WAL receivers still
running on the broken tenant.
Downsides are:
* We store all components of the config separately. `Url` stores them inside a single
`String` and a bunch of ints which point to different parts of the URL, which is
probably more efficient.
* It is now impossible to pass arbitrary connection strings to the configuration file,
one has to support all components explicitly. However, we never supported anything
except for `host:port` anyway.
Upsides are:
* This significantly restricts the space of possible connection strings, some of which
may be either invalid or unsupported. E.g. Postgres' connection strings may include
a bunch of parameters as query (e.g. `connect_timeout=`, `options=`). These are neither
validated by the current implementation nor passed to the postgres client library.
Hence, storing separate fields expresses the intention better.
* The same connection configuration may be represented as a URL in multiple ways
(e.g. either `password=` in the query part or a standard URL password).
Now we have a single canonical way.
* Escaping is provided for `options=`.
Other possibilities considered:
* `newtype` with a `String` inside and some validation on creation.
This is more efficient, but harder to log for two reasons:
* Passwords should never end up in logs, so we have to somehow mask them when printing.
* Escaped `options=` are harder to read, especially if URL-encoded,
and we use `options=` a lot.
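The chosen design can be sketched as follows. This is an illustrative simplification (field and method names are made up, not the actual pageserver types): components are stored separately, `options=` values are escaped when building the connection string, and the `Display` impl used for logging never includes the password.

```rust
use std::fmt;

// Illustrative sketch: store connection components separately instead of one
// URL string, so each piece can be validated, escaped, and logged safely.
#[derive(Debug, Clone)]
pub struct ConnInfo {
    pub host: String,
    pub port: u16,
    pub password: Option<String>,
    pub options: Vec<String>,
}

impl ConnInfo {
    /// Build a libpq-style keyword/value connection string. Backslashes and
    /// single quotes inside options are escaped so arbitrary values round-trip.
    pub fn to_connstr(&self) -> String {
        let mut s = format!("host={} port={}", self.host, self.port);
        if let Some(password) = &self.password {
            s.push_str(&format!(" password={password}"));
        }
        if !self.options.is_empty() {
            let escaped: Vec<String> = self
                .options
                .iter()
                .map(|o| o.replace('\\', "\\\\").replace('\'', "\\'"))
                .collect();
            s.push_str(&format!(" options='{}'", escaped.join(" ")));
        }
        s
    }
}

// Display is what ends up in logs: never include the password.
impl fmt::Display for ConnInfo {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "host={} port={}", self.host, self.port)
    }
}
```

The full connection string (with the password) is only produced on demand by `to_connstr`, while anything formatted with `{}` stays safe to log.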
Which ought to replace etcd. This patch only adds the binary and adjusts
Dockerfile to include it; subsequent ones will add deploy of helm chart and the
actual replacement.
It is a simple and fast pub-sub message bus. In this patch only safekeeper
message is supported, but others can be easily added.
Compilation now requires protoc to be installed. Installing protobuf-compiler
package is fine for Debian/Ubuntu.
ref
https://github.com/neondatabase/neon/pull/2733
https://github.com/neondatabase/neon/issues/2394
Imagine that you have a tenant with a single branch like this:
---------------==========>
^
gc horizon
where:
---- is the portion of the branch that is older than retention period
==== is the portion of the branch that is newer than retention period.
Before this commit, the sizing model included the logical size at the
GC horizon, but not the WAL after that. In particular, that meant that
on a newly created tenant with just one timeline, where the retention
period covered the whole history of the timeline, i.e. gc_cutoff was 0,
the calculated tenant size was always zero.
We now include the WAL after the GC horizon in the size. So in the
above example, the calculated tenant size would be the logical size
of the database at the GC horizon, plus all the WAL after it (marked with
===).
This adds a new `insert_point` function to the sizing model, alongside
`modify_branch`, and changes the code in size.rs to use the new
function. The new function takes an absolute lsn and logical size as
argument, so we no longer need to calculate the difference to the
previous point. Also, the end-size is now optional, because we now
need to add a point to represent the end of each branch to the model,
but we don't want to or need to calculate the logical size at that
point.
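The effect on the computed size can be sketched numerically. This is a deliberately simplified model (a single branch, WAL measured as an LSN delta), not the actual size.rs code:

```rust
// Simplified sketch of the change: tenant size now counts the WAL retained
// after the GC horizon, not just the logical size at the horizon.
fn tenant_size(logical_size_at_horizon: u64, gc_horizon_lsn: u64, last_lsn: u64) -> u64 {
    logical_size_at_horizon + (last_lsn - gc_horizon_lsn)
}
```

For a newly created tenant whose retention covers its whole history (GC horizon at LSN 0 and logical size 0 there), the old model reported a size of zero; this model reports all the WAL written so far.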
Saw a failure like this, from 'test_tenants_attached_after_download' and
'test_tenant_redownloads_truncated_file_on_startup':
> test_runner/fixtures/neon_fixtures.py:1064: in verbose_error
> res.raise_for_status()
> /github/home/.cache/pypoetry/virtualenvs/neon-_pxWMzVK-py3.9/lib/python3.9/site-packages/requests/models.py:1021: in raise_for_status
> raise HTTPError(http_error_msg, response=self)
> E requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://localhost:18150/v1/tenant/2334c9c113a82b5dd1651a0a23c53448/timeline
>
> The above exception was the direct cause of the following exception:
> test_runner/regress/test_tenants_with_remote_storage.py:185: in test_tenants_attached_after_download
> restored_timelines = client.timeline_list(tenant_id)
> test_runner/fixtures/neon_fixtures.py:1148: in timeline_list
> self.verbose_error(res)
> test_runner/fixtures/neon_fixtures.py:1070: in verbose_error
> raise PageserverApiException(msg) from e
> E fixtures.neon_fixtures.PageserverApiException: NotFound: Tenant 2334c9c113a82b5dd1651a0a23c53448 is not active. Current state: Loading
These tests start the pageserver, wait until
assert_no_in_progress_downloads_for_tenant says that
has_downloads_in_progress is false, and then call timeline_list on the
tenant. But has_downloads_in_progress was only returned as true when
the tenant was being attached, not when it was being loaded at
pageserver startup. Change tenant_status API endpoint
(/v1/tenant/:tenant_id) so that it returns
has_downloads_in_progress=true also for tenants that are still in
Loading state.
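The fixed predicate boils down to treating both startup states as "downloads in progress". A sketch, reusing the TenantState variants from the diff above (the function name is illustrative):

```rust
// The tenant is still busy with startup I/O both while loading from local
// disk and while attaching (downloading) from remote storage, so report
// in-progress downloads for both states, not just Attaching.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TenantState {
    Loading,
    Attaching,
    Active,
    Paused,
    Broken,
}

fn has_in_progress_downloads(state: TenantState) -> bool {
    matches!(state, TenantState::Loading | TenantState::Attaching)
}
```

With this, tests that poll tenant status keep waiting through the Loading phase instead of calling timeline_list on a tenant that is not yet active.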
Despite tests working, on staging the library started to fail with the
following error:
```
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 2022-11-16T11:53:37.191211Z INFO init_tenant_mgr:local_tenant_timeline_files: Collected files for 16 tenants
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: thread 'main' panicked at 'A connector was not available. Either set a custom connector or enable the `rustls` and `native-tls` crate featu>
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: stack backtrace:
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 0: rust_begin_unwind
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 1: core::panicking::panic_fmt
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 2: core::panicking::panic_display
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:72:5
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 3: core::panicking::panic_str
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:56:5
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 4: core::option::expect_failed
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/option.rs:1854:5
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 6: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 7: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 8: <aws_types::credentials::provider::future::ProvideCredentials as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 9: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 10: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 11: <aws_types::credentials::provider::future::ProvideCredentials as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 12: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 13: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 14: <aws_smithy_http_tower::map_request::MapRequestFuture<F,E> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 15: <core::pin::Pin<P> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/future.rs:124:9
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 16: <aws_smithy_http_tower::parse_response::ParseResponseService<InnerService,ResponseHandler,RetryPolicy> as tower_service::Service<aws_>
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/aws-smithy-http-tower-0.51.0/src/parse_response.rs:109:34
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 17: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 18: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-0.1.37/src/instrument.rs:272:9
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 19: <core::pin::Pin<P> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/future.rs:124:9
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 20: <aws_smithy_client::timeout::TimeoutServiceFuture<InnerFuture> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/aws-smithy-client-0.51.0/src/timeout.rs:189:70
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 21: <tower::retry::future::ResponseFuture<P,S,Request> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/tower-0.4.13/src/retry/future.rs:77:41
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 22: <aws_smithy_client::timeout::TimeoutServiceFuture<InnerFuture> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/aws-smithy-client-0.51.0/src/timeout.rs:189:70
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 23: aws_smithy_client::Client<C,M,R>::call_raw::{{closure}}
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/aws-smithy-client-0.51.0/src/lib.rs:227:56
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 24: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 25: aws_smithy_client::Client<C,M,R>::call::{{closure}}
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/aws-smithy-client-0.51.0/src/lib.rs:184:29
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 26: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 27: aws_sdk_s3::client::fluent_builders::GetObject::send::{{closure}}
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/aws-sdk-s3-0.21.0/src/client.rs:7735:40
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 28: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 29: remote_storage::s3_bucket::S3Bucket::download_object::{{closure}}
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at libs/remote_storage/src/s3_bucket.rs:205:20
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 30: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 31: <remote_storage::s3_bucket::S3Bucket as remote_storage::RemoteStorage>::download::{{closure}}
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at libs/remote_storage/src/s3_bucket.rs:399:11
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 32: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 33: <core::pin::Pin<P> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/future.rs:124:9
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 34: remote_storage::GenericRemoteStorage::download_storage_object::{{closure}}
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at libs/remote_storage/src/lib.rs:264:55
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 35: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 36: pageserver::storage_sync::download::download_index_part::{{closure}}
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at pageserver/src/storage_sync/download.rs:148:57
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 37: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 38: pageserver::storage_sync::download::download_index_parts::{{closure}}::{{closure}}::{{closure}}
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at pageserver/src/storage_sync/download.rs:77:75
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 39: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 40: <futures_util::stream::futures_unordered::FuturesUnordered<Fut> as futures_core::stream::Stream>::poll_next
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.24/src/stream/futures_unordered/mod.rs:514:17
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 41: futures_util::stream::stream::StreamExt::poll_next_unpin
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.24/src/stream/stream/mod.rs:1626:9
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 42: <futures_util::stream::stream::next::Next<St> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.24/src/stream/stream/next.rs:32:9
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 43: pageserver::storage_sync::download::download_index_parts::{{closure}}
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at pageserver/src/storage_sync/download.rs:80:69
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 44: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 45: tokio::park::thread::CachedParkThread::block_on::{{closure}}
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.1/src/park/thread.rs:267:54
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 46: tokio::coop::with_budget::{{closure}}
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.1/src/coop.rs:102:9
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 47: std::thread::local::LocalKey<T>::try_with
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/thread/local.rs:445:16
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 48: std::thread::local::LocalKey<T>::with
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/thread/local.rs:421:9
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 49: tokio::coop::with_budget
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.1/src/coop.rs:95:5
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 50: tokio::coop::budget
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.1/src/coop.rs:72:5
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 51: tokio::park::thread::CachedParkThread::block_on
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.1/src/park/thread.rs:267:31
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 52: tokio::runtime::enter::Enter::block_on
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.1/src/runtime/enter.rs:152:13
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 53: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.1/src/runtime/scheduler/multi_thread/mod.rs:79:9
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 54: tokio::runtime::Runtime::block_on
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /home/nonroot/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.1/src/runtime/mod.rs:492:44
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 55: pageserver::storage_sync::spawn_storage_sync_task
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at pageserver/src/storage_sync.rs:656:34
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 56: pageserver::tenant_mgr::init_tenant_mgr
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at pageserver/src/tenant_mgr.rs:88:13
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 57: pageserver::start_pageserver
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at pageserver/src/bin/pageserver.rs:269:9
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 58: pageserver::main
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at pageserver/src/bin/pageserver.rs:103:5
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: 59: core::ops::function::FnOnce::call_once
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/ops/function.rs:248:5
Nov 16 11:53:37 pageserver-0.us-east-2.aws.neon.build pageserver[481974]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```
Feels like better testing in that environment is needed later; maybe more
e2e tests have to be written (although we already have download tests, so
something else is happening here: TLS issues?)
This change introduces a marker file
$repo/tenants/$tenant_id/attaching
that is present while a tenant is in Attaching state.
When pageserver restarts, we use it to resume the tenant attach operation.
Before this change, a crash during tenant attach would result in one of
the following:
1. a crash upon restart due to a missing metadata file (IIRC)
2. "successful" loading of the tenant with only a subset of its timelines
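As a rough illustration of the marker-file mechanism (the function and return values are invented for this sketch, not the actual pageserver code), startup can branch on the marker's presence:

```rust
use std::path::Path;

/// Illustrative sketch only: decide how to load a tenant directory on
/// pageserver startup, based on the `attaching` marker file.
pub fn tenant_load_mode(tenant_dir: &Path) -> &'static str {
    if tenant_dir.join("attaching").exists() {
        // A previous attach was interrupted by a crash; resume downloading
        // the tenant from remote storage instead of trusting local state.
        "resume_attach"
    } else {
        "load_local"
    }
}
```

The marker is created before any timeline data is written and removed only once the attach completes, so its presence unambiguously means "attach in progress".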
This is a part of https://github.com/neondatabase/neon/pull/2595.
It takes out the switch to a per-tenant upload queue and the changes to the
pageserver startup sequence, because the two are highly interleaved with
each other. I'm still not happy with the size of the diff, but splitting
it further would probably consume even more time. Ideally we should do
that, but this patch is already a step forward and should be easier to
get in, though still quite difficult: mainly because of its size, and
because fixes for existing concerns will extend the diff even further.
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Tenant size information is gathered by using existing parts of
`Tenant::gc_iteration` which are now separated as
`Tenant::refresh_gc_info`. `Tenant::refresh_gc_info` collects branch
points, and invokes `Timeline::update_gc_info`; nothing was supposed to
be changed there. The gathered branch points (through Timeline's
`GcInfo::retain_lsns`), `GcInfo::horizon_cutoff`, and
`GcInfo::pitr_cutoff` are used to build up a Vec of updates fed into the
`libs/tenant_size_model` to calculate the history size.
The gathered information is now exposed using `GET
/v1/tenant/{tenant_id}/size`, which will respond with the actual
calculated size. Initially the idea was to have this delivered as a tenant
background task and exported via metric, but it might be too
computationally expensive to run it periodically as we don't yet know if
the returned values are any good.
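A minimal sketch of how the GC inputs could be combined into the points fed to the size model (the names `retain_lsns`, `horizon_cutoff`, and `pitr_cutoff` come from the text above; the function itself is invented, and LSNs are simplified to plain integers):

```rust
/// Illustrative only: collect the distinct LSNs whose history the size
/// calculation must account for, from branch points plus the GC cutoffs.
pub fn size_calculation_points(
    retain_lsns: &[u64],
    horizon_cutoff: u64,
    pitr_cutoff: u64,
) -> Vec<u64> {
    // History has to reach back to the earlier of the two cutoffs...
    let cutoff = horizon_cutoff.min(pitr_cutoff);
    // ...and every branch point pins its ancestor's history as well.
    let mut points: Vec<u64> = retain_lsns.to_vec();
    points.push(cutoff);
    points.sort_unstable();
    points.dedup();
    points
}
```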
Adds one new metric:
- `pageserver_storage_operations_seconds` with label `logical_size`,
  separated from the original `init_logical_size`
Adds a pageserver wide configuration variable:
- `concurrent_tenant_size_logical_size_queries` with default 1
This leaves a lot of TODO's, tracked on issue #2748.
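A limit like `concurrent_tenant_size_logical_size_queries = 1` is typically enforced with a counting semaphore. A stdlib-only sketch of that idea (the real code would more likely use an async semaphore; `QueryLimiter` is an invented name):

```rust
use std::sync::{Condvar, Mutex};

/// Toy counting semaphore: at most `permits` logical-size queries at once.
pub struct QueryLimiter {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl QueryLimiter {
    pub fn new(permits: usize) -> Self {
        QueryLimiter { permits: Mutex::new(permits), cv: Condvar::new() }
    }

    /// Blocks until a permit is available, then takes it.
    pub fn acquire(&self) {
        let mut n = self.permits.lock().unwrap();
        while *n == 0 {
            n = self.cv.wait(n).unwrap();
        }
        *n -= 1;
    }

    /// Returns a permit and wakes one waiter, if any.
    pub fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}
```

With the default of 1, a second size query simply blocks in `acquire` until the first finishes, bounding the computational cost.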
* Support configuring the log format as json or plain.
Separately test json and plain logger. They would be competing on the
same global subscriber otherwise.
* Implement log_format for pageserver config
* Implement configurable log format for safekeeper.
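Parsing such a setting is a small match; a hedged sketch (only the two variants named above are assumed to exist, and the function name is invented):

```rust
/// Illustrative parser for a `log_format` config value.
pub fn parse_log_format(s: &str) -> Result<&'static str, String> {
    match s {
        "json" => Ok("json"),
        "plain" => Ok("plain"),
        other => Err(format!("unknown log format: {other}")),
    }
}
```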
Similar to https://github.com/neondatabase/neon/pull/2395, introduces a state field in Timeline that's possible to subscribe to.
Adjusts
* walreceiver not to hold any connections if the timeline is not Active
* remote storage sync not to schedule uploads if the timeline is Broken
* timeline creation to be rejected if the tenant/timeline is broken
* timelines' states to switch automatically based on the tenant state
Does not adjust timelines' gc, checkpointing and layer flush behaviour much, since it's not safe to cancel those processes abruptly, and task_mgr::shutdown_tasks already does a similar thing.
This API is rather pointless: a sane choice requires knowledge of peer
status anyway, and leader lifetimes can intersect in any case, which is
fine for us, so manual elections are straightforward. Here, we
deterministically choose among the reasonably caught-up safekeepers,
shifting by timeline id to spread the load.
A step towards custom broker https://github.com/neondatabase/neon/issues/2394
Part of https://github.com/neondatabase/neon/pull/2239
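The deterministic choice described above might look roughly like this (a sketch with invented names; the actual criterion for "reasonably caught up" and the timeline-id hashing live elsewhere):

```rust
/// Pick a safekeeper among the caught-up ones, offset by a hash of the
/// timeline id so different timelines spread across safekeepers.
pub fn choose_leader(caught_up: &[u64], timeline_id_hash: u64) -> Option<u64> {
    if caught_up.is_empty() {
        return None;
    }
    // Every node computes the same index from the same inputs, so no
    // explicit election protocol is needed.
    let idx = (timeline_id_hash as usize) % caught_up.len();
    Some(caught_up[idx])
}
```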
Regular, from scratch, timeline creation involves running initdb in a separate directory, importing data from that directory into the pageserver and, finally, starting timeline-related background tasks.
This PR ensures we don't leave behind any directories that are not marked as temporary and that pageserver removes such directories on restart, allowing timeline creation to be retried with the same IDs, if needed.
It would be good to later rewrite the logic to use a temporary directory, similar to what tenant creation does.
That is currently harder than this change, so it is not done here.
* etcd-client is not updated, since we plan to replace it with another client and the new version fails with a missing prost library error
* clap has released another major update that requires changing every CLI declaration again; that deserves a separate PR
The 'local' part was always filled in, so that was easy to merge into
the TimelineInfo itself. 'remote' only contained two fields,
'remote_consistent_lsn' and 'awaits_download'. I made
'remote_consistent_lsn' an optional field, and 'awaits_download' is now
false if the timeline is not present remotely.
However, I kept stub versions of the 'local' and 'remote' structs for
backwards-compatibility, with a few fields that are actively used by
the control plane. They just duplicate the fields from TimelineInfo
now. They can be removed later, once the control plane has been
updated to use the new fields.
It was only None when you queried the status of a timeline with
'timeline_detail' mgmt API call, and it was still being downloaded. You
can check for that status with the 'tenant_status' API call instead,
checking for has_in_progress_downloads field.
Another case was if an error happened while trying to get the current
logical size in a 'timeline_detail' request. It might make sense to
tolerate such errors and leave the fields we cannot fill in as empty,
None, 0 or similar, but it doesn't make sense to me to leave the whole
'local' struct empty in that case.
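The merged shape described above could be sketched as follows (an illustrative struct, not the real TimelineInfo; field names come from the text, types are assumed):

```rust
/// Illustrative only: the merged TimelineInfo after folding in the old
/// 'local'/'remote' split.
pub struct TimelineInfoSketch {
    /// None when the timeline has no remote copy.
    pub remote_consistent_lsn: Option<u64>,
    /// false when the timeline is not present remotely.
    pub awaits_download: bool,
}

impl TimelineInfoSketch {
    /// Build the info for a timeline with no remote counterpart.
    pub fn local_only() -> Self {
        TimelineInfoSketch { remote_consistent_lsn: None, awaits_download: false }
    }
}
```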
With the ability to pass commit_lsn. This allows performing project WAL recovery
through a different (from the original) set of safekeepers (or under a different
ttid) by
1) moving WAL files to s3 under proper ttid;
2) explicitly creating timeline on safekeepers, setting commit_lsn to the
latest point;
3) putting the latest .partial file into the timeline directory on safekeepers, if
desired.
Extend test_s3_wal_replay to exercise this behaviour.
Also extends timeline_status endpoint to return postgres information.
* Test that we emit build info metric for pageserver, safekeeper and proxy with some non-zero length revision label
* Emit libmetrics_build_info on startup of pageserver, safekeeper and
proxy with label "revision" which tells the git revision.
We had a problem where almost all of the threads were waiting on a futex syscall. More specifically:
- `/metrics` handler was inside `TimelineCollector::collect()`, waiting on a mutex for a single Timeline
- This exact timeline was inside `control_file::FileStorage::persist()`, waiting on a mutex for Lazy initialization of `PERSIST_CONTROL_FILE_SECONDS`
- `PERSIST_CONTROL_FILE_SECONDS: Lazy<Histogram>` was blocked on `prometheus::register`
- `prometheus::register` calls `DEFAULT_REGISTRY.write().register()` to take a write lock on Registry and add a new metric
- `DEFAULT_REGISTRY` lock was already taken inside `DEFAULT_REGISTRY.gather()`, which was called by `/metrics` handler to collect all metrics
This commit creates another Registry with a separate lock, to avoid deadlock in a case where `TimelineCollector` triggers registration of new metrics inside default registry.
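The deadlock and the fix can be modeled with stdlib locks (a toy model, not the prometheus crate API): collection holds the default registry's lock, so any metric registered during collection must go through a different lock.

```rust
use std::sync::RwLock;

// Toy model: metric names stand in for real collectors.
static DEFAULT_REGISTRY: RwLock<Vec<&'static str>> = RwLock::new(Vec::new());
// The fix: metrics registered late go into a registry with its own lock.
static INTERNAL_REGISTRY: RwLock<Vec<&'static str>> = RwLock::new(Vec::new());

/// Called from inside collection. Before the fix, the equivalent code tried
/// to take the default registry's write lock while gather() held it.
pub fn register_during_collect(name: &'static str) {
    INTERNAL_REGISTRY.write().unwrap().push(name);
}

/// Gather all metrics. Safe, because registration above touches a
/// different lock than the one held here.
pub fn gather() -> usize {
    let defaults = DEFAULT_REGISTRY.read().unwrap();
    register_during_collect("persist_control_file_seconds");
    defaults.len() + INTERNAL_REGISTRY.read().unwrap().len()
}
```

With a single registry, `gather()` would hold the lock and `register_during_collect()` would wait on it forever; splitting the lock breaks the cycle.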
Creates new `pageserver_api` and `safekeeper_api` crates to serve as the
shared dependencies. Should reduce both recompile times and cold compile
times.
Decreases the size of the optimized `neon_local` binary: 380M -> 179M.
No significant changes for anything else (mostly as expected).