fix(#6366): pageserver: incorrect log level for Tenant not found during basebackup (#6400)

Before this patch, when requesting basebackup for a not-found tenant or
timeline, we'd emit an ERROR-level log entry with a huge stack trace.
See #6366 "Details" section for an example

With this patch, we log at INFO level and only a single line.
Example:

```
2024-01-19T14:16:11.479800Z  INFO page_service_conn_main{peer_addr=127.0.0.1:43448}: query handler for 'basebackup d69a536d529a68fcf85bc070030cdf4b 035484e9c28d8d0138a492caadd03ffd 0/2204340 --gzip' entity not found: Tenant d69a536d529a68fcf85bc070030cdf4b not found
2024-01-19T14:19:35.807819Z  INFO page_service_conn_main{peer_addr=127.0.0.1:48862}: query handler for 'basebackup d69a536d529a68fcf85bc070030cdf4a 035484e9c28d8d0138a492caadd03ffd 0/2204340 --gzip' entity not found: Timeline d69a536d529a68fcf85bc070030cdf4a/035484e9c28d8d0138a492caadd03ffd was not found
```

fixes https://github.com/neondatabase/neon/issues/6366

Changes
-------

- Change `handle_basebackup_request` to return a `QueryError`
- The new `impl From<WaitLsnError> for QueryError` is needed so the `?`
at `wait_lsn()` call in `handle_basebackup_request` works again. It's
duplicating `impl From<WaitLsnError> for PageStreamError`.
- Remove hard-to-spot conversion of `handle_basebackup_request` return
value to anyhow::Result (the place where I replaced `anyhow::Ok` with
`Result::<(), QueryError>::Ok(())`
- Add forgotten distinguished handling for "Tenant not found" case in
`impl From<GetActiveTenantError> for QueryError`

This was not at all pleasant, and I find it very hard to follow the
various error conversions.
It took me a while to spot the hard-to-spot `anyhow::Ok` thing above.
It would have been caught by the compiler if we weren't auto-converting
`anyhow::Error` into `QueryError::Other`.
We should move away from that, in my opinion, instead forcing each
`.context()` site to become `.context().map_err(QueryError::Other)`.
But that's for a future PR.
This commit is contained in:
Christian Schwarz
2024-01-30 14:10:48 +01:00
committed by GitHub
parent e3cb715e8a
commit 79137a089f

View File

@@ -368,6 +368,16 @@ impl From<WaitLsnError> for PageStreamError {
}
}
impl From<WaitLsnError> for QueryError {
fn from(value: WaitLsnError) -> Self {
match value {
e @ WaitLsnError::Timeout(_) => Self::Other(anyhow::Error::new(e)),
WaitLsnError::Shutdown => Self::Shutdown,
WaitLsnError::BadState => Self::Reconnect,
}
}
}
impl PageServerHandler {
pub fn new(
conf: &'static PageServerConf,
@@ -1139,7 +1149,7 @@ impl PageServerHandler {
full_backup: bool,
gzip: bool,
ctx: RequestContext,
) -> anyhow::Result<()>
) -> Result<(), QueryError>
where
IO: AsyncRead + AsyncWrite + Send + Sync + Unpin,
{
@@ -1404,7 +1414,7 @@ where
)
.await?;
pgb.write_message_noflush(&BeMessage::CommandComplete(b"SELECT 1"))?;
anyhow::Ok(())
Result::<(), QueryError>::Ok(())
},
)
.await?;
@@ -1678,6 +1688,7 @@ impl From<GetActiveTenantError> for QueryError {
| GetActiveTenantError::WillNotBecomeActive(TenantState::Stopping { .. }) => {
QueryError::Shutdown
}
e @ GetActiveTenantError::NotFound(_) => QueryError::NotFound(format!("{e}").into()),
e => QueryError::Other(anyhow::anyhow!(e)),
}
}