neon/pageserver at e98bc4fd2ba3cb4a3fa6d00f98406fb0fdb916a8 - neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-09 22:42:57 +00:00

John Spray 7e60563910 pageserver: add GcError type (#7917)

## Problem

- Because GC exposes all errors as an anyhow::Error, we have
intermittent issues with spurious log errors during shutdown, e.g. in
this failure of a performance test
https://neon-github-public-dev.s3.amazonaws.com/reports/main/9300804302/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/214a2154f6f0217a/

```
Gc failed 1 times, retrying in 2s: shutting down
```

GC really doesn't do a lot of complicated IO: it doesn't benefit from
the backtrace capabilities of anyhow::Error, and can be expressed more
robustly as an enum.

## Summary of changes

- Add GcError type and use it instead of anyhow::Error in GC functions
- In `gc_iteration_internal`, return GcError::Cancelled on shutdown
rather than Ok(()) (we only used Ok before because we didn't have a
clear cancellation error variant to use).
- In `gc_iteration_internal`, skip past timelines that are shutting
down, to avoid having to go through another GC iteration if we happen to
see a deleting timeline during a GC run.
- In `refresh_gc_info_internal`, avoid an error case where a timeline
might not be found after being looked up, by carrying an Arc<Timeline>
instead of a TimelineId between the first loop and second loop in the
function.
- In the HTTP request handler, handle Cancelled variants as 503 instead of
turning all GC errors into 500s.
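
The shape of this change can be sketched as follows. This is a minimal illustration, not the code from the commit: the `Other` variant, the `http_status` and `should_retry` helpers, and the use of a plain `String` payload are all assumptions made so the example compiles with the standard library alone. Only the name `GcError`, the idea of a `Cancelled` variant, and the 503/500 split come from the description above.

```rust
use std::fmt;

/// Hypothetical stand-in for the GC error enum described in the summary.
#[derive(Debug, PartialEq, Eq)]
pub enum GcError {
    /// GC stopped because the tenant or timeline is shutting down.
    Cancelled,
    /// Any other failure. A String payload is an assumption that keeps
    /// this sketch free of external crates.
    Other(String),
}

impl fmt::Display for GcError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GcError::Cancelled => write!(f, "shutting down"),
            GcError::Other(msg) => write!(f, "gc failed: {msg}"),
        }
    }
}

impl std::error::Error for GcError {}

/// Map a GC error to an HTTP status code: cancellation is a temporary
/// condition (503), everything else is an internal error (500).
pub fn http_status(err: &GcError) -> u16 {
    match err {
        GcError::Cancelled => 503,
        GcError::Other(_) => 500,
    }
}

/// Hypothetical helper for a background loop: cancellation should end
/// the loop quietly instead of being logged and retried.
pub fn should_retry(err: &GcError) -> bool {
    !matches!(err, GcError::Cancelled)
}
```

With a typed variant for cancellation, a caller can match on the error instead of inspecting an opaque `anyhow::Error`, which is what allows shutdown to be told apart from a real failure in both the retry loop and the HTTP handler.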

2024-05-31 22:20:06 +01:00

benches

chore!: always use async walredo, warn if sync is configured (#7754)

2024-05-15 15:04:52 +02:00

client

feat(pagebench): add aux file bench (#7746)

2024-05-17 20:04:02 +00:00

compaction

refactor(rtc): remove the duplicate IndexLayerMetadata (#7860)

2024-05-23 23:24:31 +03:00

ctl

pagectl: key command for dumping what we know about the key (#7890)