neon/pageserver at bb32f1b3d05fbac6ada499f1c5ec788e6adbb9c1 - neon

mirror of https://github.com/neondatabase/neon.git synced 2025-12-23 06:09:59 +00:00

Erik Grinaker 61f267d8f9 pageserver: only retry WaitForActiveTimeout during shard resolution (#12772)

## Problem

In https://github.com/neondatabase/neon/pull/12467, timeouts and retries
were added to `Cache::get` tenant shard resolution to paper over an
issue with read unavailability during shard splits. However, this
retries _all_ errors, including irrecoverable errors like `NotFound`.

This causes problems with gRPC child shard routing in #12702, which
targets specific shards with `ShardSelector::Known` and relies on prompt
`NotFound` errors to reroute requests to child shards. These retries
introduce a 1s delay for all reads during child routing.

The broader problem of read unavailability during shard splits is left
as future work, see https://databricks.atlassian.net/browse/LKB-672.

Touches #12702.
Touches [LKB-191](https://databricks.atlassian.net/browse/LKB-191).

## Summary of changes

* Change `TenantManager` to always return a concrete `GetActiveTimelineError`.
* Only retry `WaitForActiveTimeout` errors.
* Lots of code unindentation due to the simplified error handling.

Out of caution, we do not gate the retries on `ShardSelector`, since
this can trigger other races. Improvements here are left as future work.
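
As a rough illustration of the "only retry `WaitForActiveTimeout`" rule, here is a minimal sketch. It is not the pageserver code: the two-variant enum, the `is_retryable` helper, and `resolve_with_retry` are hypothetical stand-ins, and the real implementation is async and carries richer error payloads. The point is only that a `NotFound` is returned on the first attempt, while a timeout is retried until a deadline.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the pageserver's `GetActiveTimelineError`.
/// The real enum has different variants and payloads.
#[derive(Debug, PartialEq)]
enum GetActiveTimelineError {
    /// The shard did not become active in time; may succeed on retry.
    WaitForActiveTimeout,
    /// The shard does not exist here; retrying cannot help.
    NotFound,
}

impl GetActiveTimelineError {
    /// Hypothetical helper: only timeouts are considered transient.
    fn is_retryable(&self) -> bool {
        matches!(self, Self::WaitForActiveTimeout)
    }
}

/// Calls `resolve` until it succeeds, fails with a non-retryable error,
/// or another backoff would run past the overall deadline.
fn resolve_with_retry<T>(
    mut resolve: impl FnMut() -> Result<T, GetActiveTimelineError>,
    timeout: Duration,
    backoff: Duration,
) -> Result<T, GetActiveTimelineError> {
    let deadline = Instant::now() + timeout;
    loop {
        match resolve() {
            Ok(value) => return Ok(value),
            // Retry only transient errors, and only while time remains.
            Err(e) if e.is_retryable() && Instant::now() + backoff < deadline => {
                std::thread::sleep(backoff);
            }
            // `NotFound` (or an exhausted deadline) is surfaced immediately.
            Err(e) => return Err(e),
        }
    }
}
```

With this shape, a caller that targets a specific shard sees `NotFound` after a single attempt and can reroute to a child shard without waiting out the retry budget.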

2025-07-29 12:33:02 +00:00

benches

Bump rand crate to 0.9 (#12674)

2025-07-22 09:31:39 +00:00

client

A few more SC changes (#12649)

2025-07-17 23:17:01 +00:00

client_grpc

pageserver/client_grpc: don't set stripe size for unsharded tenants (#12639)

2025-07-21 12:28:39 +00:00

compaction

Bump rand crate to 0.9 (#12674)