Prevent to frequent reconnects in case of race condition errors returned by PS (tenant not found) (#6522)

## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1706531433057289

## Summary of changes

1. Do not decrease reconnect timeout until maximal interval value (1
second) is reached
2. Compute reconnect time after connection attempt is taken to exclude
connect time itself from the interval measurement.

So now backend should not perform more than 4 reconnect attempts per
second.
But please notice that backoff is performed locally in each backend and
so if there are many active backends,
then connection (and  so error) rate may be much higher.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
This commit is contained in:
Konstantin Knizhnik
2024-01-31 09:17:32 +02:00
committed by GitHub
parent e8c9a51273
commit e10a7ee391

View File

@@ -328,18 +328,14 @@ pageserver_connect(shardno_t shard_no, int elevel)
now = GetCurrentTimestamp();
us_since_last_connect = now - last_connect_time;
if (us_since_last_connect < delay_us)
if (us_since_last_connect < MAX_RECONNECT_INTERVAL_USEC)
{
pg_usleep(delay_us - us_since_last_connect);
pg_usleep(delay_us);
delay_us *= 2;
if (delay_us > MAX_RECONNECT_INTERVAL_USEC)
delay_us = MAX_RECONNECT_INTERVAL_USEC;
last_connect_time = GetCurrentTimestamp();
}
else
{
delay_us = MIN_RECONNECT_INTERVAL_USEC;
last_connect_time = now;
}
/*
@@ -366,6 +362,7 @@ pageserver_connect(shardno_t shard_no, int elevel)
values[n] = NULL;
n++;
conn = PQconnectdbParams(keywords, values, 1);
last_connect_time = GetCurrentTimestamp();
if (PQstatus(conn) == CONNECTION_BAD)
{