rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2025-12-22 21:59:59 +00:00

Dmitrii Kovalkov edd60730c8 safekeeper: use last_log_term in mconf switch + choose most advanced sk in pull timeline (#12778)

## Problem
I discovered two bugs related to safekeeper migration, which
together might lead to data loss during the migration. The second bug
comes from a hadron patch and might also lead to data loss during
safekeeper restore in hadron.

1. `switch_membership` returns the current `term` instead of
`last_log_term`. The returned value is used to choose the `sync_position`
in the algorithm, so we might choose the wrong one and break the
correctness guarantees.
2. The current `term` is used to choose the most advanced SK in
`pull_timeline` with higher priority than `flush_lsn`. This is incorrect
because the most advanced safekeeper is the one with the highest
`(last_log_term, flush_lsn)` pair. The compute might bump the term on the
least advanced sk, making it look like the best choice to pull from, and
thus making committed log entries "uncommitted" after `pull_timeline`.
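The second bug can be illustrated with a minimal sketch. The `SkStatus` struct and both functions below are hypothetical and simplified for illustration; they are not the actual safekeeper types. The sketch assumes only what is stated above: the most advanced safekeeper is the one with the highest `(last_log_term, flush_lsn)` pair, and the current `term` can be bumped without any new WAL being written.

```rust
/// Hypothetical, simplified status of one safekeeper for a timeline.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SkStatus {
    pub id: u64,
    /// Current term; the compute may bump it without writing any WAL.
    pub term: u64,
    /// Term of the last entry actually present in the log.
    pub last_log_term: u64,
    pub flush_lsn: u64,
}

/// Correct ordering: highest `(last_log_term, flush_lsn)` pair wins.
pub fn most_advanced(statuses: &[SkStatus]) -> Option<&SkStatus> {
    statuses
        .iter()
        .max_by_key(|s| (s.last_log_term, s.flush_lsn))
}

/// Buggy ordering, shown for contrast: the current `term` takes priority,
/// so a lagging safekeeper with a freshly bumped term can win.
pub fn most_advanced_by_term(statuses: &[SkStatus]) -> Option<&SkStatus> {
    statuses.iter().max_by_key(|s| (s.term, s.flush_lsn))
}
```

With one safekeeper at `term = 5, last_log_term = 5, flush_lsn = 1000` and a lagging one at `term = 6, last_log_term = 4, flush_lsn = 400`, the first function picks the former and the second picks the latter, which is the failure mode described in item 2.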

Part of https://databricks.atlassian.net/browse/LKB-1017

## Summary of changes
- Return `last_log_term` in `switch_membership`
- Use `(last_log_term, flush_lsn)` as a primary key for choosing the
most advanced sk in `pull_timeline` and deny pulling if the `max_term`
is higher than on the most advanced sk (hadron only)
- Write tests for both cases
- Retry `sync_safekeepers` in `compute_ctl`
- Take the quorum size into account when calculating `sync_position`
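The `pull_timeline` donor selection with the hadron-only refusal could look roughly like the sketch below. All names here (`Candidate`, `PullError`, `choose_donor`) are hypothetical, and the comparison of `max_term` against the donor's current `term` is one reading of the summary above, not a confirmed detail of the patch.

```rust
/// Hypothetical, simplified candidate safekeeper to pull a timeline from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Candidate {
    pub id: u64,
    pub term: u64,
    pub last_log_term: u64,
    pub flush_lsn: u64,
}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum PullError {
    NoCandidates,
    /// Some safekeeper has a higher term than the chosen donor.
    HigherTermElsewhere { max_term: u64, donor_term: u64 },
}

/// Choose the donor by `(last_log_term, flush_lsn)` and refuse to pull
/// if any candidate reports a higher current term than the donor
/// (assumed interpretation of the hadron-only check).
pub fn choose_donor(cands: &[Candidate]) -> Result<Candidate, PullError> {
    let donor = cands
        .iter()
        .copied()
        .max_by_key(|c| (c.last_log_term, c.flush_lsn))
        .ok_or(PullError::NoCandidates)?;
    // The slice is non-empty here, so the fallback is never used.
    let max_term = cands.iter().map(|c| c.term).max().unwrap_or(donor.term);
    if max_term > donor.term {
        return Err(PullError::HigherTermElsewhere {
            max_term,
            donor_term: donor.term,
        });
    }
    Ok(donor)
}
```

Refusing rather than silently picking a donor is the conservative choice in this sketch: a higher term elsewhere suggests the set of responses may not reflect the latest state, so the caller can retry later.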

2025-07-31 09:29:25 +00:00

benches

Update storage components to edition 2024 (#10919)

2025-02-25 23:51:37 +00:00

client

A few SK changes (#12577)

2025-07-14 16:37:04 +00:00

spec

TLA+ spec for safekeeper membership change (#9966)

2025-01-09 12:26:17 +00:00

src

safekeeper: use last_log_term in mconf switch + choose most advanced sk in pull timeline (#12778)

2025-07-31 09:29:25 +00:00

tests

Bump rand crate to 0.9 (#12674)

2025-07-22 09:31:39 +00:00

Cargo.toml

safekeeper: add global disk usage utilization limit (#12605)

2025-07-16 14:43:17 +00:00