Commit Graph

374 Commits

Author SHA1 Message Date
Arseny Sher
db343caf5d Fix BecomeLeader, adjust init and target confs. 2024-12-02 12:02:26 +01:00
Arseny Sher
6b7af160bf Fix AccReset 2024-12-02 11:05:39 +01:00
Arseny Sher
a9662b6f64 do not append on non members 2024-12-02 12:11:10 +03:00
Arseny Sher
7a580627fc fixup! AccSwitchConf, AbortChange, AccReset 2024-11-29 17:38:11 +03:00
Arseny Sher
176fe2cade remove identical switch 2024-11-29 17:34:38 +03:00
Arseny Sher
c846389812 AccSwitchConf, AbortChange, AccReset 2024-11-29 17:25:17 +03:00
Arseny Sher
cb1ebedc9f FinishChange 2024-11-29 16:32:08 +03:00
Arseny Sher
08b19dd6aa reconfig CommitEntries 2024-11-29 14:14:34 +03:00
Arseny Sher
ced1903267 Rework CommitEntries 2024-11-29 10:11:33 +03:00
Arseny Sher
9584317564 writing entries, remove prop null conf 2024-11-27 16:55:38 +03:00
Arseny Sher
4c7bdfa70a becomeleader 2024-11-27 14:19:43 +03:00
Arseny Sher
59cb648457 voting 2024-11-27 12:53:26 +03:00
Arseny Sher
8c880d088b RestartProposer 2024-11-26 16:52:05 +03:00
Arseny Sher
c940f196ce Start reconfig 2024-11-26 16:34:40 +03:00
Arseny Sher
d0b4b3e64a fmt 2024-11-26 16:34:26 +03:00
Arseny Sher
617a8711be fix comment 2024-11-25 18:00:48 +03:00
Arseny Sher
da5b71b5c1 a bit of readme 2024-11-25 15:59:07 +03:00
Arseny Sher
6b62b7633b address review 2024-11-25 13:19:58 +03:00
Arseny Sher
07872b310c One more small model 2024-11-18 14:06:13 +03:00
Arseny Sher
90aa12c3d8 Add elected_history 2024-11-18 14:06:13 +03:00
Arseny Sher
02dc3b2ba2 remove obsolete nextentry 2024-11-18 14:06:13 +03:00
Arseny Sher
9ab6a89b5c p2a3t4l4 run 2024-11-18 14:06:13 +03:00
Arseny Sher
79137382e7 note on fig8 2024-11-18 14:06:13 +03:00
Arseny Sher
e87d0813d1 Add some TLC results. 2024-11-18 14:06:13 +03:00
Arseny Sher
ec7c8814f4 add p2a3t4l4 model 2024-11-18 14:06:13 +03:00
Arseny Sher
31c3eb7628 fix previous 2024-11-18 14:06:13 +03:00
Arseny Sher
a2c67361b0 Add cfg to out file name 2024-11-18 14:06:13 +03:00
Arseny Sher
0d057d1374 fix previous 2024-11-18 14:06:13 +03:00
Arseny Sher
2917e49391 add tools var 2024-11-18 14:06:13 +03:00
Arseny Sher
91357b05e8 Get cpu name differently 2024-11-18 14:06:13 +03:00
Arseny Sher
765adaf16c Add even bigger model. 2024-11-18 14:06:13 +03:00
Arseny Sher
50a23d5a14 Move CommittedNotTruncated 2024-11-18 14:06:13 +03:00
Arseny Sher
1c30e6a61a add big model 2024-11-18 14:06:13 +03:00
Arseny Sher
42a9ef3645 fix newline 2024-11-18 14:06:13 +03:00
Arseny Sher
83b8e5c117 Piece of protocol readme. 2024-11-18 14:06:13 +03:00
Arseny Sher
5912932de8 remove whitespace 2024-11-18 14:06:13 +03:00
Arseny Sher
9aa29712d3 more models 2024-11-18 14:06:13 +03:00
Arseny Sher
443a6fdfdb bad quorums 2024-11-18 14:06:13 +03:00
Arseny Sher
664569ecdb MaxTruncatedTerms 2024-11-18 14:06:13 +03:00
Arseny Sher
a9ced3573a Add CommittedNotTruncated 2024-11-18 14:06:13 +03:00
Arseny Sher
f7b9fc1c81 Save runs. 2024-11-18 14:06:13 +03:00
Arseny Sher
979f925949 CommitEntries. 2024-11-18 14:06:13 +03:00
Arseny Sher
13fd695e3f Add TruncateWal 2024-11-18 14:06:13 +03:00
Arseny Sher
2d0c22d77d Start adding term history, election works. 2024-11-18 14:06:13 +03:00
Arseny Sher
9cde4ab0a7 More clean separation of spec and model checking.
and runner script.
2024-11-18 14:06:13 +03:00
Erik Grinaker
de7e4a34ca safekeeper: send AppendResponse on segment flush (#9692)
## Problem

When processing pipelined `AppendRequest`s, we explicitly flush the WAL
every second and return an `AppendResponse`. However, the WAL is also
implicitly flushed on segment bounds, but this does not result in an
`AppendResponse`. Because of this, concurrent transactions may take up
to 1 second to commit, and writes may take up to 1 second before being
sent to the pageserver.

## Summary of changes

Advance `flush_lsn` when a WAL segment is closed and flushed, and emit
an `AppendResponse`. To accommodate this, track the `flush_lsn` in
addition to the `flush_record_lsn`.
2024-11-17 18:19:14 +01:00
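The behaviour described in de7e4a34ca can be illustrated with a small model. This is only a sketch under stated assumptions, not the safekeeper's code: the `WalWriter` class, the tiny `SEGMENT_SIZE`, and the `responses` list (standing in for emitted `AppendResponse` messages) are all hypothetical. It shows `flush_lsn` advancing to a segment boundary while `flush_record_lsn` stays at the last fully flushed record.

```python
from dataclasses import dataclass, field

# Hypothetical, deliberately tiny segment size so the example is easy to follow.
SEGMENT_SIZE = 16


@dataclass
class WalWriter:
    """Toy model of WAL flush tracking; all names are illustrative assumptions."""

    write_lsn: int = 0          # end of everything written so far
    flush_lsn: int = 0          # end of everything known to be durable
    flush_record_lsn: int = 0   # end of the last durable *complete* record
    responses: list = field(default_factory=list)  # stand-in for AppendResponse

    def append(self, record_len: int) -> None:
        prev_record_end = self.write_lsn
        end = self.write_lsn + record_len
        # First segment boundary strictly above the current write position.
        boundary = (self.write_lsn // SEGMENT_SIZE + 1) * SEGMENT_SIZE
        while boundary <= end:
            # Closing a segment flushes it: advance flush_lsn and respond.
            self.flush_lsn = boundary
            # The new record is only durable if it ends exactly on the boundary.
            self.flush_record_lsn = end if end == boundary else prev_record_end
            self.responses.append(self.flush_lsn)
            boundary += SEGMENT_SIZE
        self.write_lsn = end

    def flush(self) -> None:
        # Explicit (periodic) flush: everything written becomes durable.
        self.flush_lsn = self.write_lsn
        self.flush_record_lsn = self.write_lsn
        self.responses.append(self.flush_lsn)
```

In this model a record that straddles a boundary produces a response as soon as the segment closes, rather than waiting for the next explicit `flush()`.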
Arseny Sher
d06bf4b0fe safekeeper: fix atomicity of WAL truncation (#9685)
If WAL truncation fails midway, it might leave some data on disk
above the write/flush LSN. In theory, concatenated with previous records,
it might form bogus WAL (though this is very unlikely in practice because
the CRC would protect against it). To guard against this, set the
pending_wal_truncation flag: it means that truncation must be retried
until it succeeds before any WAL writes. We already did that in the case
of a safekeeper restart; now extend this mechanism to failures without a
restart. Also, importantly, reset the LSNs at the beginning of the
operation, not at the end, because once on-disk deletion starts the
previous pointers are wrong.

All this most likely hasn't caused any problems in practice because
the CRC protects from the consequences.

Tests for this are hard; simulation infrastructure might be useful here
in the future, but not yet.
2024-11-14 13:06:42 +03:00
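The retry-until-success idea from d06bf4b0fe can be sketched as follows. This is a minimal model, assuming an in-memory byte array in place of on-disk segments; the `Wal` class, its methods, and the `fail_next_truncate` test hook are hypothetical. Only the `pending_wal_truncation` flag name and the "reset LSNs first" ordering come from the commit message.

```python
class Wal:
    """Toy WAL store; an in-memory bytearray stands in for on-disk segments."""

    def __init__(self, data: bytes):
        self.disk = bytearray(data)
        self.write_lsn = len(data)
        self.flush_lsn = len(data)
        self.pending_wal_truncation = False
        self.fail_next_truncate = False  # hypothetical hook to simulate an I/O error

    def _truncate_disk(self, end: int) -> None:
        if self.fail_next_truncate:
            self.fail_next_truncate = False
            # Simulate a failure midway: only part of the tail gets removed.
            mid = end + (len(self.disk) - end) // 2
            del self.disk[mid:]
            raise OSError("simulated truncation failure")
        del self.disk[end:]

    def truncate(self, end: int) -> None:
        # Reset the LSNs *before* touching the disk: once deletion starts,
        # the old pointers no longer describe valid data.
        self.write_lsn = end
        self.flush_lsn = end
        self.pending_wal_truncation = True
        self._truncate_disk(end)  # may raise, leaving the flag set
        self.pending_wal_truncation = False

    def write(self, buf: bytes) -> None:
        # A previously failed truncation must complete before any new write.
        if self.pending_wal_truncation:
            self.truncate(self.write_lsn)
        self.disk.extend(buf)
        self.write_lsn += len(buf)
```

If `truncate` raises, the flag stays set and the leftover bytes above `write_lsn` are removed by the retry in `write` before new data is appended, so stale bytes never end up concatenated with fresh records.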
Erik Grinaker
2256a5727a safekeeper: use WAL_SEGMENT_SIZE for empty timeline state (#9734)
## Problem

`TimelinePersistentState::empty()`, used for tests and benchmarks, had a
hardcoded 16 MB WAL segment size. This caused confusion when attempting
to change the global segment size.

## Summary of changes

Inherit from `WAL_SEGMENT_SIZE` in `TimelinePersistentState::empty()`.
2024-11-12 20:35:44 +00:00
Erik Grinaker
6b19867410 safekeeper: don't flush control file on WAL ingest path (#9698)
## Problem

The control file is flushed on the WAL ingest path when the commit LSN
advances by one segment, to bound the amount of recovery work in case of
a crash. This involves 3 additional fsyncs, which can have a significant
impact on WAL ingest throughput. This is to some extent mitigated by
`AppendResponse` not being emitted on segment bound flushes, since this
will prevent commit LSN advancement, which will be addressed separately.

## Summary of changes

Don't flush the control file on the WAL ingest path at all. Instead,
leave that responsibility to the timeline manager, but ask it to flush
eagerly if the control file lags the in-memory commit LSN by more than
one segment. This should not cause more than `REFRESH_INTERVAL` (300 ms)
additional latency before flushing the control file, which is
negligible.
2024-11-12 15:17:03 +00:00
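The eager-flush condition in 6b19867410 amounts to a simple lag check. The sketch below is an assumption-laden illustration: the function name and parameters are hypothetical, and the 16 MB default is only the conventional segment size, used here as a placeholder.

```python
# Assumed default; the real value comes from the project's WAL_SEGMENT_SIZE.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def should_flush_eagerly(
    persisted_commit_lsn: int,
    in_memory_commit_lsn: int,
    segment_size: int = WAL_SEGMENT_SIZE,
) -> bool:
    """Return True when the control file lags the in-memory commit LSN
    by more than one segment (hypothetical helper, not the real API)."""
    return in_memory_commit_lsn - persisted_commit_lsn > segment_size
```

A periodic manager task could call such a check on every tick, which is why the added latency is bounded by the tick interval rather than by the ingest path.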
Erik Grinaker
f63de5f527 safekeeper: add initialize_segment variant of safekeeper_wal_storage_operation_seconds (#9691)
## Problem

We don't have a metric capturing the latency of segment initialization.
This can be significant due to fsyncs.

## Summary of changes

Add an `initialize_segment` variant of
`safekeeper_wal_storage_operation_seconds`.
2024-11-11 17:55:50 +01:00