Discrete event simulation for safekeepers (#5804)

This PR contains the first version of
[FoundationDB-like](https://www.youtube.com/watch?v=4fFDFbi3toc)
simulation testing for safekeeper and walproposer.

### desim

This is the core "framework" for running deterministic simulations. It
operates on threads, which makes it possible to test synchronous code
(like walproposer).

`libs/desim/src/executor.rs` contains an implementation of deterministic
thread execution. This is achieved by blocking all threads and allowing
only a single thread at a time to make an execution step. All of the
executor's threads are blocked using the `yield_me(after_ms)` function,
which is called when a thread wants to sleep or wait for an external
notification (like blocking on a channel until it has a ready message).
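
The scheduling idea can be illustrated with a minimal sketch. It is not the
code in `executor.rs`: the `Executor` type, its fields, and `step` are
hypothetical, real threads are replaced by plain task ids, and `after_ms` is
assumed to be a delay in virtual milliseconds.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical, simplified stand-in for a deterministic executor.
/// Tasks are plain ids here; the real executor blocks OS threads.
pub struct Executor {
    now_ms: u64,
    seq: u64, // tie-breaker: equal wake times run in insertion order
    sleeping: BinaryHeap<Reverse<(u64, u64, usize)>>,
}

impl Executor {
    pub fn new() -> Self {
        Executor { now_ms: 0, seq: 0, sleeping: BinaryHeap::new() }
    }

    /// Models a task calling `yield_me(after_ms)`:
    /// park it until virtual time `now + after_ms`.
    pub fn yield_me(&mut self, task: usize, after_ms: u64) {
        self.sleeping.push(Reverse((self.now_ms + after_ms, self.seq, task)));
        self.seq += 1;
    }

    /// Wake exactly one task, the one with the earliest wake time, and
    /// advance the virtual clock to that time.
    /// Returns (virtual time, task id), or None when nothing is parked.
    pub fn step(&mut self) -> Option<(u64, usize)> {
        let Reverse((wake_at, _, task)) = self.sleeping.pop()?;
        self.now_ms = wake_at;
        Some((wake_at, task))
    }
}
```

Because the wake order depends only on the virtual clock and the insertion
order, repeated runs produce the same sequence of steps.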

`libs/desim/src/chan.rs` contains an implementation of a channel (a basic
sync primitive). It has unlimited capacity, and any thread can push
messages to it or read messages from it.
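
As a rough illustration of such a primitive, here is a minimal unbounded
channel built on the standard library. This is an assumption-laden sketch, not
the code in `chan.rs`: the names `Chan`, `send`, `recv`, and `try_recv` are
hypothetical, and blocking uses a `Condvar` where the simulation would
presumably yield to the executor instead.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical unbounded channel; every clone shares the same queue,
/// so any thread holding a clone can both send and receive.
pub struct Chan<T> {
    inner: Arc<(Mutex<VecDeque<T>>, Condvar)>,
}

impl<T> Clone for Chan<T> {
    fn clone(&self) -> Self {
        Chan { inner: Arc::clone(&self.inner) }
    }
}

impl<T> Chan<T> {
    pub fn new() -> Self {
        Chan { inner: Arc::new((Mutex::new(VecDeque::new()), Condvar::new())) }
    }

    /// Push a message; never blocks because capacity is unlimited.
    pub fn send(&self, msg: T) {
        let (queue, ready) = &*self.inner;
        queue.lock().unwrap().push_back(msg);
        ready.notify_one();
    }

    /// Block until a message is available, then return it.
    pub fn recv(&self) -> T {
        let (queue, ready) = &*self.inner;
        let mut guard = queue.lock().unwrap();
        loop {
            if let Some(msg) = guard.pop_front() {
                return msg;
            }
            guard = ready.wait(guard).unwrap();
        }
    }

    /// Return a message if one is ready, without blocking.
    pub fn try_recv(&self) -> Option<T> {
        self.inner.0.lock().unwrap().pop_front()
    }
}
```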

`libs/desim/src/network.rs` has a very naive implementation of a network
(only reliable TCP-like connections are supported for now) that can
apply arbitrary delays to each packet and inject failures that break
connections with some probability.
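
Delay and failure injection of this kind might look roughly like the sketch
below. Everything in it is hypothetical (`Rng`, `LinkOptions`, `Delivery`,
`send_packet`); it only shows how a seeded generator can decide, per packet,
between a delivery time and a broken connection.

```rust
/// Small deterministic PRNG (SplitMix64), so no external crate is needed.
pub struct Rng(u64);

impl Rng {
    pub fn new(seed: u64) -> Self {
        Rng(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical per-link settings; assumes min_delay_ms <= max_delay_ms.
pub struct LinkOptions {
    pub min_delay_ms: u64,
    pub max_delay_ms: u64,
    pub break_prob: f64,
}

#[derive(Debug, PartialEq)]
pub enum Delivery {
    /// The packet arrives at this virtual time (in ms).
    At(u64),
    /// The failure injector broke the connection instead.
    ConnectionBroken,
}

/// Decide the fate of one packet sent at virtual time `now_ms`.
pub fn send_packet(rng: &mut Rng, opts: &LinkOptions, now_ms: u64) -> Delivery {
    if rng.next_f64() < opts.break_prob {
        return Delivery::ConnectionBroken;
    }
    let span = opts.max_delay_ms - opts.min_delay_ms + 1;
    Delivery::At(now_ms + opts.min_delay_ms + rng.next_u64() % span)
}
```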

`libs/desim/src/world.rs` ties everything together, providing the concept of
virtual nodes that can have network connections between them.

### walproposer_sim

Has everything to run walproposer and safekeepers in a simulation.

`safekeeper.rs` reimplements all the necessary parts of `receive_wal.rs`,
`send_wal.rs` and `timelines_global_map.rs`.

`walproposer_api.rs` implements all walproposer callbacks using the
simulation library.

`simulation.rs` defines a schedule – a set of events like `restart <sk>`
or `write_wal` that should happen at time `<ts>`. It also has code to
spawn walproposer/safekeeper threads and provide config to them.
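
A schedule of this shape could be modelled as a list of timestamped actions.
The sketch below is illustrative only: `TestAction`, `Schedule`, and
`run_schedule` are hypothetical names, and the "execution" just produces a log
line per event instead of driving real walproposer/safekeeper threads.

```rust
/// Hypothetical event kinds, mirroring the `write_wal` and `restart <sk>`
/// examples from the description.
#[derive(Debug, Clone, PartialEq)]
pub enum TestAction {
    WriteWal,
    RestartSafekeeper(usize),
}

/// A schedule: (virtual timestamp in ms, action) pairs.
pub type Schedule = Vec<(u64, TestAction)>;

/// Apply the events in timestamp order and return a textual log.
/// The sort is stable, so events with equal timestamps keep their order.
pub fn run_schedule(mut schedule: Schedule) -> Vec<String> {
    schedule.sort_by_key(|(ts, _)| *ts);
    schedule
        .into_iter()
        .map(|(ts, action)| match action {
            TestAction::WriteWal => format!("{}: write_wal", ts),
            TestAction::RestartSafekeeper(sk) => format!("{}: restart {}", ts, sk),
        })
        .collect()
}
```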

### tests

`simple_test.rs` has tests that just start walproposer and 3 safekeepers
together in a simulation and check that they do not crash right
away.

`misc_test.rs` has tests checking more advanced simulation cases, like
crashing or restarting threads, testing memory deallocation, etc.

`random_test.rs` is the main test; it checks thousands of random seeds
(schedules) for correctness. It roughly corresponds to running a real
Python integration test in an environment with a very unstable network and
CPU, but in a deterministic way (each seed results in the same execution
log) and much faster.
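
The "same seed, same log" property can be shown in miniature. This sketch does
not reflect the contents of `random_test.rs`; `xorshift`, `run_seed`, and
`check_seeds` are hypothetical helpers, and the "simulation" is reduced to
generating a short pseudo-random event log from a seed.

```rust
/// Tiny xorshift64 PRNG; any seeded generator would do. State must be non-zero.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Derive a pseudo-random execution log from a seed.
pub fn run_seed(seed: u64) -> Vec<String> {
    // Mix the seed and force the low bit so the state is never zero.
    let mut state = seed.wrapping_mul(0x9E3779B97F4A7C15) | 1;
    let mut now_ms: u64 = 0;
    let mut log = Vec::new();
    for _ in 0..5 {
        now_ms += xorshift(&mut state) % 100;
        let event = if xorshift(&mut state) % 4 == 0 { "restart" } else { "write_wal" };
        log.push(format!("{}: {}", now_ms, event));
    }
    log
}

/// Run every seed twice and confirm both runs yield an identical log.
pub fn check_seeds(seeds: std::ops::Range<u64>) -> bool {
    seeds.into_iter().all(|seed| run_seed(seed) == run_seed(seed))
}
```

A failing seed can then be replayed on its own, since it always reproduces the
same log.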

Closes #547

---------

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>

Arthur Petukhovsky, 2024-02-12 20:29:57 +00:00, committed by GitHub

parent fac50a6264
commit 4be2223a4c

34 changed files with 4603 additions and 25 deletions

									
										10

libs/postgres_ffi/src/xlog_utils.rs
									
												View File
												
				@@ -431,11 +431,11 @@ pub fn generate_wal_segment(segno: u64, system_id: u64, lsn: Lsn) -> Result<Byte

				#[repr(C)]

				#[derive(Serialize)]

				struct XlLogicalMessage {

				    db_id: Oid,

				    transactional: uint32, // bool, takes 4 bytes due to alignment in C structures

				    prefix_size: uint64,

				    message_size: uint64,

				pub struct XlLogicalMessage {

				    pub db_id: Oid,

				    pub transactional: uint32, // bool, takes 4 bytes due to alignment in C structures

				    pub prefix_size: uint64,

				    pub message_size: uint64,

				}

				impl XlLogicalMessage {
