neon/safekeeper/tests/walproposer_sim/log.rs
Arthur Petukhovsky 4be2223a4c Discrete event simulation for safekeepers (#5804)
This PR contains the first version of
[FoundationDB-like](https://www.youtube.com/watch?v=4fFDFbi3toc)
simulation testing for safekeeper and walproposer.

### desim

This is the core "framework" for running deterministic simulations. It
operates on threads, which makes it possible to test synchronous code
(like walproposer).

`libs/desim/src/executor.rs` implements deterministic thread execution.
This is achieved by blocking all threads and, at each step, allowing only
a single thread to make progress. All executor threads are blocked using
the `yield_me(after_ms)` function, which is called when a thread wants to
sleep or wait for an external notification (like blocking on a channel
until it has a ready message).
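
The "one thread steps at a time" idea can be illustrated with a much simpler model than the real executor. The sketch below is not the code in `executor.rs`: it replaces blocked threads with task ids in a priority queue, and `VirtualScheduler`, `park_task`, `step` and `run_example` are hypothetical names chosen for illustration. It only shows how virtual time plus a deterministic tie-break yields the same wake-up order on every run.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical, simplified model of a deterministic scheduler.
/// Tasks are plain ids here; the real executor parks OS threads instead.
pub struct VirtualScheduler {
    now_ms: u64,
    seq: u64,
    // Min-heap keyed by (wake time, insertion order, task id), so tasks
    // with equal wake times always run in the order they were parked.
    queue: BinaryHeap<Reverse<(u64, u64, usize)>>,
}

impl VirtualScheduler {
    pub fn new() -> Self {
        VirtualScheduler { now_ms: 0, seq: 0, queue: BinaryHeap::new() }
    }

    /// Park `task_id` until `after_ms` of virtual time has passed.
    pub fn park_task(&mut self, task_id: usize, after_ms: u64) {
        self.queue.push(Reverse((self.now_ms + after_ms, self.seq, task_id)));
        self.seq += 1;
    }

    /// Jump virtual time to the next wake-up and return the single task
    /// that is allowed to run, together with the current virtual time.
    pub fn step(&mut self) -> Option<(u64, usize)> {
        let Reverse((wake_at, _, task_id)) = self.queue.pop()?;
        self.now_ms = wake_at;
        Some((self.now_ms, task_id))
    }
}

/// Parks three tasks and returns the (time, task id) wake-up order.
pub fn run_example() -> Vec<(u64, usize)> {
    let mut sched = VirtualScheduler::new();
    sched.park_task(0, 30);
    sched.park_task(1, 10);
    sched.park_task(2, 10);
    let mut order = Vec::new();
    while let Some(event) = sched.step() {
        order.push(event);
    }
    order
}
```

Because time only advances when the queue is popped, no wall-clock sleep is ever needed, which is one reason a simulated run can be much faster than a real one.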

`libs/desim/src/chan.rs` implements a channel (a basic sync primitive).
It has unlimited capacity, and any thread can push messages to it or read
messages from it.
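
As a rough illustration of such a primitive, here is a minimal unbounded channel built from the standard library. It is a sketch under assumptions, not the code in `chan.rs`: the type `Chan` and its methods are hypothetical, and it blocks with a `Condvar`, whereas the simulated channel would block through the executor so that waiting stays deterministic.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical unbounded channel: any clone can send or receive.
pub struct Chan<T> {
    shared: Arc<(Mutex<VecDeque<T>>, Condvar)>,
}

impl<T> Clone for Chan<T> {
    fn clone(&self) -> Self {
        Chan { shared: Arc::clone(&self.shared) }
    }
}

impl<T> Chan<T> {
    pub fn new() -> Self {
        Chan { shared: Arc::new((Mutex::new(VecDeque::new()), Condvar::new())) }
    }

    /// Never blocks: the queue grows as needed.
    pub fn send(&self, msg: T) {
        let (queue, ready) = &*self.shared;
        queue.lock().unwrap().push_back(msg);
        ready.notify_one();
    }

    /// Returns a message if one is ready, without blocking.
    pub fn try_recv(&self) -> Option<T> {
        self.shared.0.lock().unwrap().pop_front()
    }

    /// Blocks the calling thread until a message is available.
    pub fn recv(&self) -> T {
        let (queue, ready) = &*self.shared;
        let mut guard = queue.lock().unwrap();
        loop {
            if let Some(msg) = guard.pop_front() {
                return msg;
            }
            // Releases the lock while waiting and re-acquires it on wake-up.
            guard = ready.wait(guard).unwrap();
        }
    }
}
```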

`libs/desim/src/network.rs` has a very naive implementation of a network
(only reliable TCP-like connections are supported for now) that can apply
arbitrary delays to each packet and inject failures that break
connections with some probability.
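
Per-packet delays and probabilistic connection breaks can both be derived from a seeded random number generator, which keeps them reproducible. The following is a minimal sketch of that idea, not the logic in `network.rs`: `Rng`, `LinkOptions`, `Delivery`, `plan_delivery` and `plan_many` are hypothetical names, and the xorshift generator is a stand-in for whatever seeded RNG the simulation actually uses.

```rust
/// Tiny deterministic PRNG (xorshift64), used only for illustration.
pub struct Rng(u64);

impl Rng {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        Rng(seed.max(1))
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical link settings; assumes min_delay_ms <= max_delay_ms.
pub struct LinkOptions {
    pub min_delay_ms: u64,
    pub max_delay_ms: u64,
    /// Chance of breaking the connection, in units of 1/1000.
    pub break_per_mille: u64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Delivery {
    DeliverAfterMs(u64),
    ConnectionBroken,
}

/// Decides the fate of one packet using only the seeded generator.
pub fn plan_delivery(rng: &mut Rng, opts: &LinkOptions) -> Delivery {
    if rng.next_u64() % 1000 < opts.break_per_mille {
        return Delivery::ConnectionBroken;
    }
    let span = opts.max_delay_ms - opts.min_delay_ms + 1;
    Delivery::DeliverAfterMs(opts.min_delay_ms + rng.next_u64() % span)
}

/// Plans `n` packets from a fresh generator seeded with `seed`.
pub fn plan_many(seed: u64, n: usize, opts: &LinkOptions) -> Vec<Delivery> {
    let mut rng = Rng::new(seed);
    (0..n).map(|_| plan_delivery(&mut rng, opts)).collect()
}
```

Calling `plan_many` twice with the same seed and options returns identical plans, which is the property a simulated network needs for failures to be replayable.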

`libs/desim/src/world.rs` ties everything together, providing the concept
of virtual nodes with network connections between them.

### walproposer_sim

Contains everything needed to run walproposer and safekeepers in a simulation.

`safekeeper.rs` reimplements all the necessary pieces of `receive_wal.rs`,
`send_wal.rs` and `timelines_global_map.rs`.

`walproposer_api.rs` implements all walproposer callbacks on top of the
simulation library.

`simulation.rs` defines a schedule – a set of events like `restart <sk>`
or `write_wal` that should happen at time `<ts>`. It also has code to
spawn walproposer/safekeeper threads and provide config to them.
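
One way to picture a schedule is as a list of `(timestamp, action)` pairs that the test driver consumes as virtual time advances. The sketch below assumes that shape; `TestAction`, `Schedule`, `in_firing_order`, `take_due` and `example_schedule` are hypothetical names and are not taken from `simulation.rs`.

```rust
/// Hypothetical actions mirroring the `write_wal` and `restart <sk>` events.
#[derive(Debug, Clone, PartialEq)]
pub enum TestAction {
    WriteWal { bytes: u64 },
    RestartSafekeeper { index: usize },
}

/// A schedule: actions paired with the virtual time at which they fire.
pub type Schedule = Vec<(u64, TestAction)>;

/// Orders events by timestamp. The sort is stable, so events that share
/// a timestamp keep the order in which they were listed.
pub fn in_firing_order(mut schedule: Schedule) -> Schedule {
    schedule.sort_by_key(|(ts, _)| *ts);
    schedule
}

/// Removes and returns every event with `ts <= now`.
/// Expects `schedule` to already be in firing order.
pub fn take_due(schedule: &mut Schedule, now: u64) -> Schedule {
    let split = schedule.partition_point(|(ts, _)| *ts <= now);
    schedule.drain(..split).collect()
}

/// A small unsorted schedule used for illustration.
pub fn example_schedule() -> Schedule {
    vec![
        (500, TestAction::RestartSafekeeper { index: 1 }),
        (100, TestAction::WriteWal { bytes: 8192 }),
        (500, TestAction::WriteWal { bytes: 4096 }),
    ]
}
```

A driver loop would call `take_due` each time the virtual clock moves forward and apply the returned actions in order.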

### tests

`simple_test.rs` has tests that just start walproposer and 3 safekeepers
together in a simulation and check that they do not crash right away.

`misc_test.rs` has tests checking more advanced simulation cases, like
crashing or restarting threads, testing memory deallocation, etc.

`random_test.rs` is the main test; it checks thousands of random seeds
(schedules) for correctness. It roughly corresponds to running a real
Python integration test in an environment with a very unstable network
and CPU, but deterministically (each seed results in the same execution
log) and much faster.
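
The "each seed results in the same execution log" property is what makes a failing seed reproducible. A minimal sketch of a harness that checks it follows; `run_simulation` here is a toy stand-in that generates a log from a seed, not the real simulation, and both function names are hypothetical.

```rust
/// Toy stand-in for a simulated run: derives a sequence of events from
/// `seed` with a linear congruential generator and returns the log.
pub fn run_simulation(seed: u64) -> Vec<String> {
    let mut state = seed;
    let mut committed = 0u64;
    let mut log = Vec::new();
    for step in 0..20u64 {
        // LCG constants from Knuth's MMIX generator.
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        match (state >> 33) % 3 {
            0 => {
                committed += 1;
                log.push(format!("[{step}] write_wal committed={committed}"));
            }
            1 => log.push(format!("[{step}] restart sk{}", (state >> 40) % 3 + 1)),
            _ => log.push(format!("[{step}] network delay")),
        }
    }
    log
}

/// Runs every seed twice and returns the first seed whose two execution
/// logs differ, or `None` if all runs were reproducible.
pub fn first_nondeterministic_seed(seeds: std::ops::Range<u64>) -> Option<u64> {
    seeds
        .into_iter()
        .find(|&seed| run_simulation(seed) != run_simulation(seed))
}
```

When a correctness check fails for some seed, rerunning with that seed alone replays the exact same log, so the failure can be debugged in isolation.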

Closes #547

---------

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
2024-02-12 20:29:57 +00:00

78 lines
2.1 KiB
Rust

use std::{fmt, sync::Arc};

use desim::time::Timing;
use once_cell::sync::OnceCell;
use parking_lot::Mutex;
use tracing_subscriber::fmt::{format::Writer, time::FormatTime};

/// SimClock can be plugged into tracing logger to print simulation time.
#[derive(Clone)]
pub struct SimClock {
    clock_ptr: Arc<Mutex<Option<Arc<Timing>>>>,
}

impl Default for SimClock {
    fn default() -> Self {
        SimClock {
            clock_ptr: Arc::new(Mutex::new(None)),
        }
    }
}

impl SimClock {
    pub fn set_clock(&self, clock: Arc<Timing>) {
        *self.clock_ptr.lock() = Some(clock);
    }
}

impl FormatTime for SimClock {
    fn format_time(&self, w: &mut Writer<'_>) -> fmt::Result {
        let clock = self.clock_ptr.lock();
        if let Some(clock) = clock.as_ref() {
            let now = clock.now();
            write!(w, "[{}]", now)
        } else {
            write!(w, "[?]")
        }
    }
}

static LOGGING_DONE: OnceCell<SimClock> = OnceCell::new();

/// Returns a handle to the clock attached to the tracing logger, so that it
/// can be updated when the world is (re)created. The logger is initialized
/// only once; later calls return a clone of the same clock.
pub fn init_tracing_logger(debug_enabled: bool) -> SimClock {
    LOGGING_DONE
        .get_or_init(|| {
            let clock = SimClock::default();
            let base_logger = tracing_subscriber::fmt()
                .with_target(false)
                // prefix log lines with simulated time timestamp
                .with_timer(clock.clone())
                // .with_ansi(true) TODO
                .with_max_level(match debug_enabled {
                    true => tracing::Level::DEBUG,
                    false => tracing::Level::WARN,
                })
                .with_writer(std::io::stdout);
            base_logger.init();

            // logging::replace_panic_hook_with_tracing_panic_hook().forget();

            // Without debug logging, install an empty panic hook so that
            // panics print nothing.
            if !debug_enabled {
                std::panic::set_hook(Box::new(|_| {}));
            }

            clock
        })
        .clone()
}

pub fn init_logger() -> SimClock {
    // RUST_TRACEBACK envvar controls whether we print all logs or only warnings.
    let debug_enabled = std::env::var("RUST_TRACEBACK").is_ok();

    init_tracing_logger(debug_enabled)
}
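
The file above relies on two patterns: a clock handle that is handed to the logger before any clock exists and filled in later, and a process-wide one-time initialization. The following std-only sketch mirrors both without `tracing`, `parking_lot` or `once_cell`. `FakeTiming`, `LateBoundClock`, `stamp` and `global_clock` are hypothetical names; `FakeTiming` in particular is an assumed stand-in for `desim::time::Timing`, of which only a `now()` method is used above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

/// Assumed stand-in for the simulation clock: just a counter of virtual ms.
pub struct FakeTiming {
    now_ms: AtomicU64,
}

impl FakeTiming {
    pub fn new(start_ms: u64) -> Self {
        FakeTiming { now_ms: AtomicU64::new(start_ms) }
    }

    pub fn now(&self) -> u64 {
        self.now_ms.load(Ordering::SeqCst)
    }

    pub fn advance(&self, ms: u64) {
        self.now_ms.fetch_add(ms, Ordering::SeqCst);
    }
}

/// Cloneable handle whose clones all share one optional clock pointer.
#[derive(Clone, Default)]
pub struct LateBoundClock {
    inner: Arc<Mutex<Option<Arc<FakeTiming>>>>,
}

impl LateBoundClock {
    /// Swaps in a new clock; every existing clone sees it immediately.
    pub fn set_clock(&self, clock: Arc<FakeTiming>) {
        *self.inner.lock().unwrap() = Some(clock);
    }

    /// Formats the timestamp prefix, or "[?]" while no clock is attached.
    pub fn stamp(&self) -> String {
        let guard = self.inner.lock().unwrap();
        match guard.as_ref() {
            Some(clock) => format!("[{}]", clock.now()),
            None => "[?]".to_string(),
        }
    }
}

static GLOBAL_CLOCK: OnceLock<LateBoundClock> = OnceLock::new();

/// The first call creates the handle; later calls return clones of it.
pub fn global_clock() -> LateBoundClock {
    GLOBAL_CLOCK.get_or_init(LateBoundClock::default).clone()
}
```

The extra `Arc<Mutex<Option<...>>>` layer is what lets a test replace the clock each time a new simulated world is created while the already-installed logger keeps its original handle.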