Neon Failpoint Library
A modern, async-first failpoint library for Neon, replacing the fail crate with enhanced functionality.
Features
- Async-first: All failpoint operations are async and don't require spawn_blocking
- Context matching: Failpoints can be configured to trigger only when specific context conditions are met
- Regex support: Context values can be matched using regular expressions
- Cancellation support: All operations support cancellation tokens
- Dynamic reconfiguration: Paused and sleeping tasks automatically resume when failpoint configurations change
- Backward compatibility: Drop-in replacement for existing fail crate usage
Supported Actions
- off - Disable the failpoint
- pause - Pause indefinitely until disabled, reconfigured, or cancelled
- sleep(N) - Sleep for N milliseconds (can be interrupted by reconfiguration)
- return - Return early (empty value)
- return(value) - Return early with a specific value
- exit - Exit the process immediately
- panic(message) - Panic the process with a custom message
- N%return(value) - Return with a specific value N% of the time (probability-based)
- N%M*return(value) - Return with a specific value N% of the time, maximum M times
- N%action - Execute any action N% of the time (probability-based)
- N%M*action - Execute any action N% of the time, maximum M times
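The action strings above share one shape: an optional N% prefix, an optional M* cap, an action name, and an optional parenthesized argument. The following is a minimal sketch of a parser for that shape using only the standard library. ActionSpec and parse_action are hypothetical names chosen for illustration; this is not the library's parser, and the real implementation may accept or reject inputs differently.

```rust
/// Parsed form of a spec such as "10%3*sleep(1000)".
/// Hypothetical type for illustration, not the library's representation.
#[derive(Debug, PartialEq)]
struct ActionSpec {
    percent: Option<u8>,   // N in "N%"
    max_hits: Option<u32>, // M in "M*"
    action: String,        // e.g. "sleep"
    arg: Option<String>,   // text inside the parentheses
}

fn parse_action(spec: &str) -> Result<ActionSpec, String> {
    let spec = spec.trim();

    // Split off "(arg)" first so that '%' or '*' inside the argument
    // is never mistaken for a prefix.
    let (head, arg) = match spec.find('(') {
        Some(open) => {
            let inner = spec[open + 1..]
                .strip_suffix(')')
                .ok_or_else(|| format!("missing ')' in {spec:?}"))?;
            (&spec[..open], Some(inner.to_string()))
        }
        None => (spec, None),
    };

    // Optional "N%" prefix.
    let (percent, head) = match head.split_once('%') {
        Some((n, tail)) => {
            let n: u8 = n.parse().map_err(|_| format!("bad percentage {n:?}"))?;
            if n > 100 {
                return Err(format!("percentage {n} exceeds 100"));
            }
            (Some(n), tail)
        }
        None => (None, head),
    };

    // Optional "M*" cap.
    let (max_hits, name) = match head.split_once('*') {
        Some((m, tail)) => {
            let m: u32 = m.parse().map_err(|_| format!("bad count {m:?}"))?;
            (Some(m), tail)
        }
        None => (None, head),
    };

    if !matches!(name, "off" | "pause" | "sleep" | "return" | "exit" | "panic") {
        return Err(format!("unknown action {name:?}"));
    }

    Ok(ActionSpec { percent, max_hits, action: name.to_string(), arg })
}

fn main() {
    for spec in ["pause", "50%return(error)", "10%3*sleep(1000)", "bogus"] {
        println!("{spec:>20} -> {:?}", parse_action(spec));
    }
}
```

Splitting the argument off before looking for the prefixes keeps a message such as panic(50% done) from being misread as a probability.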
Probability-Based Actions
The library supports probability-based failpoints that trigger only a percentage of the time:
// 50% chance to return a value
configure_failpoint("random_failure", "50%return(error)").unwrap();
// 10% chance to sleep, maximum 3 times
configure_failpoint("occasional_delay", "10%3*sleep(1000)").unwrap();
// 25% chance to panic
configure_failpoint("rare_panic", "25%panic(critical error)").unwrap();
The probability system uses a counter to track how many times a probability-based action has been triggered, allowing for precise control over test scenarios.
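The interaction between the percentage and the hit counter can be sketched as a small gate. This is an illustration under stated assumptions, not the library's code: ProbabilityGate and should_fire are hypothetical names, and the random roll is passed in by the caller so the example stays deterministic, where a real implementation would presumably draw it from a random number generator.

```rust
/// Hypothetical gate for an "N%M*action" spec.
struct ProbabilityGate {
    percent: u8,           // chance to fire, 0..=100
    max_hits: Option<u32>, // optional cap on how often it may fire
    hits: u32,             // how many times it has fired so far
}

impl ProbabilityGate {
    fn new(percent: u8, max_hits: Option<u32>) -> Self {
        ProbabilityGate { percent, max_hits, hits: 0 }
    }

    /// `roll` is a value in 0..100 supplied by the caller.
    /// Returns true when the action should run on this hit.
    fn should_fire(&mut self, roll: u8) -> bool {
        // Once the cap is reached the action never fires again.
        if let Some(max) = self.max_hits {
            if self.hits >= max {
                return false;
            }
        }
        // Fire only when the roll lands inside the configured percentage.
        if roll >= self.percent {
            return false;
        }
        self.hits += 1;
        true
    }
}

fn main() {
    // Models "10%3*sleep(1000)": 10% chance, at most 3 times.
    let mut gate = ProbabilityGate::new(10, Some(3));
    for roll in [5u8, 50, 0, 9, 3, 99] {
        println!("roll {roll:>2}: fire = {}", gate.should_fire(roll));
    }
}
```

Only hits that actually fire increment the counter in this sketch, so the cap limits triggered actions rather than total evaluations.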
Dynamic Behavior
When a failpoint is reconfigured while tasks are waiting on it:
- Paused tasks will immediately resume and continue normal execution
- Sleeping tasks will wake up early and continue normal execution
- Removed failpoints will cause all waiting tasks to resume normally
The new configuration only applies to future hits of the failpoint, not to tasks that are already waiting. This allows for flexible testing scenarios where you can pause execution, inspect state, and then resume execution dynamically.
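One way to get the behavior described above is a generation counter: a waiter records the generation when it starts waiting and resumes as soon as the generation changes, without re-reading the action. The sketch below shows that idea with threads and a Condvar so it needs no external crates. The library itself is async, so this is an analogy under that assumption rather than its implementation, and every name in it (Failpoint, State, hit, configure) is hypothetical.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

#[derive(Clone, Debug, PartialEq)]
enum Action {
    Pause,
    Return(String),
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Continue,
    Return(String),
}

struct State {
    action: Action,
    generation: u64, // bumped on every reconfiguration
    waiters: usize,  // threads currently parked in `hit`
}

struct Failpoint {
    state: Mutex<State>,
    changed: Condvar,
}

impl Failpoint {
    fn new(action: Action) -> Self {
        Failpoint {
            state: Mutex::new(State { action, generation: 0, waiters: 0 }),
            changed: Condvar::new(),
        }
    }

    /// Install a new action and wake every parked waiter.
    fn configure(&self, action: Action) {
        let mut s = self.state.lock().unwrap();
        s.action = action;
        s.generation += 1;
        self.changed.notify_all();
    }

    fn waiters(&self) -> usize {
        self.state.lock().unwrap().waiters
    }

    fn hit(&self) -> Outcome {
        let mut s = self.state.lock().unwrap();
        match s.action.clone() {
            Action::Return(v) => Outcome::Return(v),
            Action::Pause => {
                let seen = s.generation;
                s.waiters += 1;
                // Park until the configuration changes.
                while s.generation == seen {
                    s = self.changed.wait(s).unwrap();
                }
                s.waiters -= 1;
                // The new action is deliberately not applied to this hit.
                Outcome::Continue
            }
        }
    }
}

/// Returns the outcome of a hit that was parked during reconfiguration
/// and the outcome of a hit made afterwards.
fn demo() -> (Outcome, Outcome) {
    let fp = Arc::new(Failpoint::new(Action::Pause));
    let worker = {
        let fp = Arc::clone(&fp);
        thread::spawn(move || fp.hit())
    };
    // Wait until the worker is parked inside `hit`.
    while fp.waiters() == 0 {
        thread::yield_now();
    }
    fp.configure(Action::Return("not_applied".to_string()));
    let parked = worker.join().unwrap();
    let later = fp.hit();
    (parked, later)
}

fn main() {
    let (parked, later) = demo();
    println!("parked hit: {parked:?}, later hit: {later:?}");
}
```

The parked hit resumes with Continue, while the hit made after reconfiguration sees the new return action.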
Example: Dynamic Reconfiguration
use neon_failpoint::{configure_failpoint, failpoint, FailpointResult};
use tokio::time::Duration;
// Configure the failpoint to pause before any task can reach it
configure_failpoint("test_pause", "pause").unwrap();
// Start a task that will hit the failpoint
let task = tokio::spawn(async {
println!("About to hit failpoint");
match failpoint("test_pause", None).await {
FailpointResult::Return(value) => println!("Returned: {}", value),
FailpointResult::Continue => println!("Continued normally"),
FailpointResult::Cancelled => println!("Cancelled"),
}
});
// Let the task hit the failpoint and pause
tokio::time::sleep(Duration::from_millis(10)).await;
// Change the failpoint configuration - this will wake up the paused task
// The task will resume and continue normally (not apply the new config)
configure_failpoint("test_pause", "return(not_applied)").unwrap();
// The task takes the Continue branch and prints "Continued normally", not "Returned: ..."
task.await.unwrap();
Basic Usage
use neon_failpoint::{configure_failpoint, failpoint, FailpointResult};
// Configure a failpoint
configure_failpoint("my_failpoint", "return(42)").unwrap();
// Use the failpoint
match failpoint("my_failpoint", None).await {
FailpointResult::Return(value) => {
println!("Failpoint returned: {}", value);
return value.parse().unwrap_or_default();
}
FailpointResult::Continue => {
// Continue normal execution
}
FailpointResult::Cancelled => {
// Handle cancellation
}
}
Context-Based Failpoint Configuration
Context allows you to create conditional failpoints that only trigger when specific runtime conditions are met. This is particularly useful for testing scenarios where you want to inject failures only for specific tenants, operations, or other contextual conditions.
Configuring Context-Based Failpoints
Use configure_failpoint_with_context() to set up failpoints with context matching:
use neon_failpoint::configure_failpoint_with_context;
use std::collections::HashMap;
let mut context_matchers = HashMap::new();
context_matchers.insert("tenant_id".to_string(), "test_.*".to_string());
context_matchers.insert("operation".to_string(), "backup".to_string());
configure_failpoint_with_context(
"backup_operation", // failpoint name
"return(simulated_failure)", // action to take
context_matchers // context matching rules
).unwrap();
Context Matching Rules
The context matching system works as follows:
- Key-Value Matching: Each entry in context_matchers specifies a key that must exist in the runtime context
- Regex Support: Values in context_matchers are treated as regular expressions first
- Fallback to Exact Match: If the regex compilation fails, it falls back to exact string matching
- ALL Must Match: All context matchers must match for the failpoint to trigger
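The rules above can be sketched as a single predicate over the two maps. To stay free of external crates, the sketch replaces real regex matching with a deliberately tiny stand-in, value_matches, that understands only a trailing ".*" (prefix match) and otherwise compares exactly. Both function names are hypothetical, and the stand-in is an assumption of this example, not how the library matches values.

```rust
use std::collections::HashMap;

/// Stand-in for regex matching: a trailing ".*" means "starts with",
/// anything else is compared exactly. Illustration only.
fn value_matches(pattern: &str, value: &str) -> bool {
    match pattern.strip_suffix(".*") {
        Some(prefix) => value.starts_with(prefix),
        None => pattern == value,
    }
}

/// True when every matcher key exists in the context and its value matches.
/// Extra keys in the context are ignored.
fn context_matches(
    matchers: &HashMap<String, String>,
    context: &HashMap<String, String>,
) -> bool {
    matchers.iter().all(|(key, pattern)| {
        context
            .get(key)
            .map_or(false, |value| value_matches(pattern, value))
    })
}

/// Small helper to build a map from string pairs.
fn map_of(pairs: &[(&str, &str)]) -> HashMap<String, String> {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

fn main() {
    let matchers = map_of(&[("tenant_id", "test_.*"), ("operation", "backup")]);

    let hit = map_of(&[
        ("tenant_id", "test_123"),
        ("operation", "backup"),
        ("user_id", "user_456"),
    ]);
    let wrong_tenant = map_of(&[("tenant_id", "prod_123"), ("operation", "backup")]);
    let missing_key = map_of(&[("tenant_id", "test_123")]);

    println!("hit: {}", context_matches(&matchers, &hit));
    println!("wrong tenant: {}", context_matches(&matchers, &wrong_tenant));
    println!("missing key: {}", context_matches(&matchers, &missing_key));
}
```

With an empty matcher map the predicate is true for every context, which lines up with a failpoint configured without context triggering on all hits.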
Runtime Context Usage
When code hits a failpoint, it provides context using a HashMap<String, String>:
use neon_failpoint::{failpoint, FailpointResult};
use std::collections::HashMap;
let mut context = HashMap::new();
context.insert("tenant_id".to_string(), "test_123".to_string());
context.insert("operation".to_string(), "backup".to_string());
context.insert("user_id".to_string(), "user_456".to_string());
match failpoint("backup_operation", Some(&context)) {
either::Either::Left(result) => {
match result {
FailpointResult::Return(value) => {
// This will only trigger if ALL context matchers match
println!("Backup failed: {}", value);
}
FailpointResult::Continue => {
// Continue with normal backup operation
}
FailpointResult::Cancelled => {}
}
}
either::Either::Right(future) => {
match future.await {
FailpointResult::Return(value) => {
// This will only trigger if ALL context matchers match
println!("Backup failed: {}", value);
}
FailpointResult::Continue => {
// Continue with normal backup operation
}
FailpointResult::Cancelled => {}
}
}
}
Context Matching Examples
Regex Matching
// Configure to match test tenants only
let mut matchers = HashMap::new();
matchers.insert("tenant_id".to_string(), "test_.*".to_string());
configure_failpoint_with_context("test_failpoint", "pause", matchers).unwrap();
// This will match
let mut context = HashMap::new();
context.insert("tenant_id".to_string(), "test_123".to_string());
// This will NOT match
let mut context = HashMap::new();
context.insert("tenant_id".to_string(), "prod_123".to_string());
Multiple Conditions
// Must match BOTH tenant pattern AND operation
let mut matchers = HashMap::new();
matchers.insert("tenant_id".to_string(), "test_.*".to_string());
matchers.insert("operation".to_string(), "backup".to_string());
configure_failpoint_with_context("backup_test", "return(failed)", matchers).unwrap();
// This will match (both conditions met)
let mut context = HashMap::new();
context.insert("tenant_id".to_string(), "test_123".to_string());
context.insert("operation".to_string(), "backup".to_string());
// This will NOT match (operation is "restore", not "backup")
let mut context = HashMap::new();
context.insert("tenant_id".to_string(), "test_123".to_string());
context.insert("operation".to_string(), "restore".to_string());
Exact String Matching
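// "staging" contains no regex metacharacters, so it matches the literal text.
// If a pattern fails to compile as a regex, matching falls back to exact comparison.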
let mut matchers = HashMap::new();
matchers.insert("env".to_string(), "staging".to_string());
configure_failpoint_with_context("env_specific", "sleep(1000)", matchers).unwrap();
// This will match
let mut context = HashMap::new();
context.insert("env".to_string(), "staging".to_string());
// This will NOT match
let mut context = HashMap::new();
context.insert("env".to_string(), "production".to_string());
Benefits of Context-Based Failpoints
- Selective Testing: Only inject failures for specific tenants, environments, or operations
- Production Safety: Avoid accidentally triggering failpoints in production by using context filters
- Complex Scenarios: Test interactions between different components with targeted failures
- Debugging: Isolate issues to specific contexts without affecting the entire system
Context vs. Non-Context Failpoints
- Without context: configure_failpoint("name", "action") - triggers for ALL hits
- With context: configure_failpoint_with_context("name", "action", matchers) - triggers only when context matches
Context-Specific Failpoints
use neon_failpoint::{configure_failpoint_with_context, failpoint};
use std::collections::HashMap;
// Configure a failpoint that only triggers for specific tenants
let mut context_matchers = HashMap::new();
context_matchers.insert("tenant_id".to_string(), "test_.*".to_string());
context_matchers.insert("operation".to_string(), "backup".to_string());
configure_failpoint_with_context(
"backup_operation",
"return(simulated_failure)",
context_matchers
).unwrap();
// Use with context
let mut context = HashMap::new();
context.insert("tenant_id".to_string(), "test_123".to_string());
context.insert("operation".to_string(), "backup".to_string());
match failpoint("backup_operation", Some(&context)) {
either::Either::Left(result) => {
match result {
FailpointResult::Return(value) => {
// This will trigger for tenant_id matching "test_.*"
println!("Backup failed: {}", value);
}
FailpointResult::Continue => {
// Continue with backup
}
FailpointResult::Cancelled => {}
}
}
either::Either::Right(future) => {
match future.await {
FailpointResult::Return(value) => {
// This will trigger for tenant_id matching "test_.*"
println!("Backup failed: {}", value);
}
FailpointResult::Continue => {
// Continue with backup
}
FailpointResult::Cancelled => {}
}
}
}
Macros
The library provides convenient macros for common patterns:
fail_point! - Basic Failpoint Macro
The fail_point! macro has three variants:
- Simple failpoint - fail_point!(name)
  - Just checks the failpoint and continues or returns early (no value)
  - Panics if the failpoint is configured with return(value) since no closure is provided
- Failpoint with return handler - fail_point!(name, closure)
  - Provides a closure to handle return values from the failpoint
  - The closure receives Option<String> and should return the appropriate value
- Conditional failpoint - fail_point!(name, condition, closure)
  - Only checks the failpoint if the condition is true
  - Provides a closure to handle return values (receives &str)
use neon_failpoint::fail_point;
// Simple failpoint - just continue or return early
fail_point!("my_failpoint");
// Failpoint with return value handling
fail_point!("my_failpoint", |value: Option<String>| {
match value {
Some(v) => {
println!("Got value: {}", v);
return Ok(v.parse().unwrap_or_default());
}
None => return Ok(42), // Default return value
}
});
// Conditional failpoint - only check if condition is met
let should_fail = some_condition();
fail_point!("conditional_failpoint", should_fail, |value: &str| {
println!("Conditional failpoint triggered with: {}", value);
return Err(anyhow::anyhow!("Simulated failure"));
});
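To make the three variants more concrete, here is a synchronous toy macro with the same three argument shapes. It is a sketch only: sketch_fail_point!, configure_return, and eval are hypothetical names, the registry is a thread-local map that models just the return action, and the real macro is async and almost certainly expands differently.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    // name -> configured return payload (None models a bare `return`).
    static RETURNS: RefCell<HashMap<String, Option<String>>> =
        RefCell::new(HashMap::new());
}

fn configure_return(name: &str, payload: Option<&str>) {
    RETURNS.with(|r| {
        r.borrow_mut()
            .insert(name.to_string(), payload.map(str::to_string));
    });
}

/// Some(payload) when the failpoint should return early, None to continue.
fn eval(name: &str) -> Option<Option<String>> {
    RETURNS.with(|r| r.borrow().get(name).cloned())
}

macro_rules! sketch_fail_point {
    // Simple: return early with no value; a payload without a closure panics.
    ($name:expr) => {
        if let Some(payload) = eval($name) {
            assert!(payload.is_none(), "return(value) requires a closure");
            return;
        }
    };
    // With handler: the closure turns Option<String> into the return value.
    ($name:expr, $handler:expr) => {
        if let Some(payload) = eval($name) {
            return ($handler)(payload);
        }
    };
    // Conditional: evaluated only when `$cond` holds; the closure gets &str.
    ($name:expr, $cond:expr, $handler:expr) => {
        if $cond {
            if let Some(payload) = eval($name) {
                return ($handler)(payload.as_deref().unwrap_or(""));
            }
        }
    };
}

fn flush(log: &mut Vec<&'static str>) {
    sketch_fail_point!("flush");
    log.push("flushed");
}

fn read_size() -> Result<u32, String> {
    sketch_fail_point!("read_size", |v: Option<String>| {
        Ok(v.and_then(|s| s.parse::<u32>().ok()).unwrap_or(42))
    });
    Ok(7)
}

fn write_block(is_test: bool) -> Result<(), String> {
    sketch_fail_point!("write_block", is_test, |v: &str| Err(format!("simulated: {v}")));
    Ok(())
}

fn main() {
    configure_return("read_size", Some("13"));
    configure_return("write_block", Some("disk full"));
    println!("read_size: {:?}", read_size());
    println!("write_block(false): {:?}", write_block(false));
    println!("write_block(true): {:?}", write_block(true));
}
```

In each arm the macro, not the closure, performs the early return from the enclosing function; the closure only computes the value to return.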
fail_point_with_context! - Context-Aware Failpoint Macro
The fail_point_with_context! macro has three variants that mirror fail_point! but include context:
- Simple with context - fail_point_with_context!(name, context)
- With context and return handler - fail_point_with_context!(name, context, closure)
- Conditional with context - fail_point_with_context!(name, context, condition, closure)
use neon_failpoint::{fail_point_with_context};
use std::collections::HashMap;
let mut context = HashMap::new();
context.insert("tenant_id".to_string(), "test_123".to_string());
context.insert("operation".to_string(), "backup".to_string());
// Simple context failpoint
fail_point_with_context!("backup_failpoint", &context);
// Context failpoint with return handler
fail_point_with_context!("backup_failpoint", &context, |value: Option<String>| {
match value {
Some(v) => return Err(anyhow::anyhow!("Backup failed: {}", v)),
None => return Err(anyhow::anyhow!("Backup failed")),
}
});
// Conditional context failpoint
let is_test_tenant = tenant_id.starts_with("test_");
fail_point_with_context!("backup_failpoint", &context, is_test_tenant, |value: Option<String>| {
// Only triggers for test tenants
return Err(anyhow::anyhow!("Test tenant backup failure"));
});
Other Utility Macros
use neon_failpoint::{pausable_failpoint, sleep_millis_async};
use std::collections::HashMap;
use tokio_util::sync::CancellationToken;
// Pausable failpoint with cancellation
let cancel_token = CancellationToken::new();
if let Err(()) = pausable_failpoint!("pause_here", &cancel_token).await {
println!("Failpoint was cancelled");
}
// Sleep failpoint
sleep_millis_async!("sleep_here", &cancel_token).await;
// Context creation helper
let mut context = HashMap::new();
context.insert("key1".to_string(), "value1".to_string());
context.insert("key2".to_string(), "value2".to_string());
Argument Reference
- name: String literal - the name of the failpoint
- context: Expression that evaluates to &HashMap<String, String> - context for matching
- condition: Boolean expression - only check failpoint if true
- closure: Closure that handles return values:
  - For fail_point! with closure: receives Option<String>
  - For conditional variants: receives &str
  - For fail_point_with_context! with closure: receives Option<String>
- cancel: &CancellationToken - for cancellation support
Migration from fail crate
The library provides a compatibility layer in libs/utils/src/failpoint_support.rs. Most existing code should work without changes, but you can migrate to the new async APIs for better performance:
Before (with fail crate):
use utils::failpoint_support::pausable_failpoint;
// This used spawn_blocking internally
pausable_failpoint!("my_failpoint", &cancel_token).await?;
After (with neon_failpoint):
use neon_failpoint::{failpoint_with_cancellation, FailpointResult};
// This is fully async
match failpoint_with_cancellation("my_failpoint", None, &cancel_token).await {
FailpointResult::Continue => {},
FailpointResult::Cancelled => return Err(()),
FailpointResult::Return(_) => {},
}
Environment Variable Support
Failpoints can be configured via the FAILPOINTS environment variable:
FAILPOINTS="failpoint1=return(42);failpoint2=sleep(1000);failpoint3=exit"
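The value shown above is a semicolon-separated list of name=action pairs. As a rough sketch of how such a string could be split, assuming exactly that format and nothing more, the function below returns the pairs in order. parse_failpoints is a hypothetical helper, not part of the library's API, and it does not validate the action strings themselves.

```rust
/// Split "a=x;b=y" into [("a", "x"), ("b", "y")].
/// Hypothetical helper for illustration only.
fn parse_failpoints(spec: &str) -> Result<Vec<(String, String)>, String> {
    spec.split(';')
        .map(str::trim)
        // Tolerate empty segments such as a trailing ';'.
        .filter(|entry| !entry.is_empty())
        .map(|entry| -> Result<(String, String), String> {
            // Split at the first '=' only, so the action may contain '='.
            let (name, action) = entry
                .split_once('=')
                .ok_or_else(|| format!("missing '=' in {entry:?}"))?;
            let name = name.trim();
            if name.is_empty() {
                return Err(format!("empty failpoint name in {entry:?}"));
            }
            Ok((name.to_string(), action.trim().to_string()))
        })
        .collect()
}

fn main() {
    // Use the real variable when set, otherwise the example value.
    let spec = std::env::var("FAILPOINTS").unwrap_or_else(|_| {
        "failpoint1=return(42);failpoint2=sleep(1000);failpoint3=exit".to_string()
    });
    match parse_failpoints(&spec) {
        Ok(pairs) => {
            for (name, action) in pairs {
                println!("{name} -> {action}");
            }
        }
        Err(err) => eprintln!("invalid FAILPOINTS value: {err}"),
    }
}
```

Each resulting pair has the same shape as the two arguments passed to configure_failpoint in the earlier examples.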
Testing
The library includes comprehensive tests and examples. Run them with:
cargo test --features testing
cargo run --example context_demo --features testing
HTTP Configuration
The library integrates with the existing HTTP failpoint configuration API. Send POST requests to /v1/failpoints with:
[
{
"name": "my_failpoint",
"actions": "return(42)"
}
]