## Problem We need to test the stability of Neon. ## Summary of changes The test runs random operations on a Neon project. It performs via the Public API calls the following operations: `create a branch`, `delete a branch`, `add a read-only endpoint`, `delete a read-only endpoint`, `restore a branch to a random position in the past`. All the branches and endpoints are loaded with `pgbench`. --------- Co-authored-by: Peter Bendel <peterbendel@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Random Operations Test for Neon Stability
Problem Statement
Neon needs robust testing of Neon's stability to ensure reliability for users. The random operations test addresses this by continuously exercising the API with unpredictable sequences of operations, helping to identify edge cases and potential issues that might not be caught by deterministic tests.
Key Components
1. Class Structure
The test implements three main classes to model the Neon architecture:
- NeonProject: Represents a Neon project and manages the lifecycle of branches and endpoints
- NeonBranch: Represents a branch within a project, with methods for creating child branches, endpoints, and performing point-in-time restores
- NeonEndpoint: Represents an endpoint (connection point) for a branch, with methods for managing benchmarks
2. Operations Tested
The test randomly performs the following operations with weighted probabilities:
- Creating branches
- Deleting branches
- Adding read-only endpoints
- Deleting read-only endpoints
- Restoring branches to random points in time
3. Load Generation
Each branch and endpoint is loaded with pgbench to simulate real database workloads during testing. This ensures that the operations are performed against branches with actual data and ongoing transactions.
4. Error Handling
The test includes robust error handling for various scenarios:
- Branch limit exceeded
- Connection timeouts
- Control plane timeouts (HTTP 524 errors)
- Benchmark failures
5. CI Integration
The test is integrated into the CI pipeline via a GitHub workflow that runs daily, ensuring continuous validation of API stability.
How It Works
- The test creates a Neon project using the Public API
- It initializes the main branch with pgbench data
- It performs random operations according to the weighted probabilities
- During each operation, it checks that all running benchmarks are still operational
- The test cleans up by deleting the project at the end
Configuration
The test can be configured with:
RANDOM_SEED: Set a specific random seed for reproducible test runsNEON_API_KEY: API key for authenticationNEON_API_BASE_URL: Base URL for the API (defaults to staging environment)NUM_OPERATIONS: The number of operations to be performed
Running the Test
The test is designed to run in the CI environment but can also be executed locally:
NEON_API_KEY=your_api_key ./scripts/pytest test_runner/random_ops/test_random_ops.py -m remote_cluster
To run with a specific random seed for reproducibility:
RANDOM_SEED=12345 NEON_API_KEY=your_api_key ./scripts/pytest test_runner/random_ops/test_random_ops.py -m remote_cluster
To run with the custom number of operations:
NUM_OPERATIONS=500 NEON_API_KEY=your_api_key ./scripts/pytest test_runner/random_ops/test_random_ops.py -m remote_cluster
Benefits
This test provides several key benefits:
- Comprehensive API testing: Exercises multiple API endpoints in combination
- Edge case discovery: Random sequences may uncover issues not found in deterministic tests
- Stability validation: Continuous execution helps ensure long-term API reliability
- Regression prevention: Detects if new changes break existing API functionality
Future Improvements
Potential enhancements to the test could include:
- Adding more API operations, e.g.
reset_to_parent,snapshot, etc - Implementing more sophisticated load patterns
- Adding metrics collection to measure API performance
- Extending test duration for longer-term stability validation