Storage Layer Tests
This directory contains tests for the StarFish storage layer implementation.
Overview
The storage layer is responsible for persisting metadata and data artifacts for synthetic data generation jobs. It provides:
- A pluggable interface for different storage backends
- A hybrid local implementation using SQLite for metadata and JSON files for data
- Comprehensive APIs for storing projects, jobs, and records
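This README does not show the StarFish interface itself. The following is therefore only a minimal sketch of what a pluggable backend interface can look like, assuming a Python abstract base class. `StorageBackend`, `InMemoryStorage`, and every method name below are hypothetical, not the StarFish API. Callers depend only on the abstract class, so one backend can be swapped for another.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class StorageBackend(ABC):
    """Hypothetical minimal storage interface (not the actual StarFish API)."""

    @abstractmethod
    def save_project(self, project_id: str, name: str) -> None: ...

    @abstractmethod
    def get_project(self, project_id: str) -> Optional[Dict[str, Any]]: ...

    @abstractmethod
    def save_record(self, record_id: str, job_id: str, data: Dict[str, Any]) -> None: ...

    @abstractmethod
    def get_records(self, job_id: str) -> List[Dict[str, Any]]: ...


class InMemoryStorage(StorageBackend):
    """Toy backend: keeps everything in dicts, useful only to show the plug-in shape."""

    def __init__(self) -> None:
        self._projects: Dict[str, Dict[str, Any]] = {}
        self._records: Dict[str, Dict[str, Any]] = {}

    def save_project(self, project_id, name):
        self._projects[project_id] = {"project_id": project_id, "name": name}

    def get_project(self, project_id):
        return self._projects.get(project_id)

    def save_record(self, record_id, job_id, data):
        self._records[record_id] = {"record_id": record_id, "job_id": job_id, "data": data}

    def get_records(self, job_id):
        # Linear scan; a real backend would filter in the database.
        return [r for r in self._records.values() if r["job_id"] == job_id]


storage: StorageBackend = InMemoryStorage()
storage.save_project("p1", "demo")
storage.save_record("r1", "job-1", {"text": "hello"})
print(len(storage.get_records("job-1")))  # 1
```

Under this design, the hybrid SQLite/JSON implementation would be just another subclass of the same abstract class.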
Running Tests
Prerequisites
Install test dependencies:
pip install -r ../requirements-test.txt
Running the Main Test Script
For a quick verification of functionality, run the main test script:
python -m tests.new_storage.test_storage_main
This script provides a user-friendly overview of the storage layer's functionality by:
- Creating a test storage in /tmp/starfish_test_db (or a custom path via the STARFISH_TEST_DB_DIR env var)
- Exercising all major API functions in sequence
- Verifying data integrity
- Leaving the files for inspection (unless STARFISH_TEST_MODE=cleanup is set)
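The environment-variable handling described above can be sketched as follows. This is a minimal illustration, not the code in test_storage_main.py: the helper names are hypothetical, and so is the fallback mode name "standard". Only the variable names, the default path, and the "full" and "cleanup" values come from this README.

```python
import os
import shutil
from pathlib import Path

DEFAULT_TEST_DB_DIR = "/tmp/starfish_test_db"


def resolve_test_db_dir() -> Path:
    """Return the test storage directory, honoring STARFISH_TEST_DB_DIR if set."""
    return Path(os.environ.get("STARFISH_TEST_DB_DIR", DEFAULT_TEST_DB_DIR))


def test_mode() -> str:
    """Read STARFISH_TEST_MODE; 'standard' is an assumed name for the default mode."""
    return os.environ.get("STARFISH_TEST_MODE", "standard").lower()


def wants_performance_test() -> bool:
    """Full mode adds the small performance test."""
    return test_mode() == "full"


def finish(db_dir: Path) -> bool:
    """Remove the test storage only in cleanup mode; otherwise keep it for inspection.

    Returns True if the directory was removed.
    """
    if test_mode() == "cleanup":
        shutil.rmtree(db_dir, ignore_errors=True)
        return True
    return False


db_dir = resolve_test_db_dir()
print(f"Test storage: {db_dir} (mode={test_mode()})")
```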
You can also run in full test mode, which includes a small performance test:
STARFISH_TEST_MODE=full python -m tests.new_storage.test_storage_main
Performance Testing
For dedicated performance testing of the storage layer with a larger dataset (1000 records):
python -m tests.new_storage.local.test_performance
This will create 500 execution jobs with 2 records each (1000 total records) and measure the performance of various storage operations.
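The README does not show how test_performance.py measures its operations. The block below is only a standalone sketch of the same workload shape (500 jobs × 2 records = 1000 records), timed with `time.perf_counter` against an in-memory SQLite database. The single-table schema and the two measured operations are hypothetical and do not use the StarFish API.

```python
import sqlite3
import time
import uuid

NUM_JOBS = 500          # execution jobs, as described above
RECORDS_PER_JOB = 2     # 500 x 2 = 1000 records


def timed(label, fn, timings):
    """Run fn(), store its wall-clock duration under label, and return its result."""
    start = time.perf_counter()
    result = fn()
    timings[label] = time.perf_counter() - start
    return result


def run_benchmark():
    # Hypothetical schema; the real StarFish tables are not shown in this README.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (record_id TEXT PRIMARY KEY, job_id TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_records_job ON records(job_id)")
    job_ids = [str(uuid.uuid4()) for _ in range(NUM_JOBS)]
    rows = [(str(uuid.uuid4()), job_id) for job_id in job_ids for _ in range(RECORDS_PER_JOB)]
    timings = {}

    def insert_all():
        with conn:  # one transaction for the whole batch
            conn.executemany("INSERT INTO records VALUES (?, ?)", rows)

    def count_per_job():
        sql = "SELECT COUNT(*) FROM records WHERE job_id = ?"
        return [conn.execute(sql, (job_id,)).fetchone()[0] for job_id in job_ids]

    timed("insert_records", insert_all, timings)
    counts = timed("count_per_job", count_per_job, timings)
    total = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    conn.close()
    return total, counts, timings


total, counts, timings = run_benchmark()
print(f"total records: {total}")  # total records: 1000
for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.2f} ms")
```

Wrapping the batch insert in a single transaction matters for SQLite. Committing each row separately is usually far slower, so a benchmark should state which pattern it measures.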
Running the Full Test Suite
To run the comprehensive test suite with pytest:
pytest -xvs tests/new_storage/
For just the local storage implementation:
pytest -xvs tests/new_storage/local/
Test Structure
- test_storage_main.py: User-friendly standalone test script
- local/test_performance.py: Performance test with 1000 records
- local/test_local_storage.py: Comprehensive pytest-based test suite
- local/test_basic_storage.py: Simple standalone test script
Test Database
Tests use separate test databases (by default in /tmp/starfish_test_* directories) to avoid interfering with production data.
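One way to get this kind of isolation is to give each test a throwaway directory. The sketch below uses `tempfile.mkdtemp` with a `starfish_test_` prefix, which on Linux normally lands under /tmp. The context-manager helper and the metadata.db file name are hypothetical and are not taken from the StarFish test code.

```python
import shutil
import sqlite3
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def isolated_test_storage():
    """Yield a fresh directory such as /tmp/starfish_test_XXXX, then delete it."""
    root = Path(tempfile.mkdtemp(prefix="starfish_test_"))
    try:
        yield root
    finally:
        shutil.rmtree(root, ignore_errors=True)


with isolated_test_storage() as root:
    db_path = root / "metadata.db"  # hypothetical file name
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE projects (project_id TEXT PRIMARY KEY, name TEXT)")
    conn.commit()
    conn.close()
    print(root.name.startswith("starfish_test_"), db_path.exists())  # True True
```

Every test gets its own directory, so tests cannot see each other's data or touch a production database path.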
Test Coverage
The tests verify:
- Project creation and retrieval
- Master job lifecycle (creation, status updates, completion)
- Execution job lifecycle (creation, status updates, completion)
- Record storage and retrieval (metadata and data)
- Filtering and pagination of records
- End-to-end workflows
- Performance with varied dataset sizes
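Filtering and pagination of records, as listed above, can be sketched with SQLite's `LIMIT`/`OFFSET`. This is a hypothetical illustration. The `records` columns, the status values, and `list_records` are assumptions, not the StarFish schema or API.

```python
import sqlite3
from typing import List, Optional


def list_records(conn: sqlite3.Connection, job_id: str, status: Optional[str] = None,
                 limit: int = 10, offset: int = 0) -> List[str]:
    """Return one page of record IDs for a job, optionally filtered by status."""
    sql = "SELECT record_id FROM records WHERE job_id = ?"
    params: list = [job_id]
    if status is not None:
        sql += " AND status = ?"
        params.append(status)
    # A stable ORDER BY keeps pages from overlapping or skipping rows.
    sql += " ORDER BY record_id LIMIT ? OFFSET ?"
    params.extend([limit, offset])
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (record_id TEXT PRIMARY KEY, job_id TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(f"r{i:02d}", "job-1", "completed" if i % 2 == 0 else "failed") for i in range(5)],
)
print(list_records(conn, "job-1", limit=2))             # ['r00', 'r01']
print(list_records(conn, "job-1", limit=2, offset=2))   # ['r02', 'r03']
print(list_records(conn, "job-1", status="completed"))  # ['r00', 'r02', 'r04']
```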
Implementation Details
The local storage implementation uses:
- SQLite for metadata (tables for projects, jobs, and records)
- JSON files for data artifacts (configs and record data)
- WAL mode for SQLite for better concurrency
- Nested directory structure for JSON files based on record IDs
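The hybrid layout above can be sketched as follows. SQLite holds the metadata and is opened in WAL mode via `PRAGMA journal_mode=WAL`, while each record's data is written to a JSON file in a nested directory derived from its ID. The README does not say how the nesting is derived, so the two-level `id[0:2]/id[2:4]` scheme, the table layout, and all function names here are assumptions for illustration only.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


def open_metadata_db(db_path: Path) -> sqlite3.Connection:
    """Open the SQLite metadata store in WAL mode and create a minimal records table."""
    conn = sqlite3.connect(db_path)
    # WAL lets readers keep reading while a single writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "record_id TEXT PRIMARY KEY, data_path TEXT NOT NULL)"
    )
    return conn


def record_data_path(data_root: Path, record_id: str) -> Path:
    """Hypothetical sharding: <root>/<id[0:2]>/<id[2:4]>/<id>.json keeps directories small."""
    return data_root / record_id[:2] / record_id[2:4] / f"{record_id}.json"


def save_record(conn, data_root: Path, record_id: str, data: Dict[str, Any]) -> Path:
    path = record_data_path(data_root, record_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")  # data artifact as JSON
    with conn:  # metadata row points at the JSON file
        conn.execute(
            "INSERT INTO records (record_id, data_path) VALUES (?, ?)",
            (record_id, path.relative_to(data_root).as_posix()),
        )
    return path


def load_record(conn, data_root: Path, record_id: str) -> Optional[Dict[str, Any]]:
    row = conn.execute(
        "SELECT data_path FROM records WHERE record_id = ?", (record_id,)
    ).fetchone()
    if row is None:
        return None
    return json.loads((data_root / row[0]).read_text(encoding="utf-8"))


root = Path(tempfile.mkdtemp(prefix="starfish_test_"))
conn = open_metadata_db(root / "metadata.db")
saved = save_record(conn, root / "data", "a1b2c3d4", {"text": "hello"})
print(saved.relative_to(root).as_posix())            # data/a1/b2/a1b2c3d4.json
print(load_record(conn, root / "data", "a1b2c3d4"))  # {'text': 'hello'}
```

Keeping bulky payloads in files keeps the SQLite database small and fast to query. Sharding by ID prefix avoids a single directory with thousands of entries.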