Configuration Refactoring
Overview
This document outlines the changes made to centralize configuration parameters and reduce technical debt in the OCR processing system.
Key Changes
Centralized Configuration
All previously hard-coded parameters have been moved to config.py and organized by functional category:
- PDF_SETTINGS: Parameters for PDF processing
- SEGMENTATION_SETTINGS: Image segmentation configuration
- CACHE_SETTINGS: Cache TTL and capacity settings
- TEXT_REPAIR_SETTINGS: Duplication detection and repair thresholds
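As a rough illustration of the layout described above, config.py might group its values like this. Only the four category names come from this document; every key and default value below is a hypothetical placeholder, not the project's actual setting.

```python
# config.py (illustrative sketch; all keys and values are assumptions)

# Parameters for PDF processing
PDF_SETTINGS = {
    "default_dpi": 150,  # hypothetical default rendering resolution
}

# Image segmentation configuration
SEGMENTATION_SETTINGS = {
    "min_region_area": 500,  # hypothetical minimum region size in pixels
}

# Cache TTL and capacity settings
CACHE_SETTINGS = {
    "ttl_seconds": 3600,  # hypothetical time-to-live for cached results
    "max_entries": 20,    # hypothetical capacity limit
}

# Duplication detection and repair thresholds
TEXT_REPAIR_SETTINGS = {
    "duplication_threshold": 0.8,  # hypothetical similarity cutoff
}
```

Plain dictionaries keep each category easy to scan and to import on its own.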
Environment Variable Support
All configuration parameters can now be overridden via environment variables:
# Example: Override PDF DPI
export PDF_DEFAULT_DPI=200
# Example: Increase cache size
export CACHE_MAX_ENTRIES=50
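One way such overrides could be read inside config.py is sketched below. The helper env_int, the dictionary keys, and the fallback values are assumptions made for illustration; only the variable names PDF_DEFAULT_DPI and CACHE_MAX_ENTRIES appear in this document.

```python
import os


def env_int(name, default, environ=None):
    """Return the integer value of environment variable `name`, or `default` if unset."""
    # `environ` is injectable so the helper can be exercised without touching os.environ
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        # Fail early with a message that names the offending variable
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


# Hypothetical keys and fallback values; the environment wins when a variable is set
PDF_SETTINGS = {
    "default_dpi": env_int("PDF_DEFAULT_DPI", 150),
}
CACHE_SETTINGS = {
    "max_entries": env_int("CACHE_MAX_ENTRIES", 20),
}
```

Because environment variables are always strings, converting and checking them in one place avoids scattering int() calls across modules.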
Import Strategy
To prevent circular dependencies, configuration is imported at function level where needed:
def process_image():
    # Imported here rather than at module level to avoid a circular import
    from config import SEGMENTATION_SETTINGS
    # Function implementation using settings
Benefits
- Maintainability: Settings are centralized and documented
- Flexibility: Configuration can be adjusted without code changes
- Consistency: Standardized approach to configuration across modules
- Traceability: Clear overview of all configurable parameters
Future Improvements
- Add configuration schema validation
- Add support for configuration profiles (dev/test/prod)
- Add detailed documentation for each parameter
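The schema validation item could start as a small hand-written check before any validation library is adopted. The sketch below is only one possible shape: CACHE_SCHEMA, validate_settings, and the key names are hypothetical, and it handles integer settings only.

```python
# Hypothetical schema: setting name -> (expected type, minimum allowed value)
CACHE_SCHEMA = {
    "ttl_seconds": (int, 1),
    "max_entries": (int, 1),
}


def validate_settings(settings, schema):
    """Return a list of human-readable problems; an empty list means the settings are valid."""
    problems = []
    for key, (expected_type, minimum) in schema.items():
        if key not in settings:
            problems.append(f"missing key: {key}")
            continue
        value = settings[key]
        # bool is a subclass of int, so reject it explicitly for numeric settings
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(
                f"{key}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
        elif value < minimum:
            problems.append(f"{key}: {value} is below minimum {minimum}")
    # Flag keys the schema does not know about, which are often typos
    for key in settings:
        if key not in schema:
            problems.append(f"unknown key: {key}")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets a startup check report every misconfigured value at once.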