# Configuration Refactoring

## Overview

This document outlines the changes made to centralize configuration parameters and reduce technical debt in the OCR processing system.

## Key Changes

### Centralized Configuration

All previously hard-coded parameters have been moved to `config.py` and grouped by functional category:

- **PDF_SETTINGS**: PDF processing parameters
- **SEGMENTATION_SETTINGS**: Image segmentation configuration
- **CACHE_SETTINGS**: Cache TTL and capacity settings
- **TEXT_REPAIR_SETTINGS**: Thresholds for duplication detection and repair

### Environment Variable Support

Any configuration parameter can now be overridden with an environment variable:

```bash
# Example: override the default PDF DPI
export PDF_DEFAULT_DPI=200

# Example: increase the maximum number of cache entries
export CACHE_MAX_ENTRIES=50
```

### Import Strategy

To prevent circular dependencies, configuration is imported at function level, where it is needed:

```python
def process_image():
    # Deferred import: it runs when the function is called rather than when
    # this module is imported, which avoids a circular import with config.py.
    from config import SEGMENTATION_SETTINGS

    # Function implementation using SEGMENTATION_SETTINGS
    ...
```

## Benefits

- **Maintainability**: Settings are centralized and documented in one place
- **Flexibility**: Configuration can be adjusted without code changes
- **Consistency**: All modules follow the same approach to configuration
- **Traceability**: `config.py` gives a single overview of every configurable parameter

## Future Improvements

- Add configuration schema validation
- Support configuration profiles (dev/test/prod)
- Document each parameter in detail
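As a possible starting point for the schema-validation item above, the following sketch shows one way `config.py` could apply the overrides described under *Environment Variable Support* while rejecting malformed values. It is a sketch under stated assumptions, not the project's actual code. The `<CATEGORY>_<KEY>` naming rule is inferred from the two example variables (`PDF_DEFAULT_DPI`, `CACHE_MAX_ENTRIES`). The helper `apply_env_overrides` and the default values shown are hypothetical. Because environment values always arrive as strings, each override is coerced to the type of its default.

```python
import os
from typing import Any, Mapping

# Hypothetical defaults for illustration only; the real values live in config.py.
PDF_SETTINGS = {"default_dpi": 300}
CACHE_SETTINGS = {"max_entries": 20}

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def apply_env_overrides(prefix: str, defaults: Mapping[str, Any],
                        environ: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: return a copy of `defaults` in which each key can
    be overridden by the environment variable `<PREFIX>_<KEY>` (upper-cased).

    Assumes defaults are bool, int, float or str. A value that cannot be
    coerced to its default's type raises ValueError instead of being ignored.
    """
    settings = dict(defaults)  # never mutate the caller's defaults
    for key, default in defaults.items():
        var = f"{prefix}_{key}".upper()
        raw = environ.get(var)
        if raw is None:
            continue  # no override set; keep the default
        if isinstance(default, bool):  # check bool first: bool subclasses int
            lowered = raw.strip().lower()
            if lowered in _TRUE:
                settings[key] = True
            elif lowered in _FALSE:
                settings[key] = False
            else:
                raise ValueError(f"{var}={raw!r} is not a boolean")
        else:
            try:
                settings[key] = type(default)(raw)
            except (TypeError, ValueError) as exc:
                raise ValueError(
                    f"{var}={raw!r} is not a valid {type(default).__name__}"
                ) from exc
    return settings
```

Usage, passing an explicit mapping instead of the real environment:

```python
env = {"PDF_DEFAULT_DPI": "200", "CACHE_MAX_ENTRIES": "50"}
pdf = apply_env_overrides("PDF", PDF_SETTINGS, env)        # {"default_dpi": 200}
cache = apply_env_overrides("CACHE", CACHE_SETTINGS, env)  # {"max_entries": 50}
```

Failing loudly on a bad value such as `PDF_DEFAULT_DPI=high` surfaces configuration mistakes at startup, not deep inside OCR processing.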