# Configuration Refactoring
## Overview
This document outlines the changes made to centralize configuration parameters and reduce technical debt in the OCR processing system.
## Key Changes
### Centralized Configuration
All previously hard-coded parameters have been moved to `config.py` and organized by functional category:
- **PDF_SETTINGS**: Parameters for PDF processing
- **SEGMENTATION_SETTINGS**: Image segmentation configuration
- **CACHE_SETTINGS**: Cache TTL and capacity settings
- **TEXT_REPAIR_SETTINGS**: Duplication detection and repair thresholds
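As a rough illustration of this grouping, `config.py` might look something like the sketch below. Only the four group names come from this document; every key name and default value shown is a hypothetical placeholder, not the project's actual setting.

```python
# Hypothetical layout for config.py.
# The group names match this document; all keys and values are
# illustrative assumptions only.

# Parameters for PDF processing
PDF_SETTINGS = {
    "default_dpi": 150,   # assumed rendering resolution
    "max_pages": 20,      # assumed page limit per document
}

# Image segmentation configuration
SEGMENTATION_SETTINGS = {
    "min_region_area": 500,  # assumed minimum region size in pixels
}

# Cache TTL and capacity settings
CACHE_SETTINGS = {
    "ttl_seconds": 3600,  # assumed time-to-live for cached results
    "max_entries": 20,    # assumed cache capacity
}

# Duplication detection and repair thresholds
TEXT_REPAIR_SETTINGS = {
    "duplication_threshold": 0.8,  # assumed similarity cutoff
}
```

Grouping related values in one dictionary per category keeps each module's import down to a single name.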
### Environment Variable Support
All configuration parameters can now be overridden via environment variables:
```bash
# Example: Override PDF DPI
export PDF_DEFAULT_DPI=200
# Example: Increase cache size
export CACHE_MAX_ENTRIES=50
```
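One way such overrides could be wired up is sketched below. The variable names `PDF_DEFAULT_DPI` and `CACHE_MAX_ENTRIES` come from the example above; the helper `env_int`, the loader functions, the dictionary keys, and the default values are assumptions made for illustration and may not match the real `config.py`.

```python
import os


def env_int(name: str, default: int) -> int:
    """Return the integer value of environment variable `name`.

    Falls back to `default` when the variable is unset, empty,
    or not a valid integer. (Hypothetical helper.)
    """
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        return default


def load_pdf_settings() -> dict:
    # The default of 150 is an assumed value, not the project's.
    return {"default_dpi": env_int("PDF_DEFAULT_DPI", 150)}


def load_cache_settings() -> dict:
    # The default of 20 is an assumed value, not the project's.
    return {"max_entries": env_int("CACHE_MAX_ENTRIES", 20)}
```

Falling back silently on an invalid value is a design choice of this sketch; raising an error instead would surface misconfiguration earlier.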
### Import Strategy
To prevent circular dependencies, configuration is imported inside the functions that need it rather than at module level, so the import is resolved only when the function is called:
```python
def process_image():
    # Deferred import: resolved when the function runs, not at module load
    from config import SEGMENTATION_SETTINGS
    # Function implementation using settings
```
## Benefits
- **Maintainability**: Settings are centralized and documented
- **Flexibility**: Configuration can be adjusted without code changes
- **Consistency**: Standardized approach to configuration across modules
- **Traceability**: Clear overview of all configurable parameters
## Future Improvements
- Add configuration schema validation
- Add support for configuration profiles (dev/test/prod)
- Add detailed documentation for each parameter
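As a starting point for the schema validation item, a minimal standard-library sketch follows. The function `validate_settings`, the schema format, and the `CACHE_SCHEMA` keys are all hypothetical; a dedicated validation library may turn out to be a better fit.

```python
def validate_settings(settings: dict, schema: dict) -> list:
    """Check numeric settings against a schema and return error messages.

    `schema` maps each required key to (expected_type, minimum), where
    `minimum` may be None to skip the range check. An empty list means
    the settings are valid. (Hypothetical helper for numeric values.)
    """
    errors = []
    for key, (expected_type, minimum) in schema.items():
        if key not in settings:
            errors.append(f"missing key: {key}")
            continue
        value = settings[key]
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected_type):
            errors.append(f"{key}: expected {expected_type.__name__}")
        elif minimum is not None and value < minimum:
            errors.append(f"{key}: must be >= {minimum}")
    return errors


# Assumed schema for the cache group; key names are illustrative.
CACHE_SCHEMA = {
    "ttl_seconds": (int, 0),
    "max_entries": (int, 1),
}
```

Returning a list of messages rather than raising on the first problem lets a caller report every invalid setting at once.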