Configuration Refactoring
Overview
This document outlines the changes made to centralize configuration parameters and reduce technical debt in the OCR processing system.
Key Changes
Centralized Configuration
All previously hard-coded parameters have been moved to config.py and organized by functional category:
- PDF_SETTINGS: Parameters for PDF processing
- SEGMENTATION_SETTINGS: Image segmentation configuration
- CACHE_SETTINGS: Cache TTL and capacity settings
- TEXT_REPAIR_SETTINGS: Duplication detection and repair thresholds
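As a rough illustration of this grouping, config.py might be laid out along the following lines. Only the four dictionary names come from this document; every key and default value below is a hypothetical placeholder, not the project's actual setting.

```python
# Hypothetical sketch of config.py. The four group names match the list
# above; all keys and default values are illustrative assumptions.

# Parameters for PDF processing
PDF_SETTINGS = {
    "default_dpi": 150,   # assumed rendering resolution
    "max_pages": 20,      # assumed page limit per document
}

# Image segmentation configuration
SEGMENTATION_SETTINGS = {
    "min_region_area": 500,  # assumed minimum region size, in pixels
}

# Cache TTL and capacity settings
CACHE_SETTINGS = {
    "ttl_seconds": 3600,  # assumed time-to-live for cached results
    "max_entries": 20,    # assumed cache capacity
}

# Duplication detection and repair thresholds
TEXT_REPAIR_SETTINGS = {
    "duplication_threshold": 0.8,  # assumed similarity cutoff
}
```

A caller would then read a value with a plain lookup such as PDF_SETTINGS["default_dpi"], so each tunable lives in exactly one place.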
Environment Variable Support
All configuration parameters can now be overridden via environment variables:
# Example: Override PDF DPI
export PDF_DEFAULT_DPI=200
# Example: Increase cache size
export CACHE_MAX_ENTRIES=50
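One way such overrides could be read inside config.py is sketched below. The variable names PDF_DEFAULT_DPI and CACHE_MAX_ENTRIES come from the examples above; the helper env_int, the dictionary keys, and the default values are assumptions made for illustration and may not match the real implementation.

```python
import os


def env_int(name: str, default: int) -> int:
    """Return the integer value of environment variable `name`.

    Falls back to `default` when the variable is unset or empty.
    Hypothetical helper, not taken from the project.
    """
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        # Fail early with a message that names the offending variable.
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


# Defaults here are placeholders; the environment value wins when present.
PDF_SETTINGS = {"default_dpi": env_int("PDF_DEFAULT_DPI", 150)}
CACHE_SETTINGS = {"max_entries": env_int("CACHE_MAX_ENTRIES", 20)}
```

In this sketch the values are resolved once, when the module is first imported, so the variables need to be exported before the application starts.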
Import Strategy
To prevent circular dependencies, configuration is imported at function level where needed:
def process_image():
    # Deferred import: config is loaded when the function runs, not at module import time
    from config import SEGMENTATION_SETTINGS
    # Function implementation using settings
Benefits
- Maintainability: Settings are centralized and documented
- Flexibility: Configuration can be adjusted without code changes
- Consistency: Standardized approach to configuration across modules
- Traceability: Clear overview of all configurable parameters
Future Improvements
- Add configuration schema validation
- Support configuration profiles (dev/test/prod)
- Add detailed documentation for each parameter