Configuration Refactoring

Overview

This document outlines the changes made to centralize configuration parameters and reduce technical debt in the OCR processing system.

Key Changes

Centralized Configuration

All previously hard-coded parameters have been moved to config.py and organized by functional category:

  • PDF_SETTINGS: Parameters for PDF processing
  • SEGMENTATION_SETTINGS: Image segmentation configuration
  • CACHE_SETTINGS: Cache TTL and capacity settings
  • TEXT_REPAIR_SETTINGS: Duplication detection and repair thresholds
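
As a rough illustration of this layout, config.py might group its settings into one dictionary per category. The four dictionary names come from the list above; every key and value below is a hypothetical placeholder, not the project's actual setting.

```python
# config.py (illustrative sketch; all keys and values are assumed placeholders)

# Parameters for PDF processing
PDF_SETTINGS = {
    "default_dpi": 150,   # assumed key: rendering resolution for PDF pages
    "max_pages": 20,      # assumed key: upper bound on pages processed
}

# Image segmentation configuration
SEGMENTATION_SETTINGS = {
    "min_region_area": 500,  # assumed key: smallest region kept, in pixels
}

# Cache TTL and capacity settings
CACHE_SETTINGS = {
    "ttl_seconds": 3600,  # assumed key: how long an entry stays valid
    "max_entries": 20,    # assumed key: maximum number of cached results
}

# Duplication detection and repair thresholds
TEXT_REPAIR_SETTINGS = {
    "duplication_threshold": 0.8,  # assumed key: similarity above which text counts as duplicated
}
```

Grouping by category keeps related values together, so a module that only segments images reads SEGMENTATION_SETTINGS and nothing else.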

Environment Variable Support

All configuration parameters can now be overridden via environment variables:

# Example: Override PDF DPI
export PDF_DEFAULT_DPI=200

# Example: Increase cache size
export CACHE_MAX_ENTRIES=50
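
On the Python side, one way such overrides could be read is sketched below. The variable names PDF_DEFAULT_DPI and CACHE_MAX_ENTRIES come from the examples above; the helper `env_int`, the dictionary keys, and the default values are assumptions made for illustration.

```python
import os


def env_int(name: str, default: int) -> int:
    """Return the integer value of environment variable `name`, or `default`.

    Hypothetical helper: falls back to the default when the variable is
    unset, empty, or not a valid integer.
    """
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        return default


# Assumed defaults; the real values live in config.py.
PDF_SETTINGS = {"default_dpi": env_int("PDF_DEFAULT_DPI", 150)}
CACHE_SETTINGS = {"max_entries": env_int("CACHE_MAX_ENTRIES", 20)}
```

Because the lookup runs when config.py is first imported, the variables need to be exported before the application starts.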

Import Strategy

To prevent circular imports, configuration is imported inside the functions that need it rather than at module level:

def process_image():
    from config import SEGMENTATION_SETTINGS
    # Function implementation using settings

Benefits

  • Maintainability: Settings are centralized and documented
  • Flexibility: Configuration can be adjusted through environment variables, without code changes
  • Consistency: Standardized approach to configuration across modules
  • Traceability: Clear overview of all configurable parameters

Future Improvements

  • Add configuration schema validation
  • Support configuration profiles (dev/test/prod)
  • Add detailed documentation for each parameter
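
Schema validation is not implemented yet. As a minimal sketch of what it could look like, the check below compares a settings dictionary against a mapping of expected key types. The function `validate_settings`, the schema `CACHE_SCHEMA`, and its keys are all hypothetical.

```python
from typing import Dict, List


def validate_settings(settings: Dict[str, object], schema: Dict[str, type]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key, expected_type in schema.items():
        if key not in settings:
            errors.append(f"missing key: {key}")
        elif not isinstance(settings[key], expected_type):
            actual = type(settings[key]).__name__
            errors.append(f"{key}: expected {expected_type.__name__}, got {actual}")
    return errors


# Assumed schema for the cache category; key names are placeholders.
CACHE_SCHEMA = {"ttl_seconds": int, "max_entries": int}

problems = validate_settings({"ttl_seconds": 3600, "max_entries": "50"}, CACHE_SCHEMA)
print(problems)
```

Collecting all problems in a list, rather than raising on the first one, lets a startup check report every misconfigured value at once.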