

Configuration Refactoring

Overview

This document outlines the changes made to centralize configuration parameters and reduce technical debt in the OCR processing system.

Key Changes

Centralized Configuration

All previously hard-coded parameters have been moved to config.py and organized by functional category:

PDF_SETTINGS: Parameters for PDF processing
SEGMENTATION_SETTINGS: Image segmentation configuration
CACHE_SETTINGS: Cache TTL and capacity settings
TEXT_REPAIR_SETTINGS: Duplication detection and repair thresholds
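To illustrate this layout, here is a rough sketch of a `config.py` grouped by category. Only the four category names come from this document. Every key name and default value is a hypothetical placeholder, not the project's actual setting.

```python
# config.py (hypothetical sketch): one dictionary per functional category.
# Key names and values are illustrative assumptions, not the real defaults.

PDF_SETTINGS = {
    "default_dpi": 300,             # resolution used when rasterizing PDF pages
}

SEGMENTATION_SETTINGS = {
    "min_region_height": 20,        # hypothetical: smallest text region kept, in pixels
}

CACHE_SETTINGS = {
    "ttl_seconds": 3600,            # how long a cached OCR result stays valid
    "max_entries": 20,              # capacity before older entries are evicted
}

TEXT_REPAIR_SETTINGS = {
    "duplication_threshold": 0.8,   # hypothetical similarity ratio that flags repeated text
}
```

Modules then read values by category, for example `CACHE_SETTINGS["max_entries"]`. This keeps related parameters together in one place.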

Environment Variable Support

All configuration parameters can now be overridden via environment variables:

```bash
# Example: Override PDF DPI
export PDF_DEFAULT_DPI=200

# Example: Increase cache size
export CACHE_MAX_ENTRIES=50
```
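One way to support these overrides is a small helper that reads each variable and casts it to the type of its default. This sketch assumes the `CATEGORY_KEY` naming suggested by `PDF_DEFAULT_DPI` and `CACHE_MAX_ENTRIES`. The helper name `env_override`, the `CACHE_TTL_SECONDS` variable, and the default values are all assumptions for illustration.

```python
import os


def env_override(name: str, default):
    """Return env var `name` cast to the type of `default`, or `default` if unset.

    Hypothetical helper; the real config.py may parse values differently.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    # bool("false") is True, so booleans need explicit parsing (checked before int,
    # since bool is a subclass of int).
    if isinstance(default, bool):
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    return type(default)(raw)  # e.g. int("200") -> 200, float("0.5") -> 0.5


# Defaults below are illustrative assumptions.
PDF_SETTINGS = {
    "default_dpi": env_override("PDF_DEFAULT_DPI", 300),
}

CACHE_SETTINGS = {
    "ttl_seconds": env_override("CACHE_TTL_SECONDS", 3600),
    "max_entries": env_override("CACHE_MAX_ENTRIES", 20),
}
```

In this sketch the dictionaries are built when `config.py` is first imported. The variables must therefore be exported before the application starts. Changing them inside a running process does not affect settings that were already loaded.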

Import Strategy

To prevent circular imports, modules import configuration inside the functions that use it rather than at module level:

```python
def process_image():
    from config import SEGMENTATION_SETTINGS
    # Function implementation using settings
```
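The self-contained sketch below shows why deferring the import helps. It writes two hypothetical modules, `config.py` and `imaging.py`, that depend on each other, then imports them twice. In the first run `imaging.py` imports `config` at module level. In the second run the import is moved inside `process_image()`. The module layout, `clamp_dpi`, and `min_region_height` are illustrative assumptions, not the project's code.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

CONFIG_SRC = """
    from imaging import clamp_dpi  # config needs a helper from imaging

    SEGMENTATION_SETTINGS = {"min_region_height": 20}
    PDF_SETTINGS = {"default_dpi": clamp_dpi(300)}
"""

IMAGING_TOP_LEVEL = """
    from config import SEGMENTATION_SETTINGS  # runs while config is half-loaded

    def clamp_dpi(dpi):
        return max(72, min(dpi, 600))

    def process_image():
        return SEGMENTATION_SETTINGS["min_region_height"]
"""

IMAGING_DEFERRED = """
    def clamp_dpi(dpi):
        return max(72, min(dpi, 600))

    def process_image():
        # Function-level import: runs only when process_image() is called,
        # by which point config has finished loading.
        from config import SEGMENTATION_SETTINGS
        return SEGMENTATION_SETTINGS["min_region_height"]
"""


def try_import(imaging_src):
    """Write both modules to a fresh directory, import config, call process_image()."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "config.py").write_text(textwrap.dedent(CONFIG_SRC))
        Path(tmp, "imaging.py").write_text(textwrap.dedent(imaging_src))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make the newly written files visible
        for name in ("config", "imaging"):
            sys.modules.pop(name, None)  # start from a clean import state
        try:
            importlib.import_module("config")
            return importlib.import_module("imaging").process_image()
        except ImportError as exc:
            return f"ImportError: {exc}"
        finally:
            sys.path.remove(tmp)
            for name in ("config", "imaging"):
                sys.modules.pop(name, None)


print(try_import(IMAGING_TOP_LEVEL))  # an ImportError message about a circular import
print(try_import(IMAGING_DEFERRED))   # 20
```

The trade-off is that a missing or misspelled setting surfaces when the function first runs, not at startup. Python caches loaded modules in `sys.modules`, so repeated calls pay only a lookup, not a re-import.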

Benefits

Maintainability: Settings are centralized and documented
Flexibility: Configuration can be adjusted without code changes
Consistency: Standardized approach to configuration across modules
Traceability: Clear overview of all configurable parameters

Future Improvements

Add configuration schema validation
Add support for configuration profiles (dev/test/prod)
Add detailed documentation for each parameter
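For the first two items, one possible approach pairs two pieces. The first is a frozen dataclass per settings category that validates its values when constructed. The second is a table of per-profile overrides. This is a sketch, not a committed design, and `CacheSettings`, `PROFILES`, `load_cache_settings`, and the override values are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CacheSettings:
    """Hypothetical typed schema for CACHE_SETTINGS; every field is a positive int."""
    ttl_seconds: int = 3600
    max_entries: int = 20

    def __post_init__(self):
        for f in fields(self):
            value = getattr(self, f.name)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"{f.name} must be an int, got {value!r}")
            if value <= 0:
                raise ValueError(f"{f.name} must be positive, got {value}")


# Hypothetical per-profile overrides applied on top of the schema defaults.
PROFILES = {
    "dev": {"ttl_seconds": 60, "max_entries": 5},
    "test": {"ttl_seconds": 1, "max_entries": 1},
    "prod": {},
}


def load_cache_settings(profile: str = "prod") -> CacheSettings:
    """Build validated cache settings for a profile; bad values raise immediately."""
    return CacheSettings(**PROFILES[profile])
```

Because validation runs in `__post_init__`, an invalid override fails once, at load time, rather than deep inside the OCR pipeline.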