	# Configuration Refactoring

	## Overview
	This document outlines the changes made to centralize configuration parameters and reduce technical debt in the OCR processing system.

	## Key Changes

	### Centralized Configuration
	All previously hard-coded parameters have been moved to `config.py` and organized by functional category:

	- PDF_SETTINGS: Parameters for PDF processing
	- SEGMENTATION_SETTINGS: Image segmentation configuration
	- CACHE_SETTINGS: Cache TTL and capacity settings
	- TEXT_REPAIR_SETTINGS: Duplication detection and repair thresholds
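
As a rough illustration, a `config.py` organized this way might look like the sketch below. Only the four dictionary names come from this document; every key and default value is a hypothetical placeholder, not the project's actual setting.

```python
# Hypothetical layout for config.py.
# The dictionary names match the categories listed above;
# all keys and default values are illustrative assumptions.

PDF_SETTINGS = {
    "default_dpi": 150,  # assumed rendering resolution
    "max_pages": 20,     # assumed per-document page limit
}

SEGMENTATION_SETTINGS = {
    "min_region_area": 500,  # assumed minimum region size, in pixels
}

CACHE_SETTINGS = {
    "ttl_seconds": 3600,  # assumed time-to-live for cached results
    "max_entries": 20,    # assumed cache capacity
}

TEXT_REPAIR_SETTINGS = {
    "duplication_threshold": 0.8,  # assumed similarity cutoff
}
```

Grouping by category keeps related values together, so a module that only handles caching reads `CACHE_SETTINGS` and nothing else.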

	### Environment Variable Support
	All configuration parameters can now be overridden via environment variables:

	```bash
	# Example: Override PDF DPI
	export PDF_DEFAULT_DPI=200

	# Example: Increase cache size
	export CACHE_MAX_ENTRIES=50
	```
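
One way `config.py` could pick up such overrides is sketched below. The helper name `env_int`, the dictionary keys, and the default values are assumptions made for illustration; only the variable names `PDF_DEFAULT_DPI` and `CACHE_MAX_ENTRIES` are taken from the examples above.

```python
import os


def env_int(name: str, default: int) -> int:
    """Return environment variable `name` as an int, or `default`.

    Falls back to `default` when the variable is unset or is not a
    valid integer.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


# Hypothetical keys and defaults, resolved once when the module is imported.
PDF_SETTINGS = {"default_dpi": env_int("PDF_DEFAULT_DPI", 150)}
CACHE_SETTINGS = {"max_entries": env_int("CACHE_MAX_ENTRIES", 20)}
```

Because the values are read at import time in this sketch, a variable has to be exported before the application starts for the override to take effect.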

	### Import Strategy
	To prevent circular dependencies, modules import configuration inside the functions that use it rather than at module top level:

	```python
	def process_image():
	    # Imported here, not at module top level, to avoid a circular import.
	    from config import SEGMENTATION_SETTINGS
	    # Function implementation using settings
	```

	## Benefits

	- Maintainability: Settings are centralized and documented
	- Flexibility: Configuration can be adjusted without code changes
	- Consistency: Standardized approach to configuration across modules
	- Traceability: Clear overview of all configurable parameters

	## Future Improvements

	- Add configuration schema validation
	- Support configuration profiles (dev/test/prod)
	- Add detailed documentation for each parameter
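
As a sketch of what schema validation could look like, the example below checks one settings dictionary against a table of expected types and ranges. The schema format, the name `validate_settings`, and the keys and bounds are all hypothetical. Nothing here reflects an existing implementation in the project.

```python
# Hypothetical schema: each key maps to (expected type, minimum, maximum).
CACHE_SCHEMA = {
    "ttl_seconds": (int, 1, 86_400),
    "max_entries": (int, 1, 10_000),
}


def validate_settings(settings: dict, schema: dict) -> list[str]:
    """Return a list of problems found; an empty list means the settings are valid."""
    errors = []
    for key, (expected_type, low, high) in schema.items():
        if key not in settings:
            errors.append(f"missing key: {key}")
            continue
        value = settings[key]
        # bool is a subclass of int, so reject it explicitly for numeric fields.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            errors.append(
                f"{key}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
        elif not low <= value <= high:
            errors.append(f"{key}: {value} is outside [{low}, {high}]")
    # Report keys the schema does not know about, in a stable order.
    for key in sorted(settings.keys() - schema.keys()):
        errors.append(f"unknown key: {key}")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a startup check report every misconfigured value at once.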