# Configuration Refactoring

## Overview
This document outlines the changes made to centralize configuration parameters and reduce technical debt in the OCR processing system.

## Key Changes

### Centralized Configuration
All previously hard-coded parameters have been moved to `config.py` and organized by functional category:

- **PDF_SETTINGS**: Parameters for PDF processing
- **SEGMENTATION_SETTINGS**: Image segmentation configuration
- **CACHE_SETTINGS**: Cache TTL and capacity settings
- **TEXT_REPAIR_SETTINGS**: Duplication detection and repair thresholds

### Environment Variable Support
All configuration parameters can now be overridden via environment variables:

```bash
# Example: Override PDF DPI
export PDF_DEFAULT_DPI=200

# Example: Increase cache size
export CACHE_MAX_ENTRIES=50
```
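
On the Python side, an override like the ones above could be read roughly as follows. This is a minimal sketch, not the project's actual `config.py`: the helper `env_int`, the function `load_settings`, the dictionary keys `default_dpi` and `max_entries`, and the default values are all assumptions made for illustration. Only the environment variable names come from the examples above.

```python
import os


def env_int(name: str, default: int) -> int:
    """Return the integer value of environment variable `name`.

    Falls back to `default` when the variable is unset or empty.
    """
    raw = os.environ.get(name, "").strip()
    if not raw:
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


def load_settings() -> dict:
    # Keys and defaults are placeholders, not the project's real values.
    return {
        "PDF_SETTINGS": {"default_dpi": env_int("PDF_DEFAULT_DPI", 150)},
        "CACHE_SETTINGS": {"max_entries": env_int("CACHE_MAX_ENTRIES", 20)},
    }
```

Failing loudly on a non-integer value is one design choice among several; silently falling back to the default would hide typos in deployment configuration.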

### Import Strategy
To prevent circular dependencies, `config` is imported inside the functions that need it rather than at module level:

```python
def process_image():
    from config import SEGMENTATION_SETTINGS
    # Function implementation using settings
```
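
The effect of a function-level import can be seen in a small self-contained sketch: the import statement runs when the function is called, not when the enclosing module is loaded. The module name `demo_config` and the key `min_region_area` are hypothetical stand-ins so the example runs without the real `config.py`.

```python
import sys
import types


def process_image() -> int:
    # The import executes at call time, so the settings module does not
    # have to be importable when this function is defined.
    from demo_config import SEGMENTATION_SETTINGS  # hypothetical module name
    return SEGMENTATION_SETTINGS["min_region_area"]  # hypothetical key


# Stand-in for the real config module, registered only after
# process_image has already been defined.
demo_config = types.ModuleType("demo_config")
demo_config.SEGMENTATION_SETTINGS = {"min_region_area": 100}
sys.modules["demo_config"] = demo_config

print(process_image())  # prints 100
```

After the first call, repeated imports inside the function resolve through the `sys.modules` cache, so the per-call overhead is a dictionary lookup rather than a fresh module load.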

## Benefits

- **Maintainability**: Settings are centralized and documented
- **Flexibility**: Configuration can be adjusted without code changes
- **Consistency**: Standardized approach to configuration across modules
- **Traceability**: Clear overview of all configurable parameters

## Future Improvements

- Add configuration schema validation
- Support configuration profiles (dev/test/prod)
- Add detailed documentation for each parameter
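
As a rough illustration of the first item, schema validation could start as a plain type-and-range check over each settings dictionary. This is only one possible shape, not a planned design: the function `validate_settings`, the schema format, and the keys `max_entries` and `ttl_seconds` are assumptions.

```python
def validate_settings(settings: dict, schema: dict) -> list:
    """Check `settings` against `schema` and return a list of problems.

    `schema` maps each required key to an (expected_type, minimum) pair.
    An empty list means the settings passed every check.
    """
    problems = []
    for key, (expected_type, minimum) in schema.items():
        if key not in settings:
            problems.append(f"missing key: {key}")
            continue
        value = settings[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(f"{key}: expected {expected_type.__name__}")
        elif value < minimum:
            problems.append(f"{key}: must be >= {minimum}")
    return problems


# Hypothetical schema for the cache category.
CACHE_SCHEMA = {"max_entries": (int, 1), "ttl_seconds": (int, 0)}

print(validate_settings({"max_entries": 50, "ttl_seconds": 300}, CACHE_SCHEMA))
```

Collecting every problem before reporting, rather than raising on the first one, lets a misconfigured deployment be fixed in a single pass.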