historical-ocr / docs /preprocessing_triage.md
milwright's picture
Consolidate segmentation improvements and code cleanup
42dc069
# OCR Preprocessing Triage
## Quick Fixes Implemented
1. **Handwritten** - Disabled thresholding, uses grayscale only
2. **Newspapers** - Increased block size (51) and constant (10) for softer thresholding
3. **JPEG Artifacts** - Auto-detection and specialized denoising
4. **Border Issues** - Crops edges after deskew to avoid threshold problems
5. **Low Resolution** - Upscales small text for better recognition
## Testing
```
python testing/test_triage_fix.py
```
Check `output/comparison/` for results.