---
title: Historical OCR
emoji: π
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: gpl-3.0
short_description: advanced OCR application for historical document analysis
---

# Historical OCR

An advanced OCR application for historical document analysis using Mistral AI.

> **Note:** This tool is designed to assist scholars in historical research by extracting text from challenging documents. It may not achieve 100% accuracy on every source, but it serves as a useful research aid for working with historical newspapers, handwritten documents, and photographs of archival materials.

## Features

- **OCR with Context:** AI-enhanced OCR optimized for historical documents
- **Document Type Detection:** Automatically identifies handwritten letters, recipes, scientific texts, and more
- **Advanced Image Preprocessing:**
  - Automatic deskewing to correct document orientation
  - Smart thresholding with Otsu and adaptive methods
  - Morphological operations to clean up text
  - Document-type specific optimization
- **Custom Prompting:** Tailor the AI analysis with document-specific instructions
- **Structured Output:** Returns organized, structured information based on document type
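
The README names Otsu thresholding but does not describe it. The following is a rough, dependency-free Python sketch of global Otsu thresholding, included for illustration only. It is not the app's code, which this README does not show. It takes a flat list of 8-bit grayscale values and picks the threshold that maximizes the between-class variance of the resulting "ink" and "paper" split. An app like this would more likely call a library such as OpenCV (`cv2.threshold` with the `cv2.THRESH_OTSU` flag). The function names `otsu_threshold` and `binarize` are hypothetical. The sketch does not cover adaptive thresholding, deskewing, or morphology.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the Otsu threshold t for 8-bit grayscale values (0-255).

    Hypothetical helper. Pixels <= t form one class and pixels > t the other;
    t maximizes the between-class variance w_bg * w_fg * (mu_bg - mu_fg) ** 2.
    Returns 0 if every pixel has the same intensity (no split is possible).
    """
    if not pixels:
        raise ValueError("need at least one pixel")

    # Build a 256-bin histogram of intensities.
    hist = [0] * 256
    for p in pixels:
        if not 0 <= p <= 255:
            raise ValueError(f"pixel value out of range: {p}")
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg = 0  # count of pixels with value <= t
    sum_bg = 0     # sum of intensities with value <= t
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue  # no background pixels yet
        if weight_fg == 0:
            break  # every pixel is background; larger t cannot help
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, left unnormalized by total**2 (same argmax).
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels: Sequence[int], threshold: int) -> List[int]:
    """Map dark ink to 0 and light paper to 255 (like OpenCV's THRESH_BINARY)."""
    return [0 if p <= threshold else 255 for p in pixels]


sample = [30] * 50 + [200] * 50  # toy "ink" and "paper" intensities
t = otsu_threshold(sample)
print(t, binarize(sample, t)[:3], binarize(sample, t)[-3:])
# prints: 30 [0, 0, 0] [255, 255, 255]
```

On a real scan, Otsu works best when the intensity histogram is clearly bimodal. Uneven lighting on archival photographs is the usual reason to switch to an adaptive (local) threshold instead.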

## Using This App

1. Upload a historical document (image or PDF)
2. Add optional context or special instructions
3. Get detailed, structured OCR results with historical context
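
Step 1 hands the uploaded file to Mistral's OCR service. The README does not show how the app does this. The sketch below assumes the input shapes described in Mistral's OCR documentation: a base64 `data:` URL passed as `image_url` for images, or as `document_url` for PDFs. `build_ocr_document` is a hypothetical helper. The SDK call is left commented out because it needs the `mistralai` package, an API key, and network access. Check the model name and response fields against the current API reference.

```python
import base64
import mimetypes
from pathlib import Path


def build_ocr_document(path: str) -> dict:
    """Encode a local image or PDF as a base64 data URL for an OCR request.

    Hypothetical helper: the payload keys follow Mistral's documented OCR
    input formats as understood here, and are assumptions to verify.
    """
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"cannot determine file type for {path!r}")

    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    data_url = f"data:{mime};base64,{data}"

    if mime == "application/pdf":
        return {"type": "document_url", "document_url": data_url}
    if mime.startswith("image/"):
        return {"type": "image_url", "image_url": data_url}
    raise ValueError(f"unsupported file type: {mime}")


# Assumed usage with the official SDK (requires `pip install mistralai`):
# import os
# from mistralai import Mistral
# client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
# response = client.ocr.process(model="mistral-ocr-latest",
#                               document=build_ocr_document("letter.jpg"))
# print(response.pages[0].markdown)
```

Base64 data URLs keep the upload in a single request, which suits a Streamlit app that already holds the uploaded bytes in memory. For very large PDFs, uploading through Mistral's files API first may be preferable.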

## Supported Document Types

- Handwritten letters and correspondence
- Historical recipes and cookbooks
- Travel accounts and exploration logs
- Scientific papers and experiments
- Legal documents and certificates
- Historical newspaper articles
- General historical texts
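
The README says the app tailors its analysis to the document type and to optional user instructions, but it does not say how the prompt is built. One plausible approach, sketched here purely as an illustration, looks up per-type guidance and merges it with the researcher's context into a single prompt. The type keys, the guidance wording, `TYPE_GUIDANCE`, and `build_prompt` are all hypothetical and are not the app's actual prompts.

```python
# Hypothetical per-type guidance keyed by the supported document types.
TYPE_GUIDANCE = {
    "handwritten_letter": "Transcribe faithfully, preserving original spelling; note sender, recipient, and date if present.",
    "recipe": "List ingredients and quantities separately from the preparation steps.",
    "travel_account": "Identify places, dates, and routes mentioned.",
    "scientific_paper": "Preserve measurements, units, and any tabular data.",
    "legal_document": "Identify parties, dates, and terms or obligations.",
    "newspaper_article": "Separate headline, byline, and body text; keep column order.",
    "general": "Transcribe the text and summarize its main topics.",
}


def build_prompt(doc_type: str, user_context: str = "") -> str:
    """Combine type-specific guidance with optional user instructions."""
    # Unknown types fall back to the generic guidance.
    if doc_type not in TYPE_GUIDANCE:
        doc_type = "general"
    parts = [
        "You are analyzing OCR output from a historical document.",
        f"Document type: {doc_type}.",
        TYPE_GUIDANCE[doc_type],
    ]
    # Only append the researcher's instructions when they are non-empty.
    if user_context.strip():
        parts.append(f"Additional instructions from the researcher: {user_context.strip()}")
    return "\n".join(parts)


print(build_prompt("recipe", "Convert quantities to metric where possible."))
```

Keeping the per-type guidance in a plain dictionary makes it easy to add a new document type without touching the prompt-assembly logic.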

## Technical Details

Built with Streamlit and Mistral AI's OCR and large language model capabilities.

---

Created by Zach Muhlbauer, CUNY Graduate Center