---
title: Historical OCR
emoji: ⚙️
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: gpl-3.0
short_description: advanced OCR application for historical document analysis
---

# Historical OCR

An advanced OCR application for historical document analysis using Mistral AI.

> **Note:** This tool is designed to assist scholars in historical research by extracting text from challenging documents. While it may not achieve 100% accuracy for all materials, it serves as a valuable research aid for navigating historical documents, particularly historical newspapers, handwritten documents, and photos of archival materials.

## Features

- **OCR with Context:** AI-enhanced OCR optimized for historical documents
- **Document Type Detection:** Automatically identifies handwritten letters, recipes, scientific texts, and more
- **Advanced Image Preprocessing:**
  - Automatic deskewing to correct document orientation
  - Smart thresholding with Otsu and adaptive methods
  - Morphological operations to clean up text
  - Document-type specific optimization
- **Custom Prompting:** Tailor the AI analysis with document-specific instructions
- **Structured Output:** Returns organized, structured information based on document type

## Using This App

1. Upload a historical document (image or PDF)
2. Add optional context or special instructions
3. Get detailed, structured OCR results with historical context

## Supported Document Types

- Handwritten letters and correspondence
- Historical recipes and cookbooks
- Travel accounts and exploration logs
- Scientific papers and experiments
- Legal documents and certificates
- Historical newspaper articles
- General historical texts

## Technical Details

Built with Streamlit and Mistral AI's OCR and large language model capabilities.

---

Created by Zach Muhlbauer, CUNY Graduate Center
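
The Otsu thresholding step listed under Features can be illustrated with a short sketch. This README does not show the app's actual preprocessing code, so the example below is only a pure-Python illustration of Otsu's method on 8-bit grayscale values. The function names `otsu_threshold` and `binarize` are hypothetical and are not taken from the app.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the 8-bit threshold that maximizes between-class variance.

    Illustrative sketch only; not the app's implementation.
    Pixels <= threshold form one class, pixels > threshold the other.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")

    # Histogram of grayscale intensities (0..255).
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # intensity sum of the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break  # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(image: List[List[int]], threshold: int) -> List[List[int]]:
    """Map pixels above the threshold to 255 and all others to 0."""
    return [[255 if p > threshold else 0 for p in row] for row in image]
```

A caller would flatten the grayscale image into a single sequence, compute the threshold once, and then pass the original rows to `binarize`. Adaptive thresholding differs in that it computes a separate threshold for each local neighborhood, which tends to cope better with the uneven lighting common in photos of archival materials.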