metadata
title: Historical OCR
emoji: π
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: gpl-3.0
short_description: advanced OCR application for historical document analysis
Historical OCR
An advanced OCR application for historical document analysis using Mistral AI.
Note: This tool is designed to assist scholars in historical research by extracting text from challenging documents. While it may not achieve 100% accuracy for all materials, it serves as a valuable research aid for navigating historical documents, particularly historical newspapers, handwritten documents, and photos of archival materials.
Features
- OCR with Context: AI-enhanced OCR optimized for historical documents
- Document Type Detection: Automatically identifies handwritten letters, recipes, scientific texts, and more
- Advanced Image Preprocessing:
  - Automatic deskewing to correct document orientation
  - Smart thresholding with Otsu and adaptive methods
  - Morphological operations to clean up text
  - Document-type specific optimization
- Custom Prompting: Tailor the AI analysis with document-specific instructions
- Structured Output: Returns organized, structured information based on document type
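To illustrate the Otsu thresholding step listed under image preprocessing, here is a minimal, dependency-free sketch. It is not the app's own code, which this README does not show and which may well rely on an image library instead. The function names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be a flat list of 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for an iterable of 8-bit grayscale values.

    Illustrative only: picks the threshold that maximizes the
    between-class variance of the "dark" and "light" pixel groups.
    """
    # Build a 256-bin intensity histogram.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # intensity sum of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no dark pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, threshold):
    """Map values above the threshold to white (255), the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    # Toy "page": dark ink values and light paper values.
    page = [20] * 50 + [30] * 50 + [200] * 50 + [220] * 50
    t = otsu_threshold(page)
    print("threshold:", t)
    print("white pixels:", binarize(page, t).count(255))
```

A global threshold like this works best on evenly lit scans. Adaptive methods, which the feature list also mentions, compute a separate threshold per neighborhood and tend to cope better with stains and uneven lighting in archival photos.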
Using This App
- Upload a historical document (image or PDF)
- Add optional context or special instructions
- Get detailed, structured OCR results with historical context
Supported Document Types
- Handwritten letters and correspondence
- Historical recipes and cookbooks
- Travel accounts and exploration logs
- Scientific papers and experiments
- Legal documents and certificates
- Historical newspaper articles
- General historical texts
Technical Details
Built with Streamlit and Mistral AI's OCR and large language model capabilities.
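As a rough sketch of what a call to Mistral's OCR endpoint can look like from Python, the snippet below sends one image and joins the per-page Markdown. It is an assumption about usage, not the app's actual code: the helper names `to_data_url` and `ocr_image`, the `MISTRAL_API_KEY` environment variable, and the choice of the `mistral-ocr-latest` model are all illustrative. It requires the `mistralai` package and a valid API key.

```python
import base64
import os


def to_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def ocr_image(image_bytes, mime_type="image/jpeg"):
    """Send one image to Mistral OCR and return the recognized text as Markdown.

    Hypothetical helper: the model name and environment variable are assumptions.
    """
    # Imported lazily so to_data_url() works without the SDK installed.
    from mistralai import Mistral

    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    response = client.ocr.process(
        model="mistral-ocr-latest",
        document={
            "type": "image_url",
            "image_url": to_data_url(image_bytes, mime_type),
        },
    )
    # Each page of the response carries its text as Markdown.
    return "\n\n".join(page.markdown for page in response.pages)
```

In a Streamlit app, the bytes would typically come from an uploaded file (for example, the value returned by `st.file_uploader(...)` read with `.getvalue()`), and the returned Markdown could then be passed to a language model along with any user-supplied context.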
Created by Zach Muhlbauer, CUNY Graduate Center