---
title: Historical OCR
emoji: 📜
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: gpl-3.0
short_description: advanced OCR application for historical document analysis
---

# Historical OCR

An advanced OCR application for historical document analysis using Mistral AI.

> **Note:** This tool is designed to assist scholars in historical research by extracting text from challenging documents. It will not be fully accurate on every document, but it is a useful research aid for working with historical newspapers, handwritten documents, and photos of archival materials.

## Features

- **OCR with Context:** AI-enhanced OCR optimized for historical documents
- **Document Type Detection:** Automatically identifies handwritten letters, recipes, scientific texts, and more
- **Advanced Image Preprocessing:** 
  - Automatic deskewing to correct document orientation
  - Smart thresholding with Otsu and adaptive methods
  - Morphological operations to clean up text
  - Document-type specific optimization
- **Custom Prompting:** Tailor the AI analysis with document-specific instructions
- **Structured Output:** Returns organized, structured information based on document type
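
To illustrate the thresholding step listed under image preprocessing, here is a minimal pure-Python sketch of Otsu's method. It is not the app's actual code, which this README does not show and which likely relies on an image library. The function names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be a flat list of 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for 8-bit grayscale values (0-255).

    Otsu's method picks the threshold that maximizes the between-class
    variance of the two groups it creates (background and foreground).
    """
    # Build a 256-bin intensity histogram.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of intensities at or below the candidate threshold

    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # every pixel is background; nothing left to split
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels, threshold):
    """Map values above the threshold to white (255), the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    # Hypothetical sample: dark ink (around 10) on light paper (around 200).
    sample = [10] * 50 + [200] * 50
    t = otsu_threshold(sample)
    print(t, binarize(sample, t)[:3], binarize(sample, t)[-3:])
```

A single global threshold like this works best on evenly lit scans. Adaptive methods, which the feature list also mentions, compute a threshold per neighborhood and tend to cope better with stains and uneven lighting.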

## Using This App

1. Upload a historical document (image or PDF)
2. Add optional context or special instructions 
3. Get detailed, structured OCR results with historical context

## Supported Document Types

- Handwritten letters and correspondence
- Historical recipes and cookbooks
- Travel accounts and exploration logs
- Scientific papers and experiments
- Legal documents and certificates
- Historical newspaper articles
- General historical texts

## Technical Details

Built with Streamlit and Mistral AI's OCR and large language model capabilities.

---

Created by Zach Muhlbauer, CUNY Graduate Center