Update README.md with Demo and Documentation
README.md
CHANGED
@@ -8,4 +8,196 @@ pinned: false
license: apache-2.0
---

# Handwritten Name Recognition (OCR) App

*An end-to-end Streamlit application for training and predicting handwritten names using a CRNN model.*

[Demo and Documentation](https://drive.google.com/drive/folders/1rOmwyTJkDCsU-Wuh-_CzvQ9sdb_ci_kX?usp=sharing)
[GitHub Repository](https://github.com/marianeft/handwritten_name_ocr_app.git)

---

## Table of Contents

* [Overview](#overview)
* [Quickstart](#quickstart)
* [Features](#features)
* [Project Structure](#project-structure)
* [Project Index](#project-index)
* [Roadmap](#roadmap)
* [Contribution](#contribution)
* [License](#license)
* [Acknowledgements](#acknowledgements)

---

## Overview

This project implements a Handwritten Name Recognition (OCR) system using a Convolutional Recurrent Neural Network (CRNN) architecture built with PyTorch. The application is presented as an interactive web interface using Streamlit, allowing users to:

1. **Train** a new OCR model from a local dataset.
2. **Load** a pre-trained model.
3. **Predict** text from uploaded handwritten image files.
4. **Upload** the local dataset to the Hugging Face Hub for sharing and versioning.

The CRNN model combines a CNN backbone for feature extraction from images and a Bidirectional LSTM layer for sequence modeling, followed by a linear layer for character classification using CTC (Connectionist Temporal Classification) Loss.
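
For intuition, here is a minimal PyTorch sketch of that CNN → BiLSTM → linear → CTC pipeline. It is illustrative only, not the project's actual `model_ocr.py` implementation; the layer sizes, input resolution, and 80-class character set are placeholder assumptions.

```python
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):
    """Minimal CRNN: CNN features -> BiLSTM sequence model -> per-step classifier."""
    def __init__(self, num_classes: int, img_height: int = 32):
        super().__init__()
        # CNN backbone: turns a grayscale image into a feature map.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        feat_h = img_height // 4  # height after two 2x2 poolings
        # Bidirectional LSTM: models the left-to-right character sequence.
        self.rnn = nn.LSTM(128 * feat_h, 256, bidirectional=True, batch_first=True)
        # Linear classifier over the character set (index 0 is the CTC blank).
        self.fc = nn.Linear(2 * 256, num_classes)

    def forward(self, x):                               # x: (B, 1, H, W)
        f = self.cnn(x)                                 # (B, C, H/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one feature vector per horizontal step
        seq, _ = self.rnn(f)
        return self.fc(seq).log_softmax(2)              # (B, T, num_classes) log-probabilities

# CTC loss consumes (T, B, C) log-probabilities plus input/target lengths.
model = TinyCRNN(num_classes=80)
log_probs = model(torch.randn(4, 1, 32, 128)).permute(1, 0, 2)
targets = torch.randint(1, 80, (4, 10))                 # dummy label indices (0 = blank)
loss = nn.CTCLoss(blank=0)(
    log_probs, targets,
    input_lengths=torch.full((4,), log_probs.size(0), dtype=torch.long),
    target_lengths=torch.full((4,), 10, dtype=torch.long),
)
print(loss.item())
```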

---

## Quickstart

Follow these steps to get the application up and running on your local machine.

### Prerequisites

* Python 3.8+
* `pip` (Python package installer)

#### 1. Clone the Repository (or set up your project folder)

Ensure your project structure matches the expected layout (e.g., `app.py`, `config.py`, `data/`, `models/`, etc.).

#### 2. Create and Activate a Virtual Environment

It's highly recommended to use a virtual environment to manage dependencies.

```bash
# Navigate to your project root directory
cd path/to/your/handwritten_name_ocr_app

# Create a virtual environment named 'venvy'
python -m venv venvy

# Activate the virtual environment
# On Windows (Command Prompt):
.\venvy\Scripts\activate.bat

# On Windows (PowerShell):
.\venvy\Scripts\Activate.ps1

# On macOS/Linux:
source venvy/bin/activate
```

#### 3. Install Dependencies

With your virtual environment activated, install all required Python packages:

`pip install streamlit pandas numpy Pillow torch torchvision scikit-learn tqdm editdistance huggingface_hub`

*Note on PyTorch (`torch` and `torchvision`): depending on your platform, the command above may install the CPU-only build of PyTorch. If you have a CUDA-enabled GPU and want to leverage it for faster training, refer to the official PyTorch website ([pytorch.org/get-started/locally](https://pytorch.org/get-started/locally/)) for installation commands tailored to your CUDA version.*

#### 4. Prepare Your Dataset

The application expects a dataset structured as follows:

```
data/
├── images/
│   ├── train/
│   │   ├── image1.png
│   │   ├── image2.png
│   │   └── ...
│   └── test/
│       ├── image_test1.png
│       ├── image_test2.png
│       └── ...
├── train.csv
└── test.csv
```
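
The column names inside `train.csv` and `test.csv` are defined by the project's data loader, so the ones below (`filename`, `label`) are only assumptions; the point is simply that each row pairs an image file with its transcription.

```python
import pandas as pd

# Hypothetical column names -- check data_handler_ocr.py / config.py for the real ones.
df = pd.read_csv("data/train.csv")
for _, row in df.head(3).iterrows():
    print(f"data/images/train/{row['filename']}  ->  {row['label']}")
```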

#### 5. Clear Python Cache *(Important!)*

After making code changes or installing new packages, clearing Python's compiled cache helps ensure the latest code is used.

```bash
find . -name "__pycache__" -exec rm -rf {} +  # For macOS/Linux

Get-ChildItem -Path . -Include __pycache__ -Recurse | Remove-Item -Recurse -Force  # For Windows PowerShell
```

#### 6. Run the Streamlit Application

With your virtual environment activated and dependencies installed:

`streamlit run app.py`

*This will open the application in your web browser.*

## Features

- **CRNN Model Architecture**: Utilizes a Convolutional Recurrent Neural Network for robust OCR.
- **CTC Loss**: Employs Connectionist Temporal Classification for sequence prediction.
- **Model Training**: Train a new OCR model from your local image and CSV datasets.
- **Pre-trained Model Loading**: Load previously saved models to avoid retraining.
- **Handwritten Text Prediction**: Upload an image and get instant text recognition.
- **Training Progress Visualization**: Real-time updates and plots for training loss, CER, and accuracy.
- **Hugging Face Hub Integration**: Seamlessly upload your dataset to the Hugging Face Hub for easy sharing and version control (see the sketch after this list).
- **Responsive UI**: Built with Streamlit for an intuitive and user-friendly experience.
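
The dataset upload described above is roughly equivalent to the `huggingface_hub` call below. This is a sketch, not the app's actual upload code, and the repo id is a placeholder.

```python
from huggingface_hub import HfApi

api = HfApi()  # expects a token from `huggingface-cli login` or the HF_TOKEN env var
api.upload_folder(
    folder_path="data",                                 # local dataset directory
    repo_id="your-username/handwritten-names-dataset",  # placeholder dataset repo
    repo_type="dataset",
    commit_message="Upload handwritten-name OCR dataset",
)
```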

## Project Structure

```
handwritten_name_ocr_app/
├── app.py                # Main Streamlit application file
├── config.py             # Configuration settings (paths, model params, chars)
├── data/                 # Directory for datasets
│   ├── images/
│   │   ├── train/        # Training images
│   │   └── test/         # Testing images
│   ├── train.csv         # Training labels
│   └── test.csv          # Testing labels
├── data_handler_ocr.py   # Custom PyTorch Dataset and DataLoader logic
├── models/               # Directory to save/load trained models
│   └── handwritten_name_ocr_model.pth   # Default model save path
├── model_ocr.py          # Defines the CRNN model architecture and training/evaluation functions
├── utils_ocr.py          # Utility functions for image preprocessing
├── requirements.txt      # List of Python dependencies
├── venvy/                # Python virtual environment (created by `python -m venv venvy`)
└── ...
```

## Project Index

`app.py`: The central Streamlit application. Handles the UI, triggers training/prediction, and integrates with the Hugging Face Hub.

`config.py`: Stores global configuration variables such as file paths, image dimensions, character sets, and training hyperparameters.

`data_handler_ocr.py`: Contains the `CharIndexer` class for character-to-index mapping, plus the `OCRDataset` and `ocr_collate_fn` used for efficient data loading and batching in PyTorch.
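
For intuition, a character indexer of this kind can be as small as the sketch below (illustrative only; the real `CharIndexer` API may differ, e.g. in how it reserves the CTC blank).

```python
class SimpleCharIndexer:
    """Maps characters to integer indices and back; index 0 is reserved for the CTC blank."""
    def __init__(self, charset: str):
        self.char_to_idx = {c: i + 1 for i, c in enumerate(charset)}
        self.idx_to_char = {i: c for c, i in self.char_to_idx.items()}

    def encode(self, text: str) -> list[int]:
        return [self.char_to_idx[c] for c in text]

    def decode(self, indices: list[int]) -> str:
        return "".join(self.idx_to_char[i] for i in indices if i != 0)

indexer = SimpleCharIndexer("ABCDEFGHIJKLMNOPQRSTUVWXYZ-' ")
print(indexer.encode("ANNA"))                  # [1, 14, 14, 1]
print(indexer.decode(indexer.encode("ANNA")))  # ANNA
```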

`model_ocr.py`: Defines the `CNN_Backbone`, `BidirectionalLSTM`, and `CRNN` (the main OCR model) classes. It also includes the `train_ocr_model`, `evaluate_model`, `save_ocr_model`, `load_ocr_model`, and `ctc_greedy_decode` functions.
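
Greedy CTC decoding itself is short: pick the most likely class at each time step, collapse consecutive repeats, and drop blanks. The sketch below shows the idea; the signature of the project's `ctc_greedy_decode` may differ.

```python
import torch

def greedy_ctc_decode(log_probs: torch.Tensor, idx_to_char: dict, blank: int = 0) -> str:
    """log_probs: (T, num_classes) per-time-step log-probabilities for a single image."""
    best_path = log_probs.argmax(dim=1).tolist()  # most likely class per time step
    chars, prev = [], blank
    for idx in best_path:
        if idx != blank and idx != prev:          # collapse repeats, skip blanks
            chars.append(idx_to_char[idx])
        prev = idx
    return "".join(chars)
```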

`utils_ocr.py`: Provides helper functions for image preprocessing steps like binarization, resizing, and normalization, used before feeding images to the model.
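
A preprocessing pipeline along those lines might look like the sketch below; the real `utils_ocr.py` may use OpenCV, different target dimensions, or a different normalization.

```python
import numpy as np
from PIL import Image

def preprocess(path: str, height: int = 32, width: int = 128) -> np.ndarray:
    """Load an image, binarize it, and resize/normalize it for the model."""
    img = Image.open(path).convert("L")    # grayscale
    img = img.resize((width, height))      # fixed model input size (assumed dimensions)
    arr = np.asarray(img, dtype=np.float32) / 255.0
    arr = (arr > 0.5).astype(np.float32)   # crude global-threshold binarization
    return arr[None, :, :]                 # add channel dimension -> (1, H, W)
```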

## Roadmap

- Advanced Data Augmentation: Implement more sophisticated augmentation techniques (e.g., elastic deformations, random noise) for training data.
- Beam Search Decoding: Replace greedy decoding with beam search for potentially more accurate predictions.
- Error Analysis Dashboard: Integrate a more detailed error analysis section to visualize common recognition mistakes.
- Support for Multiple Languages: Extend character sets and train on multilingual datasets.
- Deployment to Cloud Platforms: Provide instructions for deploying the Streamlit app to platforms like Hugging Face Spaces, Heroku, or AWS.
- Pre-trained Model Download: Allow users to download pre-trained models directly from the Hugging Face Hub.
- Interactive Drawing Pad: Enable users to draw a name directly in the app for recognition.

## Contribution

Contributions are welcome! If you have suggestions, bug reports, or want to contribute code:

- Fork the repository.
- Create a new branch (`git checkout -b feature/your-feature-name`).
- Make your changes.
- Commit your changes (`git commit -m 'Add new feature'`).
- Push to the branch (`git push origin feature/your-feature-name`).
- Open a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgements

- **Streamlit**: For building interactive web applications with ease.
- **PyTorch**: The open-source machine learning framework.
- **Hugging Face Hub**: For model and dataset sharing.
- **OpenCV**: For image processing utilities (implicitly used via `utils_ocr`).
- **EditDistance**: For efficient calculation of the character error rate (see the short example after this list).
- **tqdm**: For progress bars during training.
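
For reference, the character error rate that `editdistance` enables is simply the edit distance normalized by the target length; the app's actual metric code may differ.

```python
import editdistance

def cer(prediction: str, target: str) -> float:
    """Character error rate: edit distance normalized by target length."""
    return editdistance.eval(prediction, target) / max(len(target), 1)

print(cer("JHON", "JOHN"))  # 0.5
```
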
---

*Built using Streamlit, PyTorch, OpenCV, and EditDistance © 2025 by **MFT***