---
title: Handwritten Name Recognizer
emoji: πŸ“Š
colorFrom: indigo
colorTo: gray
sdk: docker
pinned: false
license: apache-2.0
---

Handwritten Name Recognition (OCR) App ✍🏻

An end-to-end Streamlit application for training a CRNN model and recognizing handwritten names from images.

πŸ“ƒ Demo and Documentation πŸ“‚ GitHub Repository


πŸ•ΉοΈ Overview

This project implements a Handwritten Name Recognition (OCR) system using a Convolutional Recurrent Neural Network (CRNN) architecture built with PyTorch. The application is presented as an interactive web interface using Streamlit, allowing users to:

  1. Train a new OCR model from a local dataset.
  2. Load a pre-trained model.
  3. Predict text from uploaded handwritten image files.
  4. Upload the local dataset to the Hugging Face Hub for sharing and versioning.

The CRNN model combines a CNN backbone for feature extraction from images and a Bidirectional LSTM layer for sequence modeling, followed by a linear layer for character classification using CTC (Connectionist Temporal Classification) Loss.
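
As a rough illustration, a minimal CRNN of this kind can be sketched in PyTorch as follows; layer sizes and names here are illustrative, not the exact ones used in model_ocr.py:

import torch
import torch.nn as nn

class CRNNSketch(nn.Module):
    """Illustrative CRNN: CNN features -> bidirectional LSTM -> per-timestep character logits."""
    def __init__(self, num_chars, img_height=32, hidden_size=256):
        super().__init__()
        # CNN backbone: extract features and downsample, keeping a sequence along the width axis
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        feat_height = img_height // 4
        # Bidirectional LSTM models the left-to-right character sequence
        self.rnn = nn.LSTM(128 * feat_height, hidden_size, bidirectional=True, batch_first=True)
        # One extra output class for the CTC "blank" symbol
        self.fc = nn.Linear(hidden_size * 2, num_chars + 1)

    def forward(self, x):                                    # x: (B, 1, H, W)
        feats = self.cnn(x)                                   # (B, C, H/4, W/4)
        b, c, h, w = feats.size()
        seq = feats.permute(0, 3, 1, 2).reshape(b, w, c * h)  # (B, T, C*H), one step per width position
        seq, _ = self.rnn(seq)
        return self.fc(seq)                                   # (B, T, num_chars + 1); log-softmax and transpose to (T, B, C) before nn.CTCLoss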


🚩 Quickstart

Follow these steps to get the application up and running on your local machine.

Prerequisites

  • Python 3.8+
  • pip (Python package installer)

1. Clone the Repository (or set up your project folder)

Ensure your project structure matches the expected layout (e.g., app.py, config.py, data/, models/, etc.).

2. Create and Activate a Virtual Environment

It's highly recommended to use a virtual environment to manage dependencies.

# Navigate to your project root directory
cd path/to/your/handwritten_name_ocr_app

# Create a virtual environment named 'venvy'
python -m venv venvy

# Activate the virtual environment
# On Windows (Command Prompt):
.\venvy\Scripts\activate.bat

# On Windows (PowerShell):
.\venvy\Scripts\Activate.ps1

# On macOS/Linux:
source venvy/bin/activate

3. Install Dependencies

With your virtual environment activated, install all required Python packages:

pip install streamlit pandas numpy Pillow torch torchvision scikit-learn tqdm editdistance huggingface_hub

Note on PyTorch (torch and torchvision): Depending on your platform, the command above may install a CPU-only build of PyTorch. If you have a CUDA-enabled GPU and want to leverage it for faster training, refer to the official PyTorch website (pytorch.org/get-started/locally/) for the installation command that matches your CUDA version.
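
With PyTorch installed, a quick check confirms whether it can see a GPU:

import torch

# True means a CUDA-enabled build found a usable GPU; False means training will run on the CPU
print(torch.cuda.is_available())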

4. Prepare Your Dataset

The application expects a dataset structured as follows:

data/
β”œβ”€β”€ images/
β”‚   β”œβ”€β”€ train/
β”‚   β”‚   β”œβ”€β”€ image1.png
β”‚   β”‚   β”œβ”€β”€ image2.png
β”‚   β”‚   └── ...
β”‚   └── test/
β”‚       β”œβ”€β”€ image_test1.png
β”‚       β”œβ”€β”€ image_test2.png
β”‚       └── ...
β”œβ”€β”€ train.csv
└── test.csv
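
train.csv and test.csv hold the labels: they map image filenames to their ground-truth transcriptions. The exact column names depend on how your dataset was exported; a quick sanity check with pandas (the column names in the comment are only an example):

import pandas as pd

df = pd.read_csv("data/train.csv")
print(df.columns.tolist())   # e.g. ['filename', 'label'] -- adjust to whatever your CSV actually uses
print(df.head())             # first few image/transcription pairs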

5. Clear Python Cache (Important!)

After making code changes or installing new packages, it's crucial to clear Python's compiled cache to ensure the latest code is used.

find . -type d -name "__pycache__" -exec rm -rf {} +  # For macOS/Linux

Get-ChildItem -Path . -Include __pycache__ -Recurse | Remove-Item -Recurse -Force # For Windows PowerShell

6. Run the Streamlit Application

With your virtual environment activated and dependencies installed:

streamlit run app.py

This will open the application in your web browser.

✏️ Features

  • CRNN Model Architecture: Utilizes a Convolutional Recurrent Neural Network for robust OCR.
  • CTC Loss: Employs Connectionist Temporal Classification for sequence prediction.
  • Model Training: Train a new OCR model from your local image and CSV datasets.
  • Pre-trained Model Loading: Load previously saved models to avoid retraining.
  • Handwritten Text Prediction: Upload an image and get instant text recognition.
  • Training Progress Visualization: Real-time updates and plots for training loss, CER (character error rate), and accuracy (a minimal CER sketch follows this list).
  • Hugging Face Hub Integration: Seamlessly upload your dataset to the Hugging Face Hub for easy sharing and version control.
  • Responsive UI: Built with Streamlit for an intuitive and user-friendly experience.
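
CER here is the usual edit-distance metric: the Levenshtein distance between the predicted and target strings, normalized by the target length. A minimal sketch using the editdistance package (not necessarily the exact code used during training):

import editdistance

def char_error_rate(prediction, target):
    """Levenshtein distance between the strings, normalized by the target length."""
    if not target:
        return 0.0 if not prediction else 1.0
    return editdistance.eval(prediction, target) / len(target)

print(char_error_rate("JOHNN", "JOHN"))  # 0.25 -> one extra character over a four-character target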

πŸ—οΈ Project Structure

handwritten_name_ocr_app/
β”œβ”€β”€ app.py                  # Main Streamlit application file
β”œβ”€β”€ config.py               # Configuration settings (paths, model params, chars)
β”œβ”€β”€ data/                   # Directory for datasets
β”‚   β”œβ”€β”€ images/
β”‚   β”‚   β”œβ”€β”€ train/          # Training images
β”‚   β”‚   └── test/           # Testing images
β”‚   β”œβ”€β”€ train.csv           # Training labels
β”‚   └── test.csv            # Testing labels
β”œβ”€β”€ data_handler_ocr.py     # Custom PyTorch Dataset and DataLoader logic
β”œβ”€β”€ models/                 # Directory to save/load trained models
β”‚   └── handwritten_name_ocr_model.pth # Default model save path
β”œβ”€β”€ model_ocr.py            # Defines the CRNN model architecture and training/evaluation functions
β”œβ”€β”€ utils_ocr.py            # Utility functions for image preprocessing
β”œβ”€β”€ requirements.txt        # List of Python dependencies
└── venvy/                  # Python virtual environment (created by `python -m venv venvy`)
    └── ...

πŸ—ƒοΈ Project Index

app.py: The central Streamlit application. Handles UI, triggers training/prediction, and integrates with Hugging Face Hub.
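
The Hub upload goes through the huggingface_hub client. A minimal sketch of the kind of call involved, assuming you are logged in via huggingface-cli login (the repo id below is a placeholder, and app.py may wrap this differently):

from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="data",                          # local dataset directory
    repo_id="your-username/handwritten-names",   # placeholder repo id
    repo_type="dataset",
)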

config.py: Stores global configuration variables such as file paths, image dimensions, character sets, and training hyperparameters.
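
A sketch of the kind of constants it holds; names and values below are illustrative, the real ones live in config.py:

# Illustrative configuration constants (not the actual values)
DATA_DIR = "data"
MODEL_SAVE_PATH = "models/handwritten_name_ocr_model.pth"

IMG_HEIGHT = 32                               # images are resized to a fixed height...
IMG_WIDTH = 128                               # ...and a fixed width before batching

CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ-' "       # recognizable character set (example only)

BATCH_SIZE = 32
NUM_EPOCHS = 20
LEARNING_RATE = 1e-3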

data_handler_ocr.py: Contains the CharIndexer class for character-to-index mapping and the OCRDataset and ocr_collate_fn for efficient data loading and batching for PyTorch.
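
A sketch of what such a character indexer does; the real CharIndexer may differ in details, and index 0 is reserved here for the CTC blank:

class CharIndexerSketch:
    """Maps characters to integer indices and back; 0 is reserved for the CTC blank."""
    def __init__(self, chars):
        self.char_to_idx = {c: i + 1 for i, c in enumerate(chars)}
        self.idx_to_char = {i + 1: c for i, c in enumerate(chars)}

    def encode(self, text):
        return [self.char_to_idx[c] for c in text]

    def decode(self, indices):
        return "".join(self.idx_to_char[i] for i in indices if i != 0)

indexer = CharIndexerSketch("ABCDEFGHIJKLMNOPQRSTUVWXYZ ")
print(indexer.encode("ANNA"))          # [1, 14, 14, 1]
print(indexer.decode([1, 14, 14, 1]))  # ANNA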

model_ocr.py: Defines the CNN_Backbone, BidirectionalLSTM, and CRNN (the main OCR model) classes. It also includes functions for train_ocr_model, evaluate_model, save_ocr_model, load_ocr_model, and ctc_greedy_decode.
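
Greedy CTC decoding picks the most likely class at every timestep, collapses consecutive repeats, and drops blanks. A minimal sketch of the idea (the actual ctc_greedy_decode may handle batching and tensor shapes differently):

import torch

def ctc_greedy_decode_sketch(logits, idx_to_char, blank=0):
    """logits: (T, num_classes) for a single sample. Collapse repeats, then drop blanks."""
    best = torch.argmax(logits, dim=1).tolist()   # most likely class index per timestep
    chars, prev = [], None
    for idx in best:
        if idx != blank and idx != prev:          # skip blanks and repeated indices
            chars.append(idx_to_char[idx])
        prev = idx
    return "".join(chars)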

utils_ocr.py: Provides helper functions for image preprocessing steps like binarization, resizing, and normalization, used before feeding images to the model.
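
A sketch of the usual steps (grayscale, resize to the fixed model input size, scale to [0, 1]); the exact operations and sizes in utils_ocr.py may differ:

import numpy as np
from PIL import Image

def preprocess_image_sketch(path, img_height=32, img_width=128):
    """Load an image, convert it to grayscale, resize it, and normalize pixel values to [0, 1]."""
    img = Image.open(path).convert("L")                # grayscale
    img = img.resize((img_width, img_height))          # fixed width x height expected by the model
    arr = np.asarray(img, dtype=np.float32) / 255.0    # scale pixel values to [0, 1]
    return arr[np.newaxis, :, :]                       # add channel dimension: (1, H, W)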

πŸ“Œ Roadmap

  • Advanced Data Augmentation: Implement more sophisticated augmentation techniques (e.g., elastic deformations, random noise) for training data.
  • Beam Search Decoding: Replace greedy decoding with beam search for potentially more accurate predictions.
  • Error Analysis Dashboard: Integrate a more detailed error analysis section to visualize common recognition mistakes.
  • Support for Multiple Languages: Extend character sets and train on multilingual datasets.
  • Deployment to Cloud Platforms: Provide instructions for deploying the Streamlit app to platforms like Hugging Face Spaces, Heroku, or AWS.
  • Pre-trained Model Download: Allow users to download pre-trained models directly from Hugging Face Hub.
  • Interactive Drawing Pad: Enable users to draw a name directly in the app for recognition.

🎁 Contribution

Contributions are welcome! If you have suggestions, bug reports, or want to contribute code, please feel free to fork the repository.

  • Create a new branch (git checkout -b feature/your-feature-name).
  • Make your changes.
  • Commit your changes (git commit -m 'Add new feature').
  • Push to the branch (git push origin feature/your-feature-name).
  • Open a Pull Request.

βš–οΈ License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

✨ Acknowledgements

Streamlit: For building interactive web applications with ease.

PyTorch: The open-source machine learning framework.

Hugging Face Hub: For model and dataset sharing.

OpenCV: For image processing utilities (implicitly used via utils_ocr).

EditDistance: For efficient calculation of character error rate.

tqdm: For progress bars during training.


Built using Streamlit, PyTorch, OpenCV, and EditDistance Β© 2025 by MFT