title: TextLens - AI-Powered OCR
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit

🔍 TextLens - AI-Powered OCR

A modern OCR application built on a Vision-Language Model (VLM): it extracts text from images using Microsoft's Florence-2 model and falls back to EasyOCR, then to a demo mode, when Florence-2 is unavailable.

✨ Features

  • 🤖 Advanced VLM OCR: Uses Microsoft Florence-2 for state-of-the-art text extraction
  • 🔄 Smart Fallback System: Automatically falls back to EasyOCR if Florence-2 fails
  • 🧪 Demo Mode: Test mode for demonstration when the other methods are unavailable
  • 🎨 Modern UI: Clean, responsive Gradio interface
  • 📱 Multiple Input Methods: Upload, webcam, and clipboard support
  • ⚡ Real-time Processing: Automatic text extraction on image upload
  • 📋 Copy Functionality: Easy text copying from results
  • 🚀 GPU Acceleration: Runs on CUDA and Apple MPS GPUs, with CPU inference as a fallback
  • 🛡️ Error Handling: Robust error handling and user-friendly messages

🏗️ Architecture

textlens-ocr/
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── README.md              # Project documentation
├── models/                # OCR processing modules
│   ├── __init__.py
│   └── ocr_processor.py   # Advanced OCR class with fallbacks
├── utils/                 # Utility functions
│   ├── __init__.py
│   └── image_utils.py     # Image preprocessing utilities
└── ui/                    # User interface components
    ├── __init__.py
    ├── interface.py       # Gradio interface
    ├── handlers.py        # Event handlers
    └── styles.py          # CSS styling

🚀 Quick Start

Local Development

  1. Clone the repository

    git clone https://github.com/KumarAmrit30/textlens-ocr.git
    cd textlens-ocr
    
  2. Set up Python environment

    python3 -m venv textlens_env
    source textlens_env/bin/activate  # On Windows: textlens_env\Scripts\activate
    
  3. Install dependencies

    pip install -r requirements.txt
    
  4. Run the application

    python app.py
    
  5. Open your browser and navigate to http://localhost:7860

Quick Test

Run the test suite to verify everything works:

python test_ocr.py

🔧 Technical Details

OCR Processing Pipeline

  1. Primary: Microsoft Florence-2 VLM

    • State-of-the-art vision-language model
    • Supports both basic OCR and region-based extraction
    • GPU accelerated inference
  2. Fallback: EasyOCR

    • Traditional OCR with good accuracy
    • Works when Florence-2 fails to load
    • Multi-language support
  3. Last resort: Demo Mode

    • Demonstration functionality
    • Shows interface working correctly
    • Used when other methods are unavailable
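The three tiers above can be read as a simple fallback chain. The following is a minimal sketch of that idea, not the project's actual code: `run_with_fallback`, `PIPELINE`, and the three stand-in backends are hypothetical names chosen for illustration, and the real logic lives in models/ocr_processor.py.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins for the three tiers. Real backends would wrap
# Florence-2, EasyOCR, and the demo mode respectively.
def florence_backend(image: object) -> str:
    raise RuntimeError("Florence-2 failed to load")  # simulate a failure

def easyocr_backend(image: object) -> str:
    return "text from EasyOCR"

def demo_backend(image: object) -> str:
    return "demo text"

Backend = Tuple[str, Callable[[object], str]]

def run_with_fallback(image: object, backends: List[Backend]) -> Tuple[str, str]:
    """Try each backend in order; return (name, text) from the first that succeeds."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(image)
        except Exception as exc:  # any failure moves on to the next tier
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all OCR backends failed: " + "; ".join(errors))

# Order matters: primary first, demo mode last.
PIPELINE: List[Backend] = [
    ("florence-2", florence_backend),
    ("easyocr", easyocr_backend),
    ("demo", demo_backend),
]
```

Returning the backend name alongside the text lets the UI tell the user which tier produced the result, which is useful when the primary model silently fails.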

Model Loading Strategy

The application first tries a pinned revision of Florence-2 and falls back to the default revision if that fails:

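from transformers import AutoModelForCausalLM

MODEL_ID = "microsoft/Florence-2-base"

try:
    # Try Florence-2 pinned to a specific revision first
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        revision="refs/pr/6",
        trust_remote_code=True,
    )
except Exception:
    # Fall back to the default revision. Catching Exception rather than
    # using a bare except lets Ctrl+C and SystemExit propagate.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,
    )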

Device Detection

The application automatically detects and uses the best available device, in this order of preference:

  • CUDA: NVIDIA GPUs with CUDA support
  • MPS: Apple Silicon Macs (M1/M2/M3)
  • CPU: Fallback for all systems
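A minimal sketch of this preference order, assuming PyTorch is the inference backend. The function names `pick_device` and `detect_device` are hypothetical and not taken from the project; only `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are real PyTorch calls.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name in CUDA > MPS > CPU order."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

def detect_device() -> str:
    """Ask PyTorch which accelerators exist; use CPU if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is missing on older PyTorch releases, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)
```

Keeping the ordering in a pure function (`pick_device`) makes it easy to test without any GPU present.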

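📊 Performance

| Model            | Size            | Speed  | Accuracy  | Use Case              |
|------------------|-----------------|--------|-----------|-----------------------|
| Florence-2-base  | 230M parameters | Fast   | High      | General OCR           |
| Florence-2-large | 770M parameters | Medium | Very High | High accuracy needs   |
| EasyOCR          | ~100 MB         | Medium | Good      | Fallback/Multilingual |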

🔍 Supported Image Formats

  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • WebP (.webp)
  • BMP (.bmp)
  • TIFF (.tiff, .tif)
  • GIF (.gif)
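One way to enforce this list before handing a file to the OCR pipeline is a simple extension check. This is an illustrative sketch only: `SUPPORTED_EXTENSIONS` and `is_supported_image` are hypothetical names, and the check looks at the file name, not the file contents.

```python
from pathlib import Path

# Extensions copied from the list above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tiff", ".tif", ".gif"}

def is_supported_image(path: str) -> bool:
    """Return True if the file extension (case-insensitive) is in the supported set."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_image("scan.JPG")` is true, while `is_supported_image("notes.pdf")` is false.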

🎯 Use Cases

  • πŸ“„ Document Digitization: Convert physical documents to text
  • πŸͺ Receipt Processing: Extract data from receipts and invoices
  • πŸ“± Screenshot Text Extraction: Get text from app screenshots
  • πŸš— License Plate Reading: Extract text from vehicle plates
  • πŸ“š Book/Article Scanning: Digitize printed materials
  • 🌐 Multilingual Text: Process text in various languages

🛠️ Configuration

Model Selection

Change the model in models/ocr_processor.py:

# For faster inference
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")

# For higher accuracy
ocr = OCRProcessor(model_name="microsoft/Florence-2-large")

UI Customization

Modify the Gradio interface in app.py:

  • Update colors and styling in the CSS section
  • Change layout in the create_interface() function
  • Add new features or components

🧪 Testing

The project includes a test script, plus a one-line check that the processor loads:

# Run all tests
python test_ocr.py

# Test specific functionality
python -c "from models.ocr_processor import OCRProcessor; ocr = OCRProcessor(); print(ocr.get_model_info())"

🚀 Deployment

HuggingFace Spaces

  1. Fork this repository
  2. Create a new Space on HuggingFace
  3. Connect your repository
  4. The app will automatically deploy

Docker Deployment

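FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Gradio binds to 127.0.0.1 by default; listen on all interfaces inside the container
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860

CMD ["python", "app.py"]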

Local Server

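# Production server: Gradio ships its own ASGI server, so a WSGI server such as gunicorn is not needed.
# Bind to all interfaces so the app is reachable from other hosts
# (assumes app.py does not hard-code server_name or server_port):
GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 python app.py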

🔐 Environment Variables

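| Variable             | Description           | Default              |
|----------------------|-----------------------|----------------------|
| GRADIO_SERVER_PORT   | Server port           | 7860                 |
| TRANSFORMERS_CACHE   | Model cache directory | ~/.cache/huggingface |
| CUDA_VISIBLE_DEVICES | GPU device selection  | All available        |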
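Gradio, transformers, and the CUDA runtime each read their own variable directly, so no application code is required for them to take effect. If you want to log or validate the effective values at startup, a sketch like the following may help. `read_settings` is a hypothetical helper, not part of the project, and the defaults are the ones listed in the table.

```python
import os
from typing import Mapping, Optional

def read_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the three variables from the table, applying the listed defaults."""
    env = os.environ if env is None else env
    return {
        "port": int(env.get("GRADIO_SERVER_PORT", "7860")),
        "cache_dir": os.path.expanduser(
            env.get("TRANSFORMERS_CACHE", "~/.cache/huggingface")
        ),
        # None means the variable is unset, so all GPUs remain visible.
        "visible_gpus": env.get("CUDA_VISIBLE_DEVICES"),
    }
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test, for example `read_settings({"GRADIO_SERVER_PORT": "8080"})`.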

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request

📝 API Reference

OCRProcessor Class

from models.ocr_processor import OCRProcessor

# Initialize
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")

# Extract text
text = ocr.extract_text(image)

# Extract with regions
result = ocr.extract_text_with_regions(image)

# Get model info
info = ocr.get_model_info()

🐛 Troubleshooting

Common Issues

  1. Model Loading Errors

    # Install missing dependencies
    pip install einops timm
    
  2. CUDA Out of Memory

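    # Use CPU instead
    from models.ocr_processor import OCRProcessor

    ocr = OCRProcessor()
    ocr.device = "cpu"
    # Alternative: launch with CUDA_VISIBLE_DEVICES="" so PyTorch sees no GPU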
    
  3. SSL Certificate Errors

    # Update certificates (macOS)
    /Applications/Python\ 3.x/Install\ Certificates.command
    

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Microsoft for the Florence-2 model
  • HuggingFace for the transformers library
  • Gradio for the web interface framework
  • EasyOCR for fallback OCR capabilities

📞 Support

  • Create an issue for bug reports
  • Start a discussion for feature requests
  • Check existing issues before posting

Made with ❤️ for the AI community