---
title: TextLens - AI-Powered OCR
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
---
# 🔍 TextLens - AI-Powered OCR

A modern Vision-Language Model (VLM) OCR application that extracts text from images using the Microsoft Florence-2 model, with an intelligent fallback system.
## ✨ Features

- 🤖 **Advanced VLM OCR**: Uses Microsoft Florence-2 for state-of-the-art text extraction
- 🔄 **Smart Fallback System**: Automatically falls back to EasyOCR if Florence-2 fails
- 🧪 **Demo Mode**: Test mode for demonstrations when the other methods are unavailable
- 🎨 **Modern UI**: Clean, responsive Gradio interface with excellent UX
- 📱 **Multiple Input Methods**: Upload, webcam, and clipboard support
- ⚡ **Real-time Processing**: Automatic text extraction on image upload
- 📋 **Copy Functionality**: Easy text copying from results
- 🚀 **GPU Acceleration**: Supports CUDA, MPS, and CPU inference
- 🛡️ **Error Handling**: Robust error handling and user-friendly messages
## 🏗️ Architecture

```
textlens-ocr/
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── README.md              # Project documentation
├── models/                # OCR processing modules
│   ├── __init__.py
│   └── ocr_processor.py   # Advanced OCR class with fallbacks
├── utils/                 # Utility functions
│   ├── __init__.py
│   └── image_utils.py     # Image preprocessing utilities
└── ui/                    # User interface components
    ├── __init__.py
    ├── interface.py       # Gradio interface
    ├── handlers.py        # Event handlers
    └── styles.py          # CSS styling
```
## 🚀 Quick Start

### Local Development
1. **Clone the repository**
   ```bash
   git clone https://github.com/KumarAmrit30/textlens-ocr.git
   cd textlens-ocr
   ```
2. **Set up a Python environment**
   ```bash
   python3 -m venv textlens_env
   source textlens_env/bin/activate  # On Windows: textlens_env\Scripts\activate
   ```
3. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```
4. **Run the application**
   ```bash
   python app.py
   ```
5. **Open your browser** and navigate to `http://localhost:7860`
### Quick Test

Run the test suite to verify everything works:

```bash
python test_ocr.py
```
## 🔧 Technical Details

### OCR Processing Pipeline
1. **Primary: Microsoft Florence-2 VLM**
   - State-of-the-art vision-language model
   - Supports both basic OCR and region-based extraction
   - GPU-accelerated inference
2. **Fallback: EasyOCR**
   - Traditional OCR with good accuracy
   - Used when Florence-2 fails to load
   - Multi-language support
3. **Demo Mode**
   - Demonstration functionality that shows the interface working correctly
   - Used when the other methods are unavailable
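The try-in-order chain described above can be sketched as a simple loop. This is an illustrative sketch, not the project's actual API: `extract_with_fallbacks` and the backend callables are hypothetical stand-ins.

```python
def extract_with_fallbacks(image, backends):
    """Try each (name, callable) OCR backend in order; return the first success.

    `backends` might be [("florence2", run_florence), ("easyocr", run_easyocr),
    ("demo", run_demo)] -- the same priority order the pipeline describes.
    """
    errors = []
    for name, run in backends:
        try:
            return name, run(image)
        except Exception as exc:  # a failed backend just moves us down the chain
            errors.append((name, exc))
    raise RuntimeError(f"All OCR backends failed: {errors}")
```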
### Model Loading Strategy
The application uses an intelligent loading strategy:
```python
try:
    # Try Florence-2 with a specific revision
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Florence-2-base",
        revision="refs/pr/6",
        trust_remote_code=True,
    )
except Exception:
    # Fall back to the default Florence-2 revision
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Florence-2-base",
        trust_remote_code=True,
    )
```
### Device Detection

The application automatically detects and uses the best available device:

- **CUDA**: NVIDIA GPUs with CUDA support
- **MPS**: Apple Silicon Macs (M1/M2/M3)
- **CPU**: Fallback for all systems
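A minimal device-picking helper along these lines (a sketch assuming PyTorch; the function name is illustrative, not the project's code):

```python
def pick_device():
    """Return the best available inference device: "cuda", "mps", or "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed; CPU is the only option
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon (M1/M2/M3)
    return "cpu"
```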
## 📊 Performance

| Model | Size | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| Florence-2-base | 230M params | Fast | High | General OCR |
| Florence-2-large | 770M params | Medium | Very High | High-accuracy needs |
| EasyOCR | ~100 MB | Medium | Good | Fallback/multilingual |
## 📋 Supported Image Formats
- JPEG (.jpg, .jpeg)
- PNG (.png)
- WebP (.webp)
- BMP (.bmp)
- TIFF (.tiff, .tif)
- GIF (.gif)
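A simple extension check against this list might look like the following (an illustrative helper, not part of the project's code):

```python
from pathlib import Path

# Extensions accepted by the app, matching the list above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp",
                        ".bmp", ".tiff", ".tif", ".gif"}

def is_supported_image(path: str) -> bool:
    """Return True if the file's extension is a supported image format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```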
## 🎯 Use Cases

- 📄 **Document Digitization**: Convert physical documents to text
- 🏪 **Receipt Processing**: Extract data from receipts and invoices
- 📱 **Screenshot Text Extraction**: Get text from app screenshots
- 🚗 **License Plate Reading**: Extract text from vehicle plates
- 📚 **Book/Article Scanning**: Digitize printed materials
- 🌍 **Multilingual Text**: Process text in various languages
## 🛠️ Configuration

### Model Selection

Change the model in `models/ocr_processor.py`:

```python
# For faster inference
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")

# For higher accuracy
ocr = OCRProcessor(model_name="microsoft/Florence-2-large")
```
### UI Customization

Modify the Gradio interface in `app.py`:

- Update colors and styling in the CSS section
- Change the layout in the `create_interface()` function
- Add new features or components
## 🧪 Testing

The project includes comprehensive tests:

```bash
# Run all tests
python test_ocr.py

# Test specific functionality
python -c "from models.ocr_processor import OCRProcessor; ocr = OCRProcessor(); print(ocr.get_model_info())"
```
## 🚀 Deployment

### HuggingFace Spaces

1. Fork this repository
2. Create a new Space on HuggingFace
3. Connect your repository
4. The app will deploy automatically
### Docker Deployment

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["python", "app.py"]
```
### Local Server

```bash
# Production server. Gradio serves an ASGI (FastAPI) app, so pair gunicorn
# with uvicorn workers; plain sync workers cannot serve it.
pip install gunicorn uvicorn
gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7860 "app:create_interface().app"
```

Quote the app spec so the shell does not interpret the parentheses.
## 🌍 Environment Variables

| Variable | Description | Default |
|---|---|---|
| `GRADIO_SERVER_PORT` | Server port | `7860` |
| `TRANSFORMERS_CACHE` | Model cache directory | `~/.cache/huggingface` |
| `CUDA_VISIBLE_DEVICES` | GPU device selection | All available |
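Reading these variables with their documented defaults is straightforward (a sketch; only the variable names come from the table above):

```python
import os

# Fall back to the documented defaults when the variables are unset.
port = int(os.environ.get("GRADIO_SERVER_PORT", "7860"))
cache_dir = os.environ.get(
    "TRANSFORMERS_CACHE",
    os.path.expanduser("~/.cache/huggingface"),
)
```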
## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request
## 📚 API Reference

### OCRProcessor Class

```python
from models.ocr_processor import OCRProcessor

# Initialize
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")

# Extract text
text = ocr.extract_text(image)

# Extract with regions
result = ocr.extract_text_with_regions(image)

# Get model info
info = ocr.get_model_info()
```
## 🔍 Troubleshooting

### Common Issues

**Model Loading Errors**

```bash
# Install missing dependencies
pip install einops timm
```

**CUDA Out of Memory**

```python
# Use CPU instead
ocr = OCRProcessor()
ocr.device = "cpu"
```

**SSL Certificate Errors**

```bash
# Update certificates (macOS)
/Applications/Python\ 3.x/Install\ Certificates.command
```
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments

- **Microsoft** for the Florence-2 model
- **HuggingFace** for the transformers library
- **Gradio** for the web interface framework
- **EasyOCR** for fallback OCR capabilities
## 📞 Support
- Create an issue for bug reports
- Start a discussion for feature requests
- Check existing issues before posting
Made with ❤️ for the AI community