title: TextLens - AI-Powered OCR
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
TextLens - AI-Powered OCR
A Vision-Language Model (VLM) based OCR application that extracts text from images using Microsoft Florence-2, with automatic fallback to EasyOCR when the primary model fails and zero-downtime (blue-green) deployment.
Live Demo
Try it now: https://huggingface.co/spaces/GoConqurer/textlens-ocr
Key Features
Advanced AI-Powered OCR
- Microsoft Florence-2 VLM: State-of-the-art vision-language model for text extraction
- Intelligent Fallback System: Automatic fallback to EasyOCR if primary model fails
- Multi-Model Support: Florence-2-base and Florence-2-large variants
- Real-time Processing: Instant text extraction on image upload
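The fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: extract_with_fallback and its primary/fallback callables are hypothetical names, standing in for a Florence-2 call and an EasyOCR call respectively.

```python
from typing import Any, Callable


def extract_with_fallback(
    image: Any,
    primary: Callable[[Any], str],
    fallback: Callable[[Any], str],
) -> str:
    """Try the primary OCR engine; use the fallback if it raises or returns nothing.

    `primary` and `fallback` are hypothetical callables (for example a
    Florence-2 wrapper and an EasyOCR wrapper) that map an image to text.
    """
    try:
        text = primary(image)
        if text and text.strip():
            return text
    except Exception as exc:  # broad on purpose: any primary failure triggers the fallback
        print(f"Primary OCR failed ({exc!r}); falling back")
    return fallback(image)
```

Passing the engines in as callables keeps the fallback logic independent of any model library, which also makes it easy to exercise with stubs.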
Modern User Experience
- Clean UI: Professional Gradio interface with intuitive design
- Multiple Input Methods: Upload files, use webcam, or paste from clipboard
- Copy-to-Clipboard: One-click text copying functionality
- Responsive Design: Works seamlessly on desktop and mobile devices
- Dark/Light Theme: Automatic theme adaptation
Performance & Reliability
- GPU Acceleration: Supports CUDA, MPS (Apple Silicon), and CPU inference
- Smart Device Detection: Automatically uses best available hardware
- Error Resilience: Robust error handling with graceful degradation
- Memory Optimization: Efficient model loading and cleanup
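One plausible way to implement the device detection listed above is shown below. It is a sketch under assumptions: pick_device is a hypothetical helper, and the preference order CUDA, then MPS, then CPU is assumed rather than taken from the project's source. It falls back to "cpu" when PyTorch is not installed.

```python
def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu', preferring the fastest available backend."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch missing: nothing to accelerate

    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps is absent in older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


print(pick_device())
```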
Enterprise Features
- Zero Downtime Deployment: Blue-green deployment with health checks
- Health Monitoring: Built-in /health and /ready endpoints
- Graceful Shutdown: Signal handling for clean application restarts
- Production Ready: Scalable architecture with automated deployment
Architecture
textlens-ocr/
├── Frontend (Gradio UI)
│   ├── ui/interface.py # Main interface components
│   ├── ui/handlers.py # Event handlers & logic
│   └── ui/styles.py # CSS styling & themes
├── AI Models
│   └── models/ocr_processor.py # OCR engine with fallbacks
├── Utilities
│   └── utils/image_utils.py # Image preprocessing
├── Deployment
│   ├── .github/workflows/ # CI/CD pipelines
│   ├── scripts/deploy.py # Manual deployment tools
│   └── deployment.config.yml # Deployment configuration
├── Documentation
│   ├── README.md # Main documentation
│   └── DEPLOYMENT.md # Deployment guide
└── Configuration
    ├── app.py # Main application entry
    └── requirements.txt # Dependencies
Quick Start
Online (Recommended)
Instant access, no installation required: Launch TextLens
Local Development
Clone Repository
git clone https://github.com/KumarAmrit30/textlens-ocr.git
cd textlens-ocr
Setup Environment
python -m venv textlens_env
source textlens_env/bin/activate # Windows: textlens_env\Scripts\activate
pip install -r requirements.txt
Launch Application
python app.py
Open: http://localhost:7860
Quick Test
# Verify installation
python -c "from models.ocr_processor import OCRProcessor; print('TextLens ready!')"
Model Performance
Model | Size | Speed | Accuracy | Best For |
---|---|---|---|---|
Florence-2-base | 270M | Fast | High | General OCR, Real-time |
Florence-2-large | 770M | Medium | Very High | High accuracy needs |
EasyOCR | ~100M | Medium | Good | Fallback, Multilingual |
Supported Use Cases
Category | Examples | Performance |
---|---|---|
Documents | PDFs, Scanned papers, Forms | 5/5 |
Receipts | Shopping receipts, Invoices | 4/5 |
Screenshots | App interfaces, Error messages | 5/5 |
Vehicle | License plates, VIN numbers | 4/5 |
Books | Printed text, Handwritten notes | 4/5 |
Multilingual | Multiple languages | 3/5 |
Configuration
Model Selection
from models.ocr_processor import OCRProcessor
# Fast inference (recommended)
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")
# Maximum accuracy
ocr = OCRProcessor(model_name="microsoft/Florence-2-large")
UI Customization
Modify ui/styles.py to customize appearance:
# Change color scheme
PRIMARY_COLOR = "#1f77b4"
SECONDARY_COLOR = "#ff7f0e"
# Update layout
INTERFACE_WIDTH = "100%"
Environment Variables
Variable | Description | Default |
---|---|---|
SPACE_ID | HuggingFace Space ID | Auto-detected |
DEPLOYMENT_STAGE | Deployment stage | production |
TRANSFORMERS_CACHE | Model cache path | ~/.cache/huggingface |
CUDA_VISIBLE_DEVICES | GPU selection | All available |
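As an illustration of how these variables and their defaults might be read in one place, here is a small sketch. The Settings dataclass and load_settings function are hypothetical, not part of the project. The defaults mirror the table above, with "Auto-detected" and "All available" represented as None.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables in the table above."""
    space_id: Optional[str]              # None when not running on a Space
    deployment_stage: str
    cache_dir: Path
    cuda_visible_devices: Optional[str]  # None means no restriction was set


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, applying the documented defaults."""
    return Settings(
        space_id=env.get("SPACE_ID"),
        deployment_stage=env.get("DEPLOYMENT_STAGE", "production"),
        cache_dir=Path(env.get("TRANSFORMERS_CACHE", "~/.cache/huggingface")).expanduser(),
        cuda_visible_devices=env.get("CUDA_VISIBLE_DEVICES"),
    )


print(load_settings().deployment_stage)
```

Accepting the environment as a mapping argument makes the function easy to test with a plain dict.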
Deployment
HuggingFace Spaces (Recommended)
Automatic Deployment:
- Fork this repository
- Push to the main or master branch
- GitHub Actions automatically deploys to HuggingFace Spaces
- Access your deployed app at: https://huggingface.co/spaces/USERNAME/textlens-ocr
Manual Deployment:
- Go to GitHub Actions
- Select "Deploy to HuggingFace Spaces"
- Click "Run workflow"
- Choose deployment type:
  - Direct: Quick deployment to production
  - Blue-Green: Zero downtime with staging validation
Zero Downtime Deployment
Our enterprise-grade deployment system ensures zero downtime for users:
Features:
- Blue-Green Deployment: Test in staging before production
- Health Monitoring: Automatic health checks with retry logic
- Graceful Shutdown: Clean application restarts
- Real-time Monitoring: Deployment status tracking
Health Endpoints:
- GET /health - Application health status
- GET /ready - Application readiness check
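A health check with retry logic, of the kind a deployment script might run against these endpoints, could look roughly like this. It is a sketch using only the standard library. The function names, retry count, and delay are assumptions, and the "healthy" status value is taken from the example response in the Health Check API section.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_until_healthy(
    url: str,
    fetch: Callable[[str], dict] = fetch_json,
    expected: str = "healthy",
    retries: int = 5,
    delay: float = 2.0,
) -> bool:
    """Poll `url` until its JSON reports the expected status or retries run out."""
    for attempt in range(1, retries + 1):
        try:
            if fetch(url).get("status") == expected:
                return True
        except (OSError, ValueError):
            # Network errors (URLError is an OSError) and bad JSON count as "not yet".
            pass
        if attempt < retries:
            time.sleep(delay)
    return False
```

For the readiness endpoint the same function can be called with expected="ready".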
Deployment Flow:
graph LR
A[Code Push] --> B[Validate]
B --> C[Deploy Staging]
C --> D[Health Check]
D --> E[Deploy Production]
E --> F[Verify]
F --> G[Complete]
Docker Deployment
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["python", "app.py"]
Build and run:
docker build -t textlens-ocr .
docker run -p 7860:7860 textlens-ocr
Cloud Platforms
Platform | Status | Guide |
---|---|---|
HuggingFace Spaces | Ready | Deploy Now |
Google Colab | Compatible | Open in Colab |
AWS/GCP/Azure | Docker | Use Docker deployment |
Heroku | Limited | GPU not available |
Testing & Development
Running Tests
# Basic functionality test
python -c "
from models.ocr_processor import OCRProcessor
ocr = OCRProcessor()
print(f'Model loaded: {ocr.get_model_info()}')
"
# Test with sample image
python -c "
from PIL import Image
from models.ocr_processor import OCRProcessor
import requests
# Download test image
img_url = 'https://via.placeholder.com/300x100/000000/FFFFFF?text=Hello+World'
image = Image.open(requests.get(img_url, stream=True).raw)
# Test OCR
ocr = OCRProcessor()
result = ocr.extract_text(image)
print(f'OCR Result: {result}')
"
Development Tools
# Install development dependencies
pip install -r requirements.txt
# Format code
black . --line-length 88
# Type checking
mypy models/ utils/ ui/
# Lint code
flake8 --max-line-length 88
API Reference
OCRProcessor Class
from models.ocr_processor import OCRProcessor
# Initialize processor
ocr = OCRProcessor(
model_name="microsoft/Florence-2-base", # Model selection
device=None, # Auto-detect device
torch_dtype=None # Auto-select dtype
)
# Extract text from image
text = ocr.extract_text(image)
# Returns: str
# Extract text with bounding boxes
result = ocr.extract_text_with_regions(image)
# Returns: dict with text and regions
# Get model information
info = ocr.get_model_info()
# Returns: dict with model details
# Cleanup resources
ocr.cleanup()
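Because cleanup() has to be called explicitly, one option is to wrap the processor in a context manager so resources are released even when extraction raises. The managed helper below is a hypothetical convenience, not part of the documented API. It only assumes the object has a cleanup() method, as shown above.

```python
from contextlib import contextmanager


@contextmanager
def managed(processor):
    """Yield the processor and always call its cleanup() afterwards."""
    try:
        yield processor
    finally:
        processor.cleanup()


# Intended usage (assumes OCRProcessor and an image are available):
#
#     with managed(OCRProcessor()) as ocr:
#         text = ocr.extract_text(image)
```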
Health Check API
# Check application health
curl https://huggingface.co/spaces/GoConqurer/textlens-ocr/health
# Response:
{
"status": "healthy",
"timestamp": 1640995200,
"version": "1.0.0",
"environment": "production"
}
# Check readiness
curl https://huggingface.co/spaces/GoConqurer/textlens-ocr/ready
# Response:
{
"status": "ready",
"timestamp": 1640995200
}
Troubleshooting
Common Issues
Issue | Symptoms | Solution |
---|---|---|
Model Loading Error | ImportError, CUDA errors | Check GPU drivers, install CUDA toolkit |
Memory Error | Out of memory | Reduce batch size, use CPU inference |
SSL Certificate | SSL errors on macOS | Run the Install Certificates.command script shipped with the python.org macOS installer |
Permission Error | File access denied | Check file permissions, run as admin |
Debug Commands
# Check CUDA availability
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"
# Check transformers version
python -c "import transformers; print(f'Transformers: {transformers.__version__}')"
# Test health endpoint locally
curl http://localhost:7860/health
# View application logs
tail -f textlens.log
Getting Help
- Check existing issues: GitHub Issues
- Create new issue: Provide error details and environment info
- Join discussion: GitHub Discussions
- Contact: Create an issue for direct support
Contributing
We welcome contributions! Here's how to get started:
Development Setup
Fork & Clone
git clone https://github.com/YOUR_USERNAME/textlens-ocr.git
cd textlens-ocr
Create Branch
git checkout -b feature/your-feature-name
Make Changes
- Add new features or fix bugs
- Update tests and documentation
- Follow code style guidelines
Test Changes
python -m pytest tests/
python -c "from models.ocr_processor import OCRProcessor; OCRProcessor()"
Submit PR
git add .
git commit -m "feat: add your feature description"
git push origin feature/your-feature-name
Contribution Guidelines
- Code Style: Follow PEP 8, use Black formatter
- Documentation: Update README and docstrings
- Tests: Add tests for new functionality
- Commits: Use conventional commit messages
- Issues: Link PRs to relevant issues
License
This project is licensed under the MIT License - see the LICENSE file for details.
Third-Party Licenses
- Microsoft Florence-2: MIT License
- HuggingFace Transformers: Apache License 2.0
- Gradio: Apache License 2.0
- EasyOCR: Apache License 2.0
Acknowledgments
Special thanks to:
- Microsoft Research for the incredible Florence-2 vision-language model
- HuggingFace for the transformers library and Spaces platform
- Gradio Team for the amazing web interface framework
- JaidedAI for EasyOCR fallback capabilities
- Open Source Community for continuous support and contributions
Project Status
Component | Status | Version |
---|---|---|
Core OCR | Stable | v1.0.0 |
Web UI | Stable | v1.0.0 |
Deployment | Production | v1.0.0 |
API | Stable | v1.0.0 |
Documentation | Complete | v1.0.0 |
Roadmap
- Multi-language UI support
- Batch processing for multiple images
- API rate limiting and authentication
- Custom model fine-tuning support
- Mobile app development
- Cloud storage integration
Support & Community
Links
- Homepage: GitHub Repository
- Live Demo: HuggingFace Spaces
- Issues: Report Bugs
- Discussions: GitHub Discussions
- Documentation: Deployment Guide
Made with ❤️ for the AI community