---
title: Collar Multimodal RAG Demo
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: false
---
# Collar Multimodal RAG Demo - Production Ready
A production-ready multimodal RAG (Retrieval-Augmented Generation) system with team management, chat history, and advanced document processing capabilities.
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
## New Production Features
### 1. Multi-Page Citations
- Complex Query Support: The AI can now retrieve and cite multiple pages when queries reference information across different documents
- Smart Citation System: Automatically identifies and displays which pages contain relevant information
- Configurable Results: Users can specify how many pages to retrieve (1-10 pages)
### 2. Team-Based Repository Management
- Folder Uploads: Upload multiple documents as organized collections
- Team Isolation: Each team has access only to their own document collections
- Master Repository: Documents are organized in team-specific repositories for easy access
- Collection Naming: Optional custom names for document collections
### 3. Authentication & Team Management
- User Authentication: Secure login system with bcrypt password hashing
- Team-Based Access: Separate entry points for Team A and Team B
- Session Management: Secure session handling with automatic timeout
- Access Control: Users can only access and manage their team's documents
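The session-handling part of this flow can be sketched with Python's standard library alone. The function names and the in-memory `sessions` store below are illustrative, not the app's actual API; a real deployment would persist sessions and verify passwords with bcrypt (`bcrypt.checkpw`) before issuing a token.

```python
import time
import uuid

SESSION_TIMEOUT = 3600  # seconds; mirrors the SESSION_TIMEOUT production variable

sessions = {}  # token -> (username, created_at); a real app would persist this

def create_session(username):
    """Issue a UUID-based session token and record when it was created."""
    token = str(uuid.uuid4())
    sessions[token] = (username, time.time())
    return token

def validate_session(token):
    """Return the username for a live session, or None if unknown or timed out."""
    entry = sessions.get(token)
    if entry is None:
        return None
    username, created_at = entry
    if time.time() - created_at > SESSION_TIMEOUT:
        del sessions[token]  # expire stale sessions eagerly
        return None
    return username
```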
### 4. Chat History & Persistence
- Conversation Tracking: All queries and responses are saved to a SQLite database
- Historical Context: View previous conversations with timestamps
- Cited Pages History: Track which pages were referenced in each conversation
- Team-Specific History: Each team sees only their own conversation history
### 5. Advanced Relevance Scoring
- Multimodal Embeddings: ColPali-based semantic understanding of text and visual content
- Intelligent Ranking: Sophisticated relevance scoring with cosine similarity and dot product
- Quality Assessment: Automatic evaluation of information relevance and completeness
- Diversity Optimization: Ensures comprehensive coverage across document collections
## Installation & Setup

### Prerequisites
- Python 3.8+
- Docker Desktop
- Ollama
- CUDA-compatible GPU (recommended)
### 1. Install Dependencies

```bash
pip install -r requirements.txt
```
### 2. Environment Configuration

Create a `.env` file with the following variables:

```env
colpali=your_colpali_model
ollama=your_ollama_model
flashattn=1
temperature=0.8
batchsize=5
metrictype=IP
mnum=16
efnum=500
topk=50
```
### 3. Start Services
The application will automatically:
- Start Docker Desktop (Windows)
- Start Ollama server
- Initialize Docker containers
- Create default users
## Default Users

The system creates default users for each team:

| Team | Username | Password |
|---|---|---|
| Team A | admin_team_a | admin123_team_a |
| Team B | admin_team_b | admin123_team_b |
## Usage Guide

### 1. Authentication
- Navigate to the "Authentication" tab
- Enter your username and password
- Click "Login" to access team-specific features
### 2. Document Management
- Go to the "Document Management" tab
- Optionally enter a collection name for organization
- Set the maximum pages to extract per document
- Upload multiple PPT/PDF files
- Click "Upload to Repository" to process documents
- Use "Refresh Collections" to see available document collections
### 3. Advanced Querying
- Navigate to the "Advanced Query" tab
- Enter your query in the text box
- Adjust the number of pages to retrieve (1-10)
- Click "Search Documents" to get AI response with citations
- View the cited pages and retrieved document images
- Check relevance scores to understand information quality (see "Relevance Score Calculation" section)
### 4. Chat History
- Go to the "Chat History" tab
- Adjust the number of conversations to display
- Click "Refresh History" to view recent conversations
- Each entry shows query, response, cited pages, and timestamp
### 5. Data Management
- Access the "Data Management" tab
- Select collections to delete (team-restricted)
- Configure database parameters for optimal performance
- Update settings as needed
## Architecture

### Database Schema
- users: User accounts with team assignments
- chat_history: Conversation tracking with citations
- document_collections: Team-specific document organization
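A minimal `sqlite3` sketch of these three tables; any column not described above (for example `password_hash`, or `cited_pages` stored as JSON text) is an assumption for illustration:

```python
import sqlite3

# Illustrative schema only; exact column names/types in the app may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id            INTEGER PRIMARY KEY,
    username      TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL,          -- bcrypt hash
    team          TEXT NOT NULL           -- team assignment
);
CREATE TABLE IF NOT EXISTS chat_history (
    id          INTEGER PRIMARY KEY,
    team        TEXT NOT NULL,
    query       TEXT NOT NULL,
    response    TEXT NOT NULL,
    cited_pages TEXT,                     -- e.g. JSON list of page numbers
    created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS document_collections (
    id   INTEGER PRIMARY KEY,
    team TEXT NOT NULL,                   -- enforces team isolation
    name TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```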
### Security Features
- Password Hashing: bcrypt for secure password storage
- Session Management: UUID-based session tokens
- Access Control: Team-based document isolation
- Input Validation: Comprehensive error handling
### Performance Optimizations
- Multi-threading: Concurrent document processing
- Memory Management: Efficient image and vector handling
- Caching: Session-based caching for improved response times
- Batch Processing: Configurable batch sizes for GPU optimization
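The batch processing mentioned above (`batchsize=5` in the `.env`) reduces to chunking work items before handing them to the GPU; a minimal generator:

```python
def batches(items, batch_size=5):
    """Yield consecutive fixed-size chunks; the last chunk may be shorter."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

Usage would look like `for chunk in batches(pages, batch_size=5): embed(chunk)`, where `embed` stands in for whatever per-batch processing the app performs.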
## Relevance Score Calculation
The system uses sophisticated relevance scoring to determine how well retrieved documents align with user queries. This process is crucial for selecting the most pertinent information for generating accurate and contextually appropriate responses.
### How Relevance Scores Work

#### 1. Document Embedding Process
- Page Segmentation: Each document page is processed as a complete unit
- Multimodal Encoding: Both text and visual elements are captured using ColPali embeddings
- Vector Representation: Pages are transformed into high-dimensional numerical vectors (typically 768-1024 dimensions)
- Semantic Capture: The embedding captures semantic meaning, not just keyword matches
#### 2. Query Embedding
- Query Processing: User queries are converted into embeddings using the same ColPali model
- Semantic Understanding: The system understands query intent, not just literal words
- Context Preservation: Query context and meaning are maintained in the embedding
#### 3. Similarity Computation
- Cosine Similarity: Primary similarity measure between query and document embeddings
- Dot Product: Alternative similarity calculation for high-dimensional vectors
- Normalized Scores: Similarity scores are normalized to a 0-1 range
- Distance Metrics: Lower distances indicate higher relevance
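Both measures can be written in a few lines of plain Python. For unit-normalized vectors the two coincide: the inner product of two unit vectors *is* their cosine similarity, which is why an inner-product (`IP`) metric can serve as a cosine stand-in:

```python
import math

def dot(a, b):
    """Dot (inner) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of magnitudes; 1.0 = same direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```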
#### 4. Score Aggregation & Ranking
- Individual Page Scores: Each page gets a relevance score based on similarity
- Collection Diversity: Scores are adjusted to promote diversity across document collections
- Consecutive Page Optimization: Adjacent pages are considered for better context
- Final Ranking: Pages are ranked by their aggregated relevance scores
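One simple way to combine raw scores with collection diversity is a greedy pass that slightly penalizes pages from collections already selected. The `penalty` value below is an illustrative knob, not a documented parameter of the system:

```python
def rank_with_diversity(candidates, k=3, penalty=0.05):
    """Greedily select up to k pages: take the best-adjusted score each round,
    penalizing pages whose collection has already contributed a pick.
    candidates: list of (page_id, collection, score) tuples."""
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        picked_collections = {coll for _, coll, _ in selected}
        def adjusted(item):
            _, coll, score = item
            return score - (penalty if coll in picked_collections else 0.0)
        best = max(remaining, key=adjusted)
        selected.append(best)
        remaining.remove(best)
    return selected
```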
### Relevance Score Interpretation

| Score Range | Relevance Level | Description |
|---|---|---|
| 0.90 - 1.00 | Excellent | Highly relevant, directly answers the query |
| 0.80 - 0.89 | Very Good | Very relevant, provides substantial information |
| 0.70 - 0.79 | Good | Relevant, contains useful information |
| 0.60 - 0.69 | Moderate | Somewhat relevant, may contain partial answers |
| 0.50 - 0.59 | Basic | Minimally relevant, limited usefulness |
| < 0.50 | Poor | Not relevant, unlikely to be useful |
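These bands map directly onto a small helper (the thresholds follow the table above; the function name is illustrative):

```python
def relevance_label(score):
    """Map a normalized relevance score (0-1) to its interpretation band."""
    if score >= 0.90:
        return "Excellent"
    if score >= 0.80:
        return "Very Good"
    if score >= 0.70:
        return "Good"
    if score >= 0.60:
        return "Moderate"
    if score >= 0.50:
        return "Basic"
    return "Poor"
```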
### Example Relevance Calculation

**Query:** "What are the safety procedures for handling explosives?"

**Document Pages:**
- Page 15: "Safety protocols for explosive materials" → Score: 0.95 (Excellent)
- Page 23: "Equipment requirements for explosive handling" → Score: 0.92 (Excellent)
- Page 8: "General laboratory safety guidelines" → Score: 0.88 (Very Good)
- Page 45: "Chemical storage procedures" → Score: 0.65 (Moderate)

**Selection Process:**
- Pages 15, 23, and 8 are selected for their high relevance
- Page 45 is excluded due to lower relevance
- The system ensures diversity across different aspects of safety procedures
### Advanced Features

#### Multi-Modal Relevance
- Visual Elements: Images, charts, and diagrams contribute to relevance scores
- Text-Vision Alignment: ColPali captures relationships between text and visual content
- Layout Understanding: Document structure and formatting influence relevance
#### Context-Aware Scoring
- Query Complexity: Complex queries may retrieve more pages with varied scores
- Cross-Reference Detection: Pages that reference each other get boosted scores
- Temporal Relevance: Recent documents may receive slight score adjustments
#### Quality Assurance
- Score Verification: System validates that selected pages meet minimum relevance thresholds
- Diversity Optimization: Ensures selected pages provide comprehensive coverage
- Redundancy Reduction: Avoids selecting multiple pages with very similar content
### Configuration Parameters

```env
# Relevance scoring configuration
metrictype=IP   # Inner Product similarity
mnum=16         # Number of connections in the HNSW graph
efnum=500       # Search depth for high-quality results
topk=50         # Maximum results to consider
```
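The `M`/`ef` naming suggests an HNSW vector index. If the vector store is Milvus, these settings would map roughly onto index and search parameters as sketched below; treat the exact dictionary shapes as an assumption about your store's API rather than the app's confirmed configuration:

```python
# Hypothetical mapping of the .env values above onto HNSW parameters
# (Milvus-style naming shown; adapt to your vector store).
index_params = {
    "metric_type": "IP",                         # metrictype: inner product
    "index_type": "HNSW",
    "params": {"M": 16, "efConstruction": 500},  # mnum, efnum at build time
}
search_params = {
    "metric_type": "IP",
    "params": {"ef": 500},                       # search depth: recall vs. speed
}
TOP_K = 50                                       # topk: candidates before re-ranking
```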
### Performance Impact
- Search Speed: Relevance scoring adds minimal overhead (~10-50ms per query)
- Accuracy: High-quality embeddings ensure accurate relevance assessment
- Scalability: Efficient vector operations support large document collections
- Memory Usage: Optimized to handle thousands of document pages efficiently
## Security Considerations

### Production Deployment
- HTTPS: Always use HTTPS in production
- Environment Variables: Store sensitive data in environment variables
- Database Security: Use production-grade database (PostgreSQL/MySQL)
- Rate Limiting: Implement API rate limiting
- Logging: Add comprehensive logging for security monitoring
### Recommended Security Enhancements

```python
# Add to production deployment (assumes an existing Flask `app` instance)
import logging

from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

# Rate limiting (flask-limiter < 3.0 signature; newer versions take
# the key function first: Limiter(get_remote_address, app=app, ...))
limiter = Limiter(
    app,
    key_func=get_remote_address,
    default_limits=["200 per day", "50 per hour"],
)

# Security headers
@app.after_request
def add_security_headers(response):
    response.headers['X-Content-Type-Options'] = 'nosniff'
    response.headers['X-Frame-Options'] = 'DENY'
    response.headers['X-XSS-Protection'] = '1; mode=block'
    return response
```
## Deployment

### Docker Deployment

```dockerfile
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 7860
CMD ["python", "app.py"]
```
### Environment Variables for Production

```env
# Database
DATABASE_URL=postgresql://user:password@localhost/dbname
SECRET_KEY=your-secret-key-here

# Security
BCRYPT_ROUNDS=12
SESSION_TIMEOUT=3600

# Performance
WORKER_THREADS=4
MAX_UPLOAD_SIZE=100MB
```
## Monitoring & Analytics

### Key Metrics to Track
- Query Response Time: Average time for AI responses
- Document Processing Time: Time to index new documents
- User Activity: Login frequency and session duration
- Error Rates: Failed queries and system errors
- Storage Usage: Database and file system utilization
### Logging Configuration

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('app.log'),
        logging.StreamHandler(),
    ],
)
```
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new features
5. Submit a pull request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Support
For support and questions:
- Create an issue in the repository
- Check the documentation
- Review the troubleshooting guide
Made by Collar - Enhanced with Team Management & Chat History