---
title: Collar Multimodal RAG Demo
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: false
---

Collar Multimodal RAG Demo - Production Ready

A production-ready multimodal RAG (Retrieval-Augmented Generation) system with team management, chat history, and advanced document processing capabilities.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

πŸš€ New Production Features

1. Multi-Page Citations

  • Complex Query Support: The AI can now retrieve and cite multiple pages when queries reference information across different documents
  • Smart Citation System: Automatically identifies and displays which pages contain relevant information
  • Configurable Results: Users can specify how many pages to retrieve (1-10 pages)

2. Team-Based Repository Management

  • Folder Uploads: Upload multiple documents as organized collections
  • Team Isolation: Each team has access only to their own document collections
  • Master Repository: Documents are organized in team-specific repositories for easy access
  • Collection Naming: Optional custom names for document collections

3. Authentication & Team Management

  • User Authentication: Secure login system with bcrypt password hashing
  • Team-Based Access: Separate entry points for Team A and Team B
  • Session Management: Secure session handling with automatic timeout
  • Access Control: Users can only access and manage their team's documents

4. Chat History & Persistence

  • Conversation Tracking: All queries and responses are saved to a SQLite database
  • Historical Context: View previous conversations with timestamps
  • Cited Pages History: Track which pages were referenced in each conversation
  • Team-Specific History: Each team sees only their own conversation history

5. Advanced Relevance Scoring

  • Multimodal Embeddings: ColPali-based semantic understanding of text and visual content
  • Intelligent Ranking: Sophisticated relevance scoring with cosine similarity and dot product
  • Quality Assessment: Automatic evaluation of information relevance and completeness
  • Diversity Optimization: Ensures comprehensive coverage across document collections

πŸ”§ Installation & Setup

Prerequisites

  • Python 3.8+
  • Docker Desktop
  • Ollama
  • CUDA-compatible GPU (recommended)

1. Install Dependencies

pip install -r requirements.txt

2. Environment Configuration

Create a .env file with the following variables:

colpali=your_colpali_model
ollama=your_ollama_model
flashattn=1
temperature=0.8
batchsize=5
metrictype=IP
mnum=16
efnum=500
topk=50
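
These variables can be read at startup along these lines; the helper below is a minimal sketch (the function name, defaults, and typing are illustrative, not the app's actual loader):

```python
import os

# Hypothetical loader for the .env-style settings above. Keys match the
# sample .env; defaults and derived names are assumptions for illustration.
def load_rag_settings(env=os.environ):
    return {
        "colpali_model": env.get("colpali", ""),
        "ollama_model": env.get("ollama", ""),
        "use_flash_attn": env.get("flashattn", "0") == "1",
        "temperature": float(env.get("temperature", "0.8")),
        "batch_size": int(env.get("batchsize", "5")),
        "metric_type": env.get("metrictype", "IP"),
        "hnsw_m": int(env.get("mnum", "16")),
        "hnsw_ef": int(env.get("efnum", "500")),
        "top_k": int(env.get("topk", "50")),
    }

settings = load_rag_settings()
```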

3. Start Services

The application will automatically:

  • Start Docker Desktop (Windows)
  • Start Ollama server
  • Initialize Docker containers
  • Create default users

πŸ‘₯ Default Users

The system creates default users for each team:

| Team | Username | Password |
|--------|--------------|-----------------|
| Team A | admin_team_a | admin123_team_a |
| Team B | admin_team_b | admin123_team_b |

πŸ“– Usage Guide

1. Authentication

  1. Navigate to the "πŸ” Authentication" tab
  2. Enter your username and password
  3. Click "Login" to access team-specific features

2. Document Management

  1. Go to "πŸ“ Document Management" tab
  2. Optionally enter a collection name for organization
  3. Set the maximum pages to extract per document
  4. Upload multiple PPT/PDF files
  5. Click "Upload to Repository" to process documents
  6. Use "Refresh Collections" to see available document collections

3. Advanced Querying

  1. Navigate to "πŸ” Advanced Query" tab
  2. Enter your query in the text box
  3. Adjust the number of pages to retrieve (1-10)
  4. Click "Search Documents" to get AI response with citations
  5. View the cited pages and retrieved document images
  6. Check relevance scores to understand information quality (see "Relevance Score Calculation" section)

4. Chat History

  1. Go to "πŸ’¬ Chat History" tab
  2. Adjust the number of conversations to display
  3. Click "Refresh History" to view recent conversations
  4. Each entry shows query, response, cited pages, and timestamp

5. Data Management

  1. Access "βš™οΈ Data Management" tab
  2. Select collections to delete (team-restricted)
  3. Configure database parameters for optimal performance
  4. Update settings as needed

πŸ—οΈ Architecture

Database Schema

  • users: User accounts with team assignments
  • chat_history: Conversation tracking with citations
  • document_collections: Team-specific document organization
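
An illustrative SQLite schema for these three tables is sketched below; the column names and constraints are assumptions based on the feature list above, not the app's actual DDL:

```python
import sqlite3

# In-memory database for illustration; the app uses a SQLite file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL,   -- bcrypt hash, never plaintext
    team TEXT NOT NULL             -- e.g. 'team_a' or 'team_b'
);
CREATE TABLE chat_history (
    id INTEGER PRIMARY KEY,
    team TEXT NOT NULL,
    query TEXT NOT NULL,
    response TEXT NOT NULL,
    cited_pages TEXT,              -- e.g. JSON list of page numbers
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE document_collections (
    id INTEGER PRIMARY KEY,
    team TEXT NOT NULL,
    name TEXT NOT NULL,
    UNIQUE (team, name)            -- collection names unique per team
);
""")
conn.execute(
    "INSERT INTO users (username, password_hash, team) VALUES (?, ?, ?)",
    ("admin_team_a", "<bcrypt-hash>", "team_a"),
)
```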

Security Features

  • Password Hashing: bcrypt for secure password storage
  • Session Management: UUID-based session tokens
  • Access Control: Team-based document isolation
  • Input Validation: Comprehensive error handling
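
The UUID-based session handling with automatic timeout can be sketched as follows (stdlib only; the real app's session store and timeout value may differ):

```python
import time
import uuid

sessions = {}

def create_session(username, team):
    # UUID4 tokens are unguessable session identifiers
    token = str(uuid.uuid4())
    sessions[token] = {"user": username, "team": team, "ts": time.time()}
    return token

def validate_session(token, timeout=3600):
    s = sessions.get(token)
    if s is None or time.time() - s["ts"] > timeout:
        sessions.pop(token, None)   # expired or unknown: drop it
        return None
    s["ts"] = time.time()           # sliding expiry on each use
    return s

tok = create_session("admin_team_a", "team_a")
```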

Performance Optimizations

  • Multi-threading: Concurrent document processing
  • Memory Management: Efficient image and vector handling
  • Caching: Session-based caching for improved response times
  • Batch Processing: Configurable batch sizes for GPU optimization
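
The multi-threading and batch-processing steps above combine roughly like this; `process_page` is a stand-in for the real embedding pipeline, and the worker/batch counts are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def process_page(page_id):
    # Placeholder for real work (render page, compute embedding, ...)
    return {"page": page_id, "embedded": True}

def process_document(page_ids, workers=4, batch_size=5):
    # Pages are processed concurrently, then grouped into GPU-sized
    # batches (cf. the batchsize setting in the .env file).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_page, page_ids))
    return [results[i:i + batch_size] for i in range(0, len(results), batch_size)]

batches = process_document(range(12))
```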

πŸ” Relevance Score Calculation

The system uses sophisticated relevance scoring to determine how well retrieved documents align with user queries. This process is crucial for selecting the most pertinent information for generating accurate and contextually appropriate responses.

How Relevance Scores Work

1. Document Embedding Process

  • Page Segmentation: Each document page is processed as a complete unit
  • Multimodal Encoding: Both text and visual elements are captured using ColPali embeddings
  • Vector Representation: Pages are transformed into high-dimensional numerical vectors (typically 768-1024 dimensions)
  • Semantic Capture: The embedding captures semantic meaning, not just keyword matches

2. Query Embedding

  • Query Processing: User queries are converted into embeddings using the same ColPali model
  • Semantic Understanding: The system understands query intent, not just literal words
  • Context Preservation: Query context and meaning are maintained in the embedding

3. Similarity Computation

  • Cosine Similarity: Primary similarity measure between query and document embeddings
  • Dot Product: Alternative similarity calculation for high-dimensional vectors
  • Normalized Scores: Similarity scores are normalized to a 0-1 range
  • Distance Metrics: Lower distances indicate higher relevance
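
Both similarity measures reduce to a few lines; the toy 4-dimensional vectors below stand in for real ColPali embeddings, which are far higher-dimensional:

```python
import math

def dot(a, b):
    # Dot product: the IP (inner product) metric from the configuration
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Raw cosine is in [-1, 1]; the app then normalizes scores to 0-1
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [0.2, 0.8, 0.1, 0.4]
page  = [0.1, 0.9, 0.0, 0.5]
score = cosine_similarity(query, page)
```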

4. Score Aggregation & Ranking

  • Individual Page Scores: Each page gets a relevance score based on similarity
  • Collection Diversity: Scores are adjusted to promote diversity across document collections
  • Consecutive Page Optimization: Adjacent pages are considered for better context
  • Final Ranking: Pages are ranked by their aggregated relevance scores
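
One way the aggregation above could work is sketched below: start from per-page similarity, add a small bonus when an adjacent page also scored well (the consecutive-page heuristic), then sort. The 0.02 bonus and 0.8 cutoff are illustrative assumptions, not the app's actual weights:

```python
def rank_pages(scores, adjacency_bonus=0.02):
    # scores: {page_number: similarity}
    ranked = []
    for page, score in scores.items():
        neighbors = (scores.get(page - 1, 0), scores.get(page + 1, 0))
        bonus = adjacency_bonus if max(neighbors) >= 0.8 else 0.0
        ranked.append((page, round(score + bonus, 4)))
    return sorted(ranked, key=lambda t: t[1], reverse=True)

ranking = rank_pages({15: 0.95, 16: 0.91, 23: 0.92, 8: 0.88})
```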

Relevance Score Interpretation

| Score Range | Relevance Level | Description |
|-------------|-----------------|-------------|
| 0.90 - 1.00 | Excellent | Highly relevant, directly answers the query |
| 0.80 - 0.89 | Very Good | Very relevant, provides substantial information |
| 0.70 - 0.79 | Good | Relevant, contains useful information |
| 0.60 - 0.69 | Moderate | Somewhat relevant, may contain partial answers |
| 0.50 - 0.59 | Basic | Minimally relevant, limited usefulness |
| < 0.50 | Poor | Not relevant, unlikely to be useful |
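
These bands translate directly into a small lookup (thresholds taken from the table; the function name is illustrative):

```python
def relevance_level(score):
    # Maps a normalized relevance score to the levels in the table above
    if score >= 0.90:
        return "Excellent"
    if score >= 0.80:
        return "Very Good"
    if score >= 0.70:
        return "Good"
    if score >= 0.60:
        return "Moderate"
    if score >= 0.50:
        return "Basic"
    return "Poor"
```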

Example Relevance Calculation

Query: "What are the safety procedures for handling explosives?"

Document Pages:

  1. Page 15: "Safety protocols for explosive materials" β†’ Score: 0.95 (Excellent)
  2. Page 23: "Equipment requirements for explosive handling" β†’ Score: 0.92 (Very Good)
  3. Page 8: "General laboratory safety guidelines" β†’ Score: 0.88 (Very Good)
  4. Page 45: "Chemical storage procedures" β†’ Score: 0.65 (Moderate)

Selection Process:

  • Pages 15, 23, and 8 are selected for their high relevance
  • Page 45 is excluded due to lower relevance
  • The system ensures diversity across different aspects of safety procedures

Advanced Features

Multi-Modal Relevance

  • Visual Elements: Images, charts, and diagrams contribute to relevance scores
  • Text-Vision Alignment: ColPali captures relationships between text and visual content
  • Layout Understanding: Document structure and formatting influence relevance

Context-Aware Scoring

  • Query Complexity: Complex queries may retrieve more pages with varied scores
  • Cross-Reference Detection: Pages that reference each other get boosted scores
  • Temporal Relevance: Recent documents may receive slight score adjustments

Quality Assurance

  • Score Verification: System validates that selected pages meet minimum relevance thresholds
  • Diversity Optimization: Ensures selected pages provide comprehensive coverage
  • Redundancy Reduction: Avoids selecting multiple pages with very similar content
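
The threshold and redundancy checks above can be sketched as a selection pass; the `fingerprint` field here is a stand-in for a real content-similarity check, and the 0.70 minimum score is an illustrative assumption:

```python
def select_pages(scored_pages, top_n=3, min_score=0.70):
    # Keep the top_n highest-scoring pages above min_score,
    # skipping pages whose content duplicates an earlier pick.
    seen = set()
    selected = []
    for page in sorted(scored_pages, key=lambda p: p["score"], reverse=True):
        if page["score"] < min_score or page["fingerprint"] in seen:
            continue
        seen.add(page["fingerprint"])
        selected.append(page)
        if len(selected) == top_n:
            break
    return selected

pages = [
    {"page": 15, "score": 0.95, "fingerprint": "safety-protocols"},
    {"page": 23, "score": 0.92, "fingerprint": "equipment"},
    {"page": 16, "score": 0.91, "fingerprint": "safety-protocols"},  # near-duplicate
    {"page": 8,  "score": 0.88, "fingerprint": "lab-guidelines"},
    {"page": 45, "score": 0.65, "fingerprint": "storage"},           # below threshold
]
picked = select_pages(pages)
```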

Configuration Parameters

# Relevance scoring configuration
metrictype=IP          # Inner Product similarity
mnum=16                # Number of connections in HNSW graph
efnum=500              # Search depth for high-quality results
topk=50                # Maximum results to consider

Performance Impact

  • Search Speed: Relevance scoring adds minimal overhead (~10-50ms per query)
  • Accuracy: High-quality embeddings ensure accurate relevance assessment
  • Scalability: Efficient vector operations support large document collections
  • Memory Usage: Optimized to handle thousands of document pages efficiently

πŸ”’ Security Considerations

Production Deployment

  1. HTTPS: Always use HTTPS in production
  2. Environment Variables: Store sensitive data in environment variables
  3. Database Security: Use production-grade database (PostgreSQL/MySQL)
  4. Rate Limiting: Implement API rate limiting
  5. Logging: Add comprehensive logging for security monitoring

Recommended Security Enhancements

# Add to production deployment (sketch; assumes Flask with Flask-Limiter 3.x)
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

# Rate limiting (Flask-Limiter 3.x takes key_func positionally, app as keyword)
limiter = Limiter(
    get_remote_address,
    app=app,
    default_limits=["200 per day", "50 per hour"]
)

# Security headers
@app.after_request
def add_security_headers(response):
    response.headers['X-Content-Type-Options'] = 'nosniff'
    response.headers['X-Frame-Options'] = 'DENY'
    response.headers['X-XSS-Protection'] = '1; mode=block'
    return response

πŸš€ Deployment

Docker Deployment

FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
EXPOSE 7860

CMD ["python", "app.py"]

Environment Variables for Production

# Database
DATABASE_URL=postgresql://user:password@localhost/dbname
SECRET_KEY=your-secret-key-here

# Security
BCRYPT_ROUNDS=12
SESSION_TIMEOUT=3600

# Performance
WORKER_THREADS=4
MAX_UPLOAD_SIZE=100MB

πŸ“Š Monitoring & Analytics

Key Metrics to Track

  • Query Response Time: Average time for AI responses
  • Document Processing Time: Time to index new documents
  • User Activity: Login frequency and session duration
  • Error Rates: Failed queries and system errors
  • Storage Usage: Database and file system utilization

Logging Configuration

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('app.log'),
        logging.StreamHandler()
    ]
)

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new features
  5. Submit a pull request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ†˜ Support

For support and questions:

  • Create an issue in the repository
  • Check the documentation
  • Review the troubleshooting guide

Made by Collar - Enhanced with Team Management & Chat History