---
title: Collar Multimodal RAG Demo
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: false
---

Collar Multimodal RAG Demo - Production Ready

A production-ready multimodal RAG (Retrieval-Augmented Generation) system with team management, chat history, and advanced document processing capabilities.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

πŸš€ New Production Features

1. Multi-Page Citations

  • Complex Query Support: The AI can now retrieve and cite multiple pages when queries reference information across different documents
  • Smart Citation System: Automatically identifies and displays which pages contain relevant information
  • Configurable Results: Users can specify how many pages to retrieve (1-10 pages)

2. Team-Based Repository Management

  • Folder Uploads: Upload multiple documents as organized collections
  • Team Isolation: Each team has access only to their own document collections
  • Master Repository: Documents are organized in team-specific repositories for easy access
  • Collection Naming: Optional custom names for document collections

3. Authentication & Team Management

  • User Authentication: Secure login system with bcrypt password hashing
  • Team-Based Access: Separate entry points for Team A and Team B
  • Session Management: Secure session handling with automatic timeout
  • Access Control: Users can only access and manage their team's documents

4. Chat History & Persistence

  • Conversation Tracking: All queries and responses are saved to a SQLite database
  • Historical Context: View previous conversations with timestamps
  • Cited Pages History: Track which pages were referenced in each conversation
  • Team-Specific History: Each team sees only their own conversation history

5. Advanced Relevance Scoring

  • Multimodal Embeddings: ColPali-based semantic understanding of text and visual content
  • Intelligent Ranking: Sophisticated relevance scoring with cosine similarity and dot product
  • Quality Assessment: Automatic evaluation of information relevance and completeness
  • Diversity Optimization: Ensures comprehensive coverage across document collections

πŸ”§ Installation & Setup

Prerequisites

  • Python 3.8+
  • Docker Desktop
  • Ollama
  • CUDA-compatible GPU (recommended)

1. Install Dependencies

pip install -r requirements.txt

2. Environment Configuration

Create a .env file with the following variables:

colpali=your_colpali_model
ollama=your_ollama_model
flashattn=1
temperature=0.8
batchsize=5
metrictype=IP
mnum=16
efnum=500
topk=50
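
These variables can be read at startup along these lines; the helper below is a minimal sketch (the function name, defaults, and typing are illustrative, not the app's actual loader):

```python
import os

# Hypothetical loader for the .env-style settings above. Keys match the
# sample .env; defaults and derived names are assumptions for illustration.
def load_rag_settings(env=os.environ):
    return {
        "colpali_model": env.get("colpali", ""),
        "ollama_model": env.get("ollama", ""),
        "use_flash_attn": env.get("flashattn", "0") == "1",
        "temperature": float(env.get("temperature", "0.8")),
        "batch_size": int(env.get("batchsize", "5")),
        "metric_type": env.get("metrictype", "IP"),
        "hnsw_m": int(env.get("mnum", "16")),
        "hnsw_ef": int(env.get("efnum", "500")),
        "top_k": int(env.get("topk", "50")),
    }

settings = load_rag_settings()
```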

3. Start Services

The application will automatically:

  • Start Docker Desktop (Windows)
  • Start Ollama server
  • Initialize Docker containers
  • Create default users

πŸ‘₯ Default Users

The system creates default users for each team:

| Team | Username | Password |
|--------|--------------|-----------------|
| Team A | admin_team_a | admin123_team_a |
| Team B | admin_team_b | admin123_team_b |

πŸ“– Usage Guide

1. Authentication

  1. Navigate to the "πŸ” Authentication" tab
  2. Enter your username and password
  3. Click "Login" to access team-specific features

2. Document Management

  1. Go to "πŸ“ Document Management" tab
  2. Optionally enter a collection name for organization
  3. Set the maximum pages to extract per document
  4. Upload multiple PPT/PDF files
  5. Click "Upload to Repository" to process documents
  6. Use "Refresh Collections" to see available document collections

3. Advanced Querying

  1. Navigate to "πŸ” Advanced Query" tab
  2. Enter your query in the text box
  3. Adjust the number of pages to retrieve (1-10)
  4. Click "Search Documents" to get AI response with citations
  5. View the cited pages and retrieved document images
  6. Check relevance scores to understand information quality (see "Relevance Score Calculation" section)

4. Chat History

  1. Go to "πŸ’¬ Chat History" tab
  2. Adjust the number of conversations to display
  3. Click "Refresh History" to view recent conversations
  4. Each entry shows query, response, cited pages, and timestamp

5. Data Management

  1. Access "βš™οΈ Data Management" tab
  2. Select collections to delete (team-restricted)
  3. Configure database parameters for optimal performance
  4. Update settings as needed

πŸ—οΈ Architecture

Database Schema

  • users: User accounts with team assignments
  • chat_history: Conversation tracking with citations
  • document_collections: Team-specific document organization
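
An illustrative SQLite schema for these three tables is sketched below; the column names and constraints are assumptions based on the feature list above, not the app's actual DDL:

```python
import sqlite3

# In-memory database for illustration; the app uses a SQLite file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL,   -- bcrypt hash, never plaintext
    team TEXT NOT NULL             -- e.g. 'team_a' or 'team_b'
);
CREATE TABLE chat_history (
    id INTEGER PRIMARY KEY,
    team TEXT NOT NULL,
    query TEXT NOT NULL,
    response TEXT NOT NULL,
    cited_pages TEXT,              -- e.g. JSON list of page numbers
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE document_collections (
    id INTEGER PRIMARY KEY,
    team TEXT NOT NULL,
    name TEXT NOT NULL,
    UNIQUE (team, name)            -- collection names unique per team
);
""")
conn.execute(
    "INSERT INTO users (username, password_hash, team) VALUES (?, ?, ?)",
    ("admin_team_a", "<bcrypt-hash>", "team_a"),
)
```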

Security Features

  • Password Hashing: bcrypt for secure password storage
  • Session Management: UUID-based session tokens
  • Access Control: Team-based document isolation
  • Input Validation: Comprehensive error handling
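
The UUID-based session handling with automatic timeout can be sketched as follows (stdlib only; the real app's session store and timeout value may differ):

```python
import time
import uuid

sessions = {}

def create_session(username, team):
    # UUID4 tokens are unguessable session identifiers
    token = str(uuid.uuid4())
    sessions[token] = {"user": username, "team": team, "ts": time.time()}
    return token

def validate_session(token, timeout=3600):
    s = sessions.get(token)
    if s is None or time.time() - s["ts"] > timeout:
        sessions.pop(token, None)   # expired or unknown: drop it
        return None
    s["ts"] = time.time()           # sliding expiry on each use
    return s

tok = create_session("admin_team_a", "team_a")
```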

Performance Optimizations

  • Multi-threading: Concurrent document processing
  • Memory Management: Efficient image and vector handling
  • Caching: Session-based caching for improved response times
  • Batch Processing: Configurable batch sizes for GPU optimization
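
The multi-threading and batch-processing steps above combine roughly like this; `process_page` is a stand-in for the real embedding pipeline, and the worker/batch counts are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def process_page(page_id):
    # Placeholder for real work (render page, compute embedding, ...)
    return {"page": page_id, "embedded": True}

def process_document(page_ids, workers=4, batch_size=5):
    # Pages are processed concurrently, then grouped into GPU-sized
    # batches (cf. the batchsize setting in the .env file).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_page, page_ids))
    return [results[i:i + batch_size] for i in range(0, len(results), batch_size)]

batches = process_document(range(12))
```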

πŸ” Relevance Score Calculation

The system uses sophisticated relevance scoring to determine how well retrieved documents align with user queries. This process is crucial for selecting the most pertinent information for generating accurate and contextually appropriate responses.

How Relevance Scores Work

1. Document Embedding Process

  • Page Segmentation: Each document page is processed as a complete unit
  • Multimodal Encoding: Both text and visual elements are captured using ColPali embeddings
  • Vector Representation: Pages are transformed into high-dimensional numerical vectors (typically 768-1024 dimensions)
  • Semantic Capture: The embedding captures semantic meaning, not just keyword matches

2. Query Embedding

  • Query Processing: User queries are converted into embeddings using the same ColPali model
  • Semantic Understanding: The system understands query intent, not just literal words
  • Context Preservation: Query context and meaning are maintained in the embedding

3. Similarity Computation

  • Cosine Similarity: Primary similarity measure between query and document embeddings
  • Dot Product: Alternative similarity calculation for high-dimensional vectors
  • Normalized Scores: Similarity scores are normalized to a 0-1 range
  • Distance Metrics: Lower distances indicate higher relevance
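
Both similarity measures reduce to a few lines; the toy 4-dimensional vectors below stand in for real ColPali embeddings, which are far higher-dimensional:

```python
import math

def dot(a, b):
    # Dot product: the IP (inner product) metric from the configuration
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Raw cosine is in [-1, 1]; the app then normalizes scores to 0-1
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [0.2, 0.8, 0.1, 0.4]
page  = [0.1, 0.9, 0.0, 0.5]
score = cosine_similarity(query, page)
```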

4. Score Aggregation & Ranking

  • Individual Page Scores: Each page gets a relevance score based on similarity
  • Collection Diversity: Scores are adjusted to promote diversity across document collections
  • Consecutive Page Optimization: Adjacent pages are considered for better context
  • Final Ranking: Pages are ranked by their aggregated relevance scores
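
One way the aggregation above could work is sketched below: start from per-page similarity, add a small bonus when an adjacent page also scored well (the consecutive-page heuristic), then sort. The 0.02 bonus and 0.8 cutoff are illustrative assumptions, not the app's actual weights:

```python
def rank_pages(scores, adjacency_bonus=0.02):
    # scores: {page_number: similarity}
    ranked = []
    for page, score in scores.items():
        neighbors = (scores.get(page - 1, 0), scores.get(page + 1, 0))
        bonus = adjacency_bonus if max(neighbors) >= 0.8 else 0.0
        ranked.append((page, round(score + bonus, 4)))
    return sorted(ranked, key=lambda t: t[1], reverse=True)

ranking = rank_pages({15: 0.95, 16: 0.91, 23: 0.92, 8: 0.88})
```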

Relevance Score Interpretation

| Score Range | Relevance Level | Description |
|-------------|-----------------|-------------|
| 0.90 - 1.00 | Excellent | Highly relevant, directly answers the query |
| 0.80 - 0.89 | Very Good | Very relevant, provides substantial information |
| 0.70 - 0.79 | Good | Relevant, contains useful information |
| 0.60 - 0.69 | Moderate | Somewhat relevant, may contain partial answers |
| 0.50 - 0.59 | Basic | Minimally relevant, limited usefulness |
| < 0.50 | Poor | Not relevant, unlikely to be useful |
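
These bands translate directly into a small lookup (thresholds taken from the table; the function name is illustrative):

```python
def relevance_level(score):
    # Maps a normalized relevance score to the levels in the table above
    if score >= 0.90:
        return "Excellent"
    if score >= 0.80:
        return "Very Good"
    if score >= 0.70:
        return "Good"
    if score >= 0.60:
        return "Moderate"
    if score >= 0.50:
        return "Basic"
    return "Poor"
```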

Example Relevance Calculation

Query: "What are the safety procedures for handling explosives?"

Document Pages:

  1. Page 15: "Safety protocols for explosive materials" β†’ Score: 0.95 (Excellent)
  2. Page 23: "Equipment requirements for explosive handling" β†’ Score: 0.92 (Very Good)
  3. Page 8: "General laboratory safety guidelines" β†’ Score: 0.88 (Very Good)
  4. Page 45: "Chemical storage procedures" β†’ Score: 0.65 (Moderate)

Selection Process:

  • Pages 15, 23, and 8 are selected for their high relevance
  • Page 45 is excluded due to lower relevance
  • The system ensures diversity across different aspects of safety procedures

Advanced Features

Multi-Modal Relevance

  • Visual Elements: Images, charts, and diagrams contribute to relevance scores
  • Text-Vision Alignment: ColPali captures relationships between text and visual content
  • Layout Understanding: Document structure and formatting influence relevance

Context-Aware Scoring

  • Query Complexity: Complex queries may retrieve more pages with varied scores
  • Cross-Reference Detection: Pages that reference each other get boosted scores
  • Temporal Relevance: Recent documents may receive slight score adjustments

Quality Assurance

  • Score Verification: System validates that selected pages meet minimum relevance thresholds
  • Diversity Optimization: Ensures selected pages provide comprehensive coverage
  • Redundancy Reduction: Avoids selecting multiple pages with very similar content
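
The threshold and redundancy checks above can be sketched as a selection pass; the `fingerprint` field here is a stand-in for a real content-similarity check, and the 0.70 minimum score is an illustrative assumption:

```python
def select_pages(scored_pages, top_n=3, min_score=0.70):
    # Keep the top_n highest-scoring pages above min_score,
    # skipping pages whose content duplicates an earlier pick.
    seen = set()
    selected = []
    for page in sorted(scored_pages, key=lambda p: p["score"], reverse=True):
        if page["score"] < min_score or page["fingerprint"] in seen:
            continue
        seen.add(page["fingerprint"])
        selected.append(page)
        if len(selected) == top_n:
            break
    return selected

pages = [
    {"page": 15, "score": 0.95, "fingerprint": "safety-protocols"},
    {"page": 23, "score": 0.92, "fingerprint": "equipment"},
    {"page": 16, "score": 0.91, "fingerprint": "safety-protocols"},  # near-duplicate
    {"page": 8,  "score": 0.88, "fingerprint": "lab-guidelines"},
    {"page": 45, "score": 0.65, "fingerprint": "storage"},           # below threshold
]
picked = select_pages(pages)
```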

Configuration Parameters

# Relevance scoring configuration
metrictype=IP          # Inner Product similarity
mnum=16                # Number of connections in HNSW graph
efnum=500              # Search depth for high-quality results
topk=50                # Maximum results to consider

Performance Impact

  • Search Speed: Relevance scoring adds minimal overhead (~10-50ms per query)
  • Accuracy: High-quality embeddings ensure accurate relevance assessment
  • Scalability: Efficient vector operations support large document collections
  • Memory Usage: Optimized to handle thousands of document pages efficiently

πŸ”’ Security Considerations

Production Deployment

  1. HTTPS: Always use HTTPS in production
  2. Environment Variables: Store sensitive data in environment variables
  3. Database Security: Use production-grade database (PostgreSQL/MySQL)
  4. Rate Limiting: Implement API rate limiting
  5. Logging: Add comprehensive logging for security monitoring

Recommended Security Enhancements

# Add to production deployment (sketch; assumes Flask with Flask-Limiter 3.x)
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

# Rate limiting (Flask-Limiter 3.x takes key_func positionally, app as keyword)
limiter = Limiter(
    get_remote_address,
    app=app,
    default_limits=["200 per day", "50 per hour"]
)

# Security headers
@app.after_request
def add_security_headers(response):
    response.headers['X-Content-Type-Options'] = 'nosniff'
    response.headers['X-Frame-Options'] = 'DENY'
    response.headers['X-XSS-Protection'] = '1; mode=block'
    return response

πŸš€ Deployment

Docker Deployment

FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
EXPOSE 7860

CMD ["python", "app.py"]

Environment Variables for Production

# Database
DATABASE_URL=postgresql://user:password@localhost/dbname
SECRET_KEY=your-secret-key-here

# Security
BCRYPT_ROUNDS=12
SESSION_TIMEOUT=3600

# Performance
WORKER_THREADS=4
MAX_UPLOAD_SIZE=100MB

πŸ“Š Monitoring & Analytics

Key Metrics to Track

  • Query Response Time: Average time for AI responses
  • Document Processing Time: Time to index new documents
  • User Activity: Login frequency and session duration
  • Error Rates: Failed queries and system errors
  • Storage Usage: Database and file system utilization

Logging Configuration

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('app.log'),
        logging.StreamHandler()
    ]
)

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new features
  5. Submit a pull request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ†˜ Support

For support and questions:

  • Create an issue in the repository
  • Check the documentation
  • Review the troubleshooting guide

Made by Collar - Enhanced with Team Management & Chat History