title: HF RepoSense
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
short_description: AI-powered HuggingFace repository intelligence
tags:
  - agent-demo-track

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

🚀 HF RepoSense: Video demo

HF accounts for all contributors:

  • Naman Gupta: naman1102
  • Surya Boddu: MLconArtist
  • Lakshmi Girija Dhulipati: dlgirija
  • Mohamed Ifreen Seyed Ibrahim: Mohamed-Ifreen

AI-powered HuggingFace repository intelligence

An AI system for discovering, analyzing, and evaluating Hugging Face repositories. HF RepoSense uses an LLM to gather your requirements through conversation, search for relevant repositories, and analyze and rank them against those requirements.


✨ Features

  • 🤖 AI Assistant: Intelligent conversation-based repository discovery
  • 🔍 Smart Search: Auto-detection of repository IDs vs. keywords
  • 📊 Automated Analysis: LLM-powered repository evaluation and ranking
  • 🏆 Top 3 Selection: AI-curated most relevant repositories
  • 💬 Repository Explorer: Interactive chat with repository contents
  • 🎯 Requirements Extraction: Automatic keyword extraction from conversations
  • 📋 Comprehensive Results: Detailed analysis with strengths, weaknesses, and specialities

🚦 Quick Start

Prerequisites

  • Python 3.8+
  • OpenAI API key (for LLM analysis)
  • Hugging Face access (for repository downloads)

Installation

  1. Clone the repository

    git clone <repository-url>
    cd HF-RepoSense
    
  2. Install dependencies

    pip install -r requirements.txt
    
  3. Set up environment variables

    export modal_api="your_openai_api_key"
    export base_url="your_openai_base_url"
    
  4. Run the application

    python app.py
    
  5. Open your browser to http://localhost:7860
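The variables from step 3 can be checked before the app starts. The sketch below is a hypothetical helper (`load_llm_settings` is not part of the project, and the app may read its settings differently); it only assumes the two variable names shown above and fails early with a readable message if either is unset.

```python
import os


def load_llm_settings(env=None):
    """Return the LLM settings from the environment, or raise if one is missing.

    Hypothetical helper for illustration; not taken from app.py.
    """
    env = os.environ if env is None else env
    # The two variable names come from the installation steps above.
    settings = {name: env.get(name, "").strip() for name in ("modal_api", "base_url")}
    missing = [name for name, value in settings.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return settings
```

Calling `load_llm_settings()` at the top of a launch script would surface a missing key before any LLM request is attempted.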

📖 User Guide

🤖 Using the AI Assistant (Recommended)

  1. Start a Conversation

    • Navigate to the "🤖 AI Assistant" tab
    • Describe your project: "I'm building a chatbot for customer service"
    • The AI will ask clarifying questions about your needs
  2. Automatic Discovery

    • When the AI has enough information, it will automatically:
      • Extract relevant keywords from your conversation
      • Search for matching repositories
      • Analyze and rank them by relevance
  3. Review Results

    • The interface automatically switches to "🔬 Analysis & Results"
    • View the top 3 most relevant repositories
    • Browse all analyzed repositories with detailed insights

📝 Using Smart Search (Direct Input)

  1. Repository IDs

    microsoft/DialoGPT-medium
    openai/whisper
    huggingface/transformers
    
  2. Keywords

    text generation
    image classification
    sentiment analysis
    
  3. Mixed Input

    • The system automatically detects the input type
    • Repository IDs (containing /) are processed directly
    • Keywords trigger automatic repository search
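One way to picture mixed input handling is to sort each entry on its own. This is a minimal sketch, not the app's actual logic: `split_mixed_input` and the `owner/name` pattern are assumptions made for illustration, and the app may instead classify the whole input at once.

```python
import re


def split_mixed_input(text):
    """Split user input into (repo_ids, keywords).

    Illustrative only: entries are separated by newlines or commas, and an
    entry counts as a repository ID when it looks like "owner/name".
    """
    entries = [e.strip() for e in re.split(r"[\n,]+", text) if e.strip()]
    repo_pattern = re.compile(r"^[\w.\-]+/[\w.\-]+$")
    repo_ids = [e for e in entries if repo_pattern.match(e)]
    keywords = [e for e in entries if not repo_pattern.match(e)]
    return repo_ids, keywords
```

With this sketch, `split_mixed_input("openai/whisper, text generation")` puts `openai/whisper` in the first list and `text generation` in the second.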

🔬 Analyzing Results

  • Top 3 Repositories: AI-selected most relevant based on your requirements
  • Detailed Analysis: Strengths, weaknesses, specialities, and relevance ratings
  • Quick Actions: Click repository names to visit or explore them
  • Repository Explorer: Deep dive into individual repositories with AI chat

🔍 Repository Explorer

  1. Access Methods:

    • Click "🔍 Open in Repo Explorer" from repository actions
    • Manually enter repository ID in the Repo Explorer tab
  2. Features:

    • Automatic repository loading and analysis
    • Interactive chat about repository contents
    • File structure exploration
    • Code analysis and explanations

🛠️ Technical Architecture

Core Components

app.py                 # Main Gradio interface and orchestration
├── analyzer.py        # Repository analysis and LLM processing
├── hf_utils.py        # Hugging Face API interactions
├── chatbot_page.py    # AI assistant conversation logic
└── repo_explorer.py   # Repository exploration interface

Key Features Implementation

🤖 AI Assistant

  • System Prompt: Focused on requirements gathering, not recommendations
  • Auto-Extraction: Detects conversation readiness for keyword extraction
  • Smart Processing: Converts natural language to actionable search queries

🔍 Smart Input Detection

import re

def is_repo_id_format(text: str) -> bool:
    # Treat the input as repository IDs when at least half of the entries contain "/"
    lines = [line.strip() for line in re.split(r'[\n,]+', text) if line.strip()]
    if not lines:
        return False  # empty input is not a list of repository IDs
    slash_count = sum(1 for line in lines if '/' in line)
    return slash_count >= len(lines) * 0.5

🏆 LLM-Powered Repository Ranking

  • Model: Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ
  • Criteria: Requirements matching, strengths, relevance rating, speciality alignment
  • Output: JSON-formatted repository rankings
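The exact JSON schema of the ranking output is not documented here. As a sketch under the assumption that the model returns a JSON array of repository ID strings, best first, a defensive parser could look like this (`parse_top_repos` is a hypothetical name, not a project function):

```python
import json


def parse_top_repos(llm_reply, known_repo_ids, limit=3):
    """Extract up to `limit` ranked repo IDs from an LLM reply.

    Assumes the reply is a JSON array of repo ID strings, best first.
    IDs that were never analyzed and duplicates are dropped; bad JSON yields [].
    """
    try:
        ranked = json.loads(llm_reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(ranked, list):
        return []
    top = []
    for repo_id in ranked:
        if isinstance(repo_id, str) and repo_id in known_repo_ids and repo_id not in top:
            top.append(repo_id)
    return top[:limit]
```

Filtering against the IDs that were actually analyzed guards against the model inventing a repository name.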

📊 Analysis Pipeline

  1. Download: Repository files (.py, .md, .txt)
  2. Combine: Merge files into single analyzable document
  3. Analyze: LLM evaluation for strengths, weaknesses, specialities
  4. Rank: User requirement-based relevance scoring
  5. Select: Top 3 most relevant repositories
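The "Combine" step can be sketched as follows. This is an illustration rather than the code in analyzer.py: the function name, the separator format, and the optional `max_chars` cap are assumptions.

```python
from pathlib import Path


def combine_repo_files(repo_dir, extensions=(".py", ".md", ".txt"), max_chars=None):
    """Merge matching files under `repo_dir` into one labelled text document."""
    parts = []
    for path in sorted(Path(repo_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            # Undecodable bytes are replaced so one odd file cannot abort the merge.
            text = path.read_text(encoding="utf-8", errors="replace")
            rel = path.relative_to(repo_dir).as_posix()
            parts.append(f"===== {rel} =====\n{text}")
    combined = "\n\n".join(parts)
    return combined if max_chars is None else combined[:max_chars]
```

Labelling each file with its relative path lets the LLM refer to specific files in its analysis, and a character cap is one simple way to stay within a model's context window.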

Data Flow

graph TD
    A[User Input] --> B{Input Type?}
    B -->|Keywords| C[Repository Search]
    B -->|Repo IDs| D[Direct Processing]
    C --> E[Repository List]
    D --> E
    E --> F[Download & Analyze]
    F --> G[LLM Evaluation]
    G --> H[Ranking & Selection]
    H --> I[Results Display]
    I --> J[Repository Explorer]

File Structure

📦 HF-RepoSense/
├── 📄 app.py                    # Main application
├── 📄 analyzer.py               # Repository analysis logic
├── 📄 hf_utils.py               # Hugging Face utilities
├── 📄 chatbot_page.py           # AI assistant functionality
├── 📄 repo_explorer.py          # Repository exploration
├── 📄 requirements.txt          # Python dependencies
├── 📄 README.md                 # Documentation
├── 📄 repo_ids.csv              # Analysis results storage
└── 📁 repo_files/               # Temporary repository downloads

Dependencies

gradio>=4.0.0          # Web interface framework
pandas>=1.5.0          # Data manipulation
regex>=2022.0.0        # Advanced regex operations
openai>=1.0.0          # LLM API access
huggingface_hub>=0.16.0 # HF repository access
requests>=2.28.0       # HTTP requests

Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| modal_api | OpenAI API key for LLM analysis | ✅ |
| base_url | OpenAI API base URL | ✅ |

LLM Integration

Analysis Prompt Structure

ANALYSIS_PROMPT = """
Analyze this repository and provide:
1. Strengths and capabilities
2. Potential weaknesses or limitations  
3. Primary speciality/use case
4. Relevance rating for: {user_requirements}

Return valid JSON with: strength, weaknesses, speciality, relevance rating
"""
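A prompt like the one above is typically filled in with the user's requirements, and the reply is parsed defensively because models sometimes wrap JSON in extra text. The sketch below assumes the four keys named in the prompt; `PROMPT_TEMPLATE`, `build_prompt`, and `parse_analysis` are hypothetical names and the template is an abbreviated stand-in.

```python
import json

# Abbreviated stand-in for the analysis prompt (assumption, not the real text).
PROMPT_TEMPLATE = "Analyze this repository. Relevance rating for: {user_requirements}"

# Keys the prompt asks the model to return.
EXPECTED_KEYS = ("strength", "weaknesses", "speciality", "relevance rating")


def build_prompt(user_requirements):
    """Insert the user's requirements into the template."""
    return PROMPT_TEMPLATE.format(user_requirements=user_requirements)


def parse_analysis(reply):
    """Return a dict with the expected keys; missing or unparsable values become ""."""
    # Take the outermost {...} span so surrounding chatter is ignored.
    start, end = reply.find("{"), reply.rfind("}")
    try:
        data = json.loads(reply[start:end + 1]) if start != -1 and end > start else {}
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {key: data.get(key, "") for key in EXPECTED_KEYS}
```

Returning empty strings instead of raising keeps one malformed reply from stopping the analysis of the remaining repositories.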

Repository Ranking System

  • Input: User requirements + repository analysis data
  • Processing: LLM evaluates relevance and ranks repositories
  • Output: Top 3 most relevant repositories in order

UI Components

Modern Design Features

  • Gradient Backgrounds: Linear gradients for visual appeal
  • Glassmorphism: Backdrop blur effects for modern look
  • Responsive Layout: Adaptive to different screen sizes
  • Interactive Elements: Hover effects and smooth transitions
  • Modal System: Repository action selection popups

Tab Organization

  1. 🤖 AI Assistant: Conversation-based discovery
  2. 📝 Smart Search: Direct input processing
  3. 🔬 Analysis & Results: Comprehensive analysis display
  4. 🔍 Repo Explorer: Interactive repository exploration

Advanced Features

Auto-Navigation

  • Automatic tab switching based on workflow state
  • Smooth scrolling to top on tab changes
  • Progressive disclosure of information

Error Handling

  • Graceful fallbacks for LLM failures
  • CSV update retry mechanisms
  • User-friendly error messages
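A retry mechanism of the kind listed above can be as small as the following sketch. The name `with_retries`, the attempt count, and the linear backoff are assumptions for illustration, not the project's implementation.

```python
import time


def with_retries(operation, attempts=3, delay_seconds=0.5, retry_on=(OSError,)):
    """Call `operation()` and retry on the given exceptions, re-raising the last one."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds * attempt)  # simple linear backoff
```

Assuming the results live in a pandas DataFrame `df`, a CSV update could then be wrapped as `with_retries(lambda: df.to_csv("repo_ids.csv", index=False))`.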

Performance Optimizations

  • Parallel processing for multiple repositories
  • Progress tracking for long operations
  • Efficient file caching and cleanup
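Parallel analysis with progress tracking could be sketched with a thread pool, which suits work that mostly waits on downloads and LLM calls. `analyze_all`, `analyze_one`, and `on_progress` are hypothetical names; the project may structure this differently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyze_all(repo_ids, analyze_one, max_workers=4, on_progress=None):
    """Run `analyze_one(repo_id)` concurrently.

    Returns {repo_id: result}, with the exception object stored as the value
    for any repository whose analysis failed.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze_one, rid): rid for rid in repo_ids}
        for done, future in enumerate(as_completed(futures), start=1):
            rid = futures[future]
            try:
                results[rid] = future.result()
            except Exception as exc:  # keep going if one repository fails
                results[rid] = exc
            if on_progress:
                on_progress(done, len(futures))
    return results
```

The `on_progress(done, total)` callback is where a UI progress bar could be updated as each repository finishes.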

🔧 Configuration

Customizing Analysis

  • Modify CHATBOT_SYSTEM_PROMPT for different assistant behavior
  • Adjust repository search limits in search_top_spaces()
  • Configure analysis criteria in get_top_relevant_repos()

Adding File Types

# In analyzer.py
download_filtered_space_files(
    repo_id, 
    local_dir="repo_files", 
    file_extensions=['.py', '.md', '.txt', '.js', '.ts']  # Add more
)
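The internals of `download_filtered_space_files` are not shown in this README. One possible way to get the same effect, assuming the repositories are Spaces, is `snapshot_download` from `huggingface_hub` with `allow_patterns`. The two function names below are hypothetical.

```python
def extensions_to_patterns(file_extensions):
    """Turn ['.py', 'md'] into glob patterns such as ['*.py', '*.md']."""
    return ["*." + ext.lstrip(".") for ext in file_extensions]


def download_space_files(repo_id, local_dir="repo_files",
                         file_extensions=(".py", ".md", ".txt")):
    """Download only files with the given extensions from a Space (sketch)."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="space",  # assumption: the target repository is a Space
        local_dir=local_dir,
        allow_patterns=extensions_to_patterns(file_extensions),
    )
```

Supporting a new file type is then a matter of adding its extension to `file_extensions`.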

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Implement your changes
  4. Add tests if applicable
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Gradio: For the amazing web interface framework
  • Hugging Face: For the incredible repository ecosystem
  • OpenAI: For powerful language model capabilities

Built with ❤️ for the open source community

🚀 Happy repository hunting! 🚀