---
title: Agentic HF Analyzer
emoji: 🌍
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
short_description: Recommends which Repos/Spaces users should look at
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🚀 HF Repo Analyzer

An AI-powered Hugging Face repository discovery and analysis tool that helps you find, evaluate, and explore the best repositories for your specific needs.


## ✨ Features

- 🤖 AI Assistant: Intelligent conversation-based repository discovery
- 🔍 Smart Search: Auto-detection of repository IDs vs. keywords
- 📊 Automated Analysis: LLM-powered repository evaluation and ranking
- 🏆 Top 3 Selection: AI-curated shortlist of the most relevant repositories
- 💬 Repository Explorer: Interactive chat with repository contents
- 🎯 Requirements Extraction: Automatic keyword extraction from conversations
- 📋 Comprehensive Results: Detailed analysis with strengths, weaknesses, and specialities

## 🚦 Quick Start

### Prerequisites

- Python 3.8+
- OpenAI API key (for LLM analysis)
- Hugging Face access (for repository downloads)

### Installation

1. Clone the repository

   ```bash
   git clone <repository-url>
   cd Agentic_HF_Analyzer
   ```

2. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

3. Set up environment variables

   ```bash
   export modal_api="your_openai_api_key"
   export base_url="your_openai_base_url"
   ```

4. Run the application

   ```bash
   python app.py
   ```

5. Open your browser to http://localhost:7860

## 📖 User Guide

### 🤖 Using the AI Assistant (Recommended)

1. Start a Conversation

   - Navigate to the "🤖 AI Assistant" tab
   - Describe your project: "I'm building a chatbot for customer service"
   - The AI will ask clarifying questions about your needs

2. Automatic Discovery

   - When the AI has enough information, it will automatically:
     - Extract relevant keywords from your conversation
     - Search for matching repositories
     - Analyze and rank them by relevance

3. Review Results

   - The interface automatically switches to "🔬 Analysis & Results"
   - View the top 3 most relevant repositories
   - Browse all analyzed repositories with detailed insights

πŸ“ Using Smart Search (Direct Input)

  1. Repository IDs

    microsoft/DialoGPT-medium
    openai/whisper
    huggingface/transformers
    
  2. Keywords

    text generation
    image classification
    sentiment analysis
    
  3. Mixed Input

    • The system automatically detects the input type
    • Repository IDs (containing /) are processed directly
    • Keywords trigger automatic repository search

### 🔬 Analyzing Results

- Top 3 Repositories: AI-selected as the most relevant for your requirements
- Detailed Analysis: Strengths, weaknesses, specialities, and relevance ratings
- Quick Actions: Click repository names to visit or explore them
- Repository Explorer: Deep dive into individual repositories with AI chat

πŸ” Repository Explorer

  1. Access Methods:

    • Click "πŸ” Open in Repo Explorer" from repository actions
    • Manually enter repository ID in the Repo Explorer tab
  2. Features:

    • Automatic repository loading and analysis
    • Interactive chat about repository contents
    • File structure exploration
    • Code analysis and explanations

πŸ› οΈ Technical Architecture

Core Components

app.py                 # Main Gradio interface and orchestration
β”œβ”€β”€ analyzer.py        # Repository analysis and LLM processing
β”œβ”€β”€ hf_utils.py       # Hugging Face API interactions
β”œβ”€β”€ chatbot_page.py   # AI assistant conversation logic
└── repo_explorer.py  # Repository exploration interface

### Key Features Implementation

#### 🤖 AI Assistant

- System Prompt: Focused on requirements gathering, not recommendations
- Auto-Extraction: Detects when the conversation is ready for keyword extraction
- Smart Processing: Converts natural language into actionable search queries

πŸ” Smart Input Detection

def is_repo_id_format(text: str) -> bool:
    # Detects if input contains repository IDs (with /) vs keywords
    lines = [line.strip() for line in re.split(r'[\n,]+', text) if line.strip()]
    slash_count = sum(1 for line in lines if '/' in line)
    return slash_count >= len(lines) * 0.5
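
For example, the heuristic routes inputs like this (a quick illustration, not taken from the project's test suite):

```python
# Assumes is_repo_id_format from the snippet above is in scope.
print(is_repo_id_format("microsoft/DialoGPT-medium, openai/whisper"))  # True  -> treat as repository IDs
print(is_repo_id_format("text generation\nimage classification"))      # False -> treat as search keywords
```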

πŸ† LLM-Powered Repository Ranking

  • Model: Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ
  • Criteria: Requirements matching, strengths, relevance rating, speciality alignment
  • Output: JSON-formatted repository rankings
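
A minimal sketch of what such a ranking call might look like, assuming the model above is served behind an OpenAI-compatible endpoint configured via the `modal_api` and `base_url` environment variables; the `rank_repositories` helper and the prompt wording are illustrative, not the project's exact code.

```python
import json
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["modal_api"], base_url=os.environ["base_url"])

def rank_repositories(user_requirements: str, analyses: list[dict]) -> list[str]:
    """Ask the LLM to pick the top 3 repository IDs from per-repository analysis summaries."""
    prompt = (
        f"User requirements: {user_requirements}\n\n"
        f"Repository analyses: {json.dumps(analyses)}\n\n"
        "Return a JSON list of the 3 most relevant repository IDs, most relevant first."
    )
    response = client.chat.completions.create(
        model="Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ",
        messages=[{"role": "user", "content": prompt}],
    )
    try:
        return json.loads(response.choices[0].message.content)[:3]
    except json.JSONDecodeError:
        return [a.get("repo_id", "") for a in analyses[:3]]  # fallback: keep the incoming order
```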

#### 📊 Analysis Pipeline

1. Download: Repository files (.py, .md, .txt)
2. Combine: Merge files into a single analyzable document (sketched below)
3. Analyze: LLM evaluation of strengths, weaknesses, and specialities
4. Rank: Relevance scoring against the user's requirements
5. Select: Top 3 most relevant repositories
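
The download and combine steps might look roughly like this, assuming `huggingface_hub.snapshot_download` is used for filtered downloads; the `combine_repo_files` helper and the choice of `repo_type="space"` are assumptions, not the project's actual `analyzer.py` code.

```python
from pathlib import Path

from huggingface_hub import snapshot_download

def combine_repo_files(repo_id: str, local_dir: str = "repo_files") -> str:
    """Download only .py/.md/.txt files from a repo and merge them into one analyzable document."""
    path = snapshot_download(
        repo_id=repo_id,
        repo_type="space",  # assumption: Spaces are the main target; use "model" for model repos
        allow_patterns=["*.py", "*.md", "*.txt"],
        local_dir=local_dir,
    )
    parts = []
    for file in sorted(Path(path).rglob("*")):
        if file.suffix in {".py", ".md", ".txt"}:
            header = f"\n\n# ===== {file.relative_to(path)} =====\n"
            parts.append(header + file.read_text(errors="ignore"))
    return "".join(parts)
```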

### Data Flow

```mermaid
graph TD
    A[User Input] --> B{Input Type?}
    B -->|Keywords| C[Repository Search]
    B -->|Repo IDs| D[Direct Processing]
    C --> E[Repository List]
    D --> E
    E --> F[Download & Analyze]
    F --> G[LLM Evaluation]
    G --> H[Ranking & Selection]
    H --> I[Results Display]
    I --> J[Repository Explorer]
```

### File Structure

```text
📦 Agentic_HF_Analyzer/
├── 📄 app.py                    # Main application
├── 📄 analyzer.py               # Repository analysis logic
├── 📄 hf_utils.py               # Hugging Face utilities
├── 📄 chatbot_page.py           # AI assistant functionality
├── 📄 repo_explorer.py          # Repository exploration
├── 📄 requirements.txt          # Python dependencies
├── 📄 README.md                 # Documentation
├── 📄 repo_ids.csv              # Analysis results storage
└── 📁 repo_files/               # Temporary repository downloads
```

### Dependencies

```text
gradio>=4.0.0           # Web interface framework
pandas>=1.5.0           # Data manipulation
regex>=2022.0.0         # Advanced regex operations
openai>=1.0.0           # LLM API access
huggingface_hub>=0.16.0 # HF repository access
requests>=2.28.0        # HTTP requests
```

### Environment Variables

| Variable    | Description                     | Required |
|-------------|---------------------------------|----------|
| `modal_api` | OpenAI API key for LLM analysis | ✅       |
| `base_url`  | OpenAI API base URL             | ✅       |
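
A minimal sketch of how these variables are presumably consumed: an OpenAI-compatible client pointed at the configured endpoint (the exact wiring inside `app.py`/`analyzer.py` may differ).

```python
import os

from openai import OpenAI

# Assumption: despite its name, modal_api holds the API key for the OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["modal_api"],
    base_url=os.environ["base_url"],
)
```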

### LLM Integration

#### Analysis Prompt Structure

```python
ANALYSIS_PROMPT = """
Analyze this repository and provide:
1. Strengths and capabilities
2. Potential weaknesses or limitations
3. Primary speciality/use case
4. Relevance rating for: {user_requirements}

Return valid JSON with: strength, weaknesses, speciality, relevance rating
"""
```

#### Repository Ranking System

- Input: User requirements + repository analysis data
- Processing: LLM evaluates relevance and ranks repositories
- Output: Top 3 most relevant repositories in order

### UI Components

#### Modern Design Features

- Gradient Backgrounds: Linear gradients for visual appeal
- Glassmorphism: Backdrop blur effects for a modern look
- Responsive Layout: Adapts to different screen sizes
- Interactive Elements: Hover effects and smooth transitions
- Modal System: Repository action selection popups
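
Styling like this is typically injected through Gradio's `css` parameter; the selectors and values below are illustrative, not the app's actual stylesheet.

```python
import gradio as gr

# Illustrative custom CSS: gradient page background plus a glassmorphism card class.
CUSTOM_CSS = """
.gradio-container { background: linear-gradient(135deg, #fdfbfb 0%, #ebedee 100%); }
.glass-card { backdrop-filter: blur(10px); background: rgba(255, 255, 255, 0.55); border-radius: 12px; }
"""

with gr.Blocks(css=CUSTOM_CSS) as demo:
    gr.Markdown("## 🚀 HF Repo Analyzer", elem_classes=["glass-card"])

demo.launch()
```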

#### Tab Organization

1. 🤖 AI Assistant: Conversation-based discovery
2. 📝 Smart Search: Direct input processing
3. 🔬 Analysis & Results: Comprehensive analysis display
4. 🔍 Repo Explorer: Interactive repository exploration

### Advanced Features

#### Auto-Navigation

- Automatic tab switching based on workflow state
- Smooth scrolling to top on tab changes
- Progressive disclosure of information

#### Error Handling

- Graceful fallbacks for LLM failures
- CSV update retry mechanisms
- User-friendly error messages

#### Performance Optimizations

- Parallel processing for multiple repositories
- Progress tracking for long operations
- Efficient file caching and cleanup
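
Parallel analysis of multiple repositories is commonly handled with a thread pool; this generic sketch assumes a per-repository `analyze_repository(repo_id)` callable like the helpers above, not the project's exact implementation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def analyze_many(repo_ids: list[str], analyze_repository, max_workers: int = 4) -> dict:
    """Analyze several repositories concurrently, collecting results keyed by repo ID."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze_repository, rid): rid for rid in repo_ids}
        for future in as_completed(futures):
            rid = futures[future]
            try:
                results[rid] = future.result()
            except Exception as err:  # one failing repo should not abort the whole batch
                results[rid] = {"error": str(err)}
    return results
```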

## 🔧 Configuration

### Customizing Analysis

- Modify `CHATBOT_SYSTEM_PROMPT` for different assistant behavior
- Adjust repository search limits in `search_top_spaces()`
- Configure analysis criteria in `get_top_relevant_repos()`

### Adding File Types

```python
# In analyzer.py
download_filtered_space_files(
    repo_id,
    local_dir="repo_files",
    file_extensions=['.py', '.md', '.txt', '.js', '.ts']  # Add more
)
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Implement your changes
4. Add tests if applicable
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Gradio: For the amazing web interface framework
  • Hugging Face: For the incredible repository ecosystem
  • OpenAI: For powerful language model capabilities

Built with ❤️ for the open source community

🚀 Happy repository hunting! 🚀