---
title: Agentic HF Analyzer
emoji: 🌍
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
short_description: Recommends which Repos/Spaces users should look at
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🚀 HF Repo Analyzer

An AI-powered Hugging Face repository discovery and analysis tool that helps you find, evaluate, and explore the best repositories for your specific needs.


## ✨ Features

- 🤖 AI Assistant: Intelligent conversation-based repository discovery
- 🔍 Smart Search: Auto-detection of repository IDs vs. keywords
- 📊 Automated Analysis: LLM-powered repository evaluation and ranking
- 🏆 Top 3 Selection: AI-curated shortlist of the most relevant repositories
- 💬 Repository Explorer: Interactive chat with repository contents
- 🎯 Requirements Extraction: Automatic keyword extraction from conversations
- 📋 Comprehensive Results: Detailed analysis with strengths, weaknesses, and specialities

## 🚦 Quick Start

### Prerequisites

- Python 3.8+
- OpenAI API key (for LLM analysis)
- Hugging Face access (for repository downloads)

### Installation

1. Clone the repository

   ```bash
   git clone <repository-url>
   cd Agentic_HF_Analyzer
   ```

2. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

3. Set up environment variables

   ```bash
   export modal_api="your_openai_api_key"
   export base_url="your_openai_base_url"
   ```

4. Run the application

   ```bash
   python app.py
   ```

5. Open your browser to http://localhost:7860

## 📖 User Guide

### 🤖 Using the AI Assistant (Recommended)

1. Start a Conversation

   - Navigate to the "🤖 AI Assistant" tab
   - Describe your project: "I'm building a chatbot for customer service"
   - The AI will ask clarifying questions about your needs

2. Automatic Discovery

   - When the AI has enough information, it will automatically:
     - Extract relevant keywords from your conversation
     - Search for matching repositories
     - Analyze and rank them by relevance

3. Review Results

   - The interface automatically switches to "🔬 Analysis & Results"
   - View the top 3 most relevant repositories
   - Browse all analyzed repositories with detailed insights

πŸ“ Using Smart Search (Direct Input)

  1. Repository IDs

    microsoft/DialoGPT-medium
    openai/whisper
    huggingface/transformers
    
  2. Keywords

    text generation
    image classification
    sentiment analysis
    
  3. Mixed Input

    • The system automatically detects the input type
    • Repository IDs (containing /) are processed directly
    • Keywords trigger automatic repository search

### 🔬 Analyzing Results

- Top 3 Repositories: AI-selected as the most relevant for your requirements
- Detailed Analysis: Strengths, weaknesses, specialities, and relevance ratings
- Quick Actions: Click repository names to visit or explore them
- Repository Explorer: Deep dive into individual repositories with AI chat

πŸ” Repository Explorer

  1. Access Methods:

    • Click "πŸ” Open in Repo Explorer" from repository actions
    • Manually enter repository ID in the Repo Explorer tab
  2. Features:

    • Automatic repository loading and analysis
    • Interactive chat about repository contents
    • File structure exploration
    • Code analysis and explanations

πŸ› οΈ Technical Architecture

Core Components

app.py                 # Main Gradio interface and orchestration
β”œβ”€β”€ analyzer.py        # Repository analysis and LLM processing
β”œβ”€β”€ hf_utils.py       # Hugging Face API interactions
β”œβ”€β”€ chatbot_page.py   # AI assistant conversation logic
└── repo_explorer.py  # Repository exploration interface

### Key Features Implementation

#### 🤖 AI Assistant

- System Prompt: Focused on requirements gathering, not recommendations
- Auto-Extraction: Detects when the conversation is ready for keyword extraction
- Smart Processing: Converts natural language into actionable search queries

πŸ” Smart Input Detection

def is_repo_id_format(text: str) -> bool:
    # Detects if input contains repository IDs (with /) vs keywords
    lines = [line.strip() for line in re.split(r'[\n,]+', text) if line.strip()]
    slash_count = sum(1 for line in lines if '/' in line)
    return slash_count >= len(lines) * 0.5
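
For example, the heuristic routes inputs like this (a quick illustration, not taken from the project's test suite):

```python
# Assumes is_repo_id_format from the snippet above is in scope.
print(is_repo_id_format("microsoft/DialoGPT-medium, openai/whisper"))  # True  -> treat as repository IDs
print(is_repo_id_format("text generation\nimage classification"))      # False -> treat as search keywords
```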

πŸ† LLM-Powered Repository Ranking

  • Model: Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ
  • Criteria: Requirements matching, strengths, relevance rating, speciality alignment
  • Output: JSON-formatted repository rankings
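
A minimal sketch of what such a ranking call might look like, assuming the model above is served behind an OpenAI-compatible endpoint configured via the `modal_api` and `base_url` environment variables; the `rank_repositories` helper and the prompt wording are illustrative, not the project's exact code.

```python
import json
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["modal_api"], base_url=os.environ["base_url"])

def rank_repositories(user_requirements: str, analyses: list[dict]) -> list[str]:
    """Ask the LLM to pick the top 3 repository IDs from per-repository analysis summaries."""
    prompt = (
        f"User requirements: {user_requirements}\n\n"
        f"Repository analyses: {json.dumps(analyses)}\n\n"
        "Return a JSON list of the 3 most relevant repository IDs, most relevant first."
    )
    response = client.chat.completions.create(
        model="Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ",
        messages=[{"role": "user", "content": prompt}],
    )
    try:
        return json.loads(response.choices[0].message.content)[:3]
    except json.JSONDecodeError:
        return [a.get("repo_id", "") for a in analyses[:3]]  # fallback: keep the incoming order
```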

#### 📊 Analysis Pipeline

1. Download: Repository files (.py, .md, .txt)
2. Combine: Merge files into a single analyzable document (sketched below)
3. Analyze: LLM evaluation of strengths, weaknesses, and specialities
4. Rank: Relevance scoring against the user's requirements
5. Select: Top 3 most relevant repositories
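
The download and combine steps might look roughly like this, assuming `huggingface_hub.snapshot_download` is used for filtered downloads; the `combine_repo_files` helper and the choice of `repo_type="space"` are assumptions, not the project's actual `analyzer.py` code.

```python
from pathlib import Path

from huggingface_hub import snapshot_download

def combine_repo_files(repo_id: str, local_dir: str = "repo_files") -> str:
    """Download only .py/.md/.txt files from a repo and merge them into one analyzable document."""
    path = snapshot_download(
        repo_id=repo_id,
        repo_type="space",  # assumption: Spaces are the main target; use "model" for model repos
        allow_patterns=["*.py", "*.md", "*.txt"],
        local_dir=local_dir,
    )
    parts = []
    for file in sorted(Path(path).rglob("*")):
        if file.suffix in {".py", ".md", ".txt"}:
            header = f"\n\n# ===== {file.relative_to(path)} =====\n"
            parts.append(header + file.read_text(errors="ignore"))
    return "".join(parts)
```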

### Data Flow

```mermaid
graph TD
    A[User Input] --> B{Input Type?}
    B -->|Keywords| C[Repository Search]
    B -->|Repo IDs| D[Direct Processing]
    C --> E[Repository List]
    D --> E
    E --> F[Download & Analyze]
    F --> G[LLM Evaluation]
    G --> H[Ranking & Selection]
    H --> I[Results Display]
    I --> J[Repository Explorer]
```

### File Structure

```text
📦 Agentic_HF_Analyzer/
├── 📄 app.py                    # Main application
├── 📄 analyzer.py               # Repository analysis logic
├── 📄 hf_utils.py               # Hugging Face utilities
├── 📄 chatbot_page.py           # AI assistant functionality
├── 📄 repo_explorer.py          # Repository exploration
├── 📄 requirements.txt          # Python dependencies
├── 📄 README.md                 # Documentation
├── 📄 repo_ids.csv              # Analysis results storage
└── 📁 repo_files/               # Temporary repository downloads
```

### Dependencies

```text
gradio>=4.0.0           # Web interface framework
pandas>=1.5.0           # Data manipulation
regex>=2022.0.0         # Advanced regex operations
openai>=1.0.0           # LLM API access
huggingface_hub>=0.16.0 # HF repository access
requests>=2.28.0        # HTTP requests
```

### Environment Variables

| Variable    | Description                     | Required |
|-------------|---------------------------------|----------|
| `modal_api` | OpenAI API key for LLM analysis | ✅       |
| `base_url`  | OpenAI API base URL             | ✅       |
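
A minimal sketch of how these variables are presumably consumed: an OpenAI-compatible client pointed at the configured endpoint (the exact wiring inside `app.py`/`analyzer.py` may differ).

```python
import os

from openai import OpenAI

# Assumption: despite its name, modal_api holds the API key for the OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["modal_api"],
    base_url=os.environ["base_url"],
)
```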

### LLM Integration

#### Analysis Prompt Structure

```python
ANALYSIS_PROMPT = """
Analyze this repository and provide:
1. Strengths and capabilities
2. Potential weaknesses or limitations
3. Primary speciality/use case
4. Relevance rating for: {user_requirements}

Return valid JSON with: strength, weaknesses, speciality, relevance rating
"""
```

#### Repository Ranking System

- Input: User requirements + repository analysis data
- Processing: LLM evaluates relevance and ranks repositories
- Output: Top 3 most relevant repositories in order

### UI Components

#### Modern Design Features

- Gradient Backgrounds: Linear gradients for visual appeal
- Glassmorphism: Backdrop blur effects for a modern look
- Responsive Layout: Adapts to different screen sizes
- Interactive Elements: Hover effects and smooth transitions
- Modal System: Repository action selection popups
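
Styling like this is typically injected through Gradio's `css` parameter; the selectors and values below are illustrative, not the app's actual stylesheet.

```python
import gradio as gr

# Illustrative custom CSS: gradient page background plus a glassmorphism card class.
CUSTOM_CSS = """
.gradio-container { background: linear-gradient(135deg, #fdfbfb 0%, #ebedee 100%); }
.glass-card { backdrop-filter: blur(10px); background: rgba(255, 255, 255, 0.55); border-radius: 12px; }
"""

with gr.Blocks(css=CUSTOM_CSS) as demo:
    gr.Markdown("## 🚀 HF Repo Analyzer", elem_classes=["glass-card"])

demo.launch()
```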

#### Tab Organization

1. 🤖 AI Assistant: Conversation-based discovery
2. 📝 Smart Search: Direct input processing
3. 🔬 Analysis & Results: Comprehensive analysis display
4. 🔍 Repo Explorer: Interactive repository exploration

### Advanced Features

#### Auto-Navigation

- Automatic tab switching based on workflow state
- Smooth scrolling to top on tab changes
- Progressive disclosure of information

#### Error Handling

- Graceful fallbacks for LLM failures
- CSV update retry mechanisms
- User-friendly error messages

#### Performance Optimizations

- Parallel processing for multiple repositories
- Progress tracking for long operations
- Efficient file caching and cleanup
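
Parallel analysis of multiple repositories is commonly handled with a thread pool; this generic sketch assumes a per-repository `analyze_repository(repo_id)` callable like the helpers above, not the project's exact implementation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def analyze_many(repo_ids: list[str], analyze_repository, max_workers: int = 4) -> dict:
    """Analyze several repositories concurrently, collecting results keyed by repo ID."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze_repository, rid): rid for rid in repo_ids}
        for future in as_completed(futures):
            rid = futures[future]
            try:
                results[rid] = future.result()
            except Exception as err:  # one failing repo should not abort the whole batch
                results[rid] = {"error": str(err)}
    return results
```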

## 🔧 Configuration

### Customizing Analysis

- Modify `CHATBOT_SYSTEM_PROMPT` for different assistant behavior
- Adjust repository search limits in `search_top_spaces()`
- Configure analysis criteria in `get_top_relevant_repos()`

### Adding File Types

```python
# In analyzer.py
download_filtered_space_files(
    repo_id,
    local_dir="repo_files",
    file_extensions=['.py', '.md', '.txt', '.js', '.ts']  # Add more
)
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Implement your changes
4. Add tests if applicable
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Gradio: For the amazing web interface framework
  • Hugging Face: For the incredible repository ecosystem
  • OpenAI: For powerful language model capabilities

Built with ❤️ for the open source community

🚀 Happy repository hunting! 🚀