---
title: HF RepoSense
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
short_description: AI-powered HuggingFace repository intelligence
tags:
- agent-demo-track
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# 🚀 HF RepoSense: [Video demo](https://youtu.be/UqSRQy_8t-E)
HF accounts for all contributors:
- Naman Gupta: naman1102
- Surya Boddu: MLconArtist
- Lakshmi Girija Dhulipati: dlgirija
- Mohamed Ifreen Seyed Ibrahim: Mohamed-Ifreen
**AI-powered HuggingFace repository intelligence**
HF RepoSense discovers, analyzes, and evaluates Hugging Face repositories. It gathers your requirements through conversation, searches for matching repositories, and returns an LLM-generated analysis of each one with ranked, personalized recommendations.
![HF RepoSense](https://img.shields.io/badge/Powered%20by-Gradio-orange)
![Python](https://img.shields.io/badge/Python-3.8+-blue)
![Hugging Face](https://img.shields.io/badge/Hugging%20Face-Spaces-yellow)
## ✨ Features
- πŸ€– **AI Assistant**: Intelligent conversation-based repository discovery
- πŸ” **Smart Search**: Auto-detection of repository IDs vs. keywords
- πŸ“Š **Automated Analysis**: LLM-powered repository evaluation and ranking
- πŸ† **Top 3 Selection**: AI-curated most relevant repositories
- πŸ’¬ **Repository Explorer**: Interactive chat with repository contents
- 🎯 **Requirements Extraction**: Automatic keyword extraction from conversations
- πŸ“‹ **Comprehensive Results**: Detailed analysis with strengths, weaknesses, and specialities
## 🚦 Quick Start
### Prerequisites
- Python 3.8+
- OpenAI API key (for LLM analysis)
- Hugging Face access (for repository downloads)
### Installation
1. **Clone the repository**
```bash
git clone <repository-url>
cd HF-RepoSense
```
2. **Install dependencies**
```bash
pip install -r requirements.txt
```
3. **Set up environment variables**
```bash
export modal_api="your_openai_api_key"
export base_url="your_openai_base_url"
```
4. **Run the application**
```bash
python app.py
```
5. **Open your browser** to `http://localhost:7860`
## 📖 User Guide
### 🤖 Using the AI Assistant (Recommended)
1. **Start a Conversation**
   - Navigate to the "🤖 AI Assistant" tab
   - Describe your project: "I'm building a chatbot for customer service"
   - The AI will ask clarifying questions about your needs
2. **Automatic Discovery**
   - When the AI has enough information, it will automatically:
     - Extract relevant keywords from your conversation
     - Search for matching repositories
     - Analyze and rank them by relevance
3. **Review Results**
   - The interface automatically switches to "🔬 Analysis & Results"
   - View the top 3 most relevant repositories
   - Browse all analyzed repositories with detailed insights
### 📝 Using Smart Search (Direct Input)
1. **Repository IDs**
```
microsoft/DialoGPT-medium
openai/whisper
huggingface/transformers
```
2. **Keywords**
```
text generation
image classification
sentiment analysis
```
3. **Mixed Input**
   - The system automatically detects the input type
   - Repository IDs (containing `/`) are processed directly
   - Keywords trigger automatic repository search
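
The routing described above can be illustrated with a minimal sketch. `partition_entries` is a hypothetical helper, not a function from this project, and it classifies each entry on its own, whereas the app's `is_repo_id_format` (shown under Technical Architecture) classifies the whole input by majority.

```python
import re
from typing import List, Tuple


def partition_entries(text: str) -> Tuple[List[str], List[str]]:
    """Split raw input into (repo_ids, keywords). Hypothetical helper for illustration."""
    # Entries may be separated by newlines or commas; empty entries are dropped
    entries = [e.strip() for e in re.split(r"[\n,]+", text) if e.strip()]
    repo_ids = [e for e in entries if "/" in e]       # e.g. "openai/whisper"
    keywords = [e for e in entries if "/" not in e]   # e.g. "text generation"
    return repo_ids, keywords


repo_ids, keywords = partition_entries(
    "openai/whisper\ntext generation, huggingface/transformers"
)
print(repo_ids)   # ['openai/whisper', 'huggingface/transformers']
print(keywords)   # ['text generation']
```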
### 🔬 Analyzing Results
- **Top 3 Repositories**: AI-selected most relevant based on your requirements
- **Detailed Analysis**: Strengths, weaknesses, specialities, and relevance ratings
- **Quick Actions**: Click repository names to visit or explore them
- **Repository Explorer**: Deep dive into individual repositories with AI chat
### 🔍 Repository Explorer
1. **Access Methods**:
   - Click "🔍 Open in Repo Explorer" from repository actions
   - Manually enter repository ID in the Repo Explorer tab
2. **Features**:
   - Automatic repository loading and analysis
   - Interactive chat about repository contents
   - File structure exploration
   - Code analysis and explanations
## 🛠️ Technical Architecture
### Core Components
```
app.py                    # Main Gradio interface and orchestration
├── analyzer.py           # Repository analysis and LLM processing
├── hf_utils.py           # Hugging Face API interactions
├── chatbot_page.py       # AI assistant conversation logic
└── repo_explorer.py      # Repository exploration interface
```
### Key Features Implementation
#### 🤖 AI Assistant
- **System Prompt**: Focused on requirements gathering, not recommendations
- **Auto-Extraction**: Detects conversation readiness for keyword extraction
- **Smart Processing**: Converts natural language to actionable search queries
#### 🔍 Smart Input Detection
```python
import re

def is_repo_id_format(text: str) -> bool:
    """Return True if at least half of the entries contain '/' (repo IDs vs. keywords)."""
    # Entries are separated by newlines or commas; empty entries are ignored
    lines = [line.strip() for line in re.split(r'[\n,]+', text) if line.strip()]
    slash_count = sum(1 for line in lines if '/' in line)
    return slash_count >= len(lines) * 0.5
#### 🏆 LLM-Powered Repository Ranking
- **Model**: `Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ`
- **Criteria**: Requirements matching, strengths, relevance rating, speciality alignment
- **Output**: JSON-formatted repository rankings
#### 📊 Analysis Pipeline
1. **Download**: Repository files (`.py`, `.md`, `.txt`)
2. **Combine**: Merge files into single analyzable document
3. **Analyze**: LLM evaluation for strengths, weaknesses, specialities
4. **Rank**: User requirement-based relevance scoring
5. **Select**: Top 3 most relevant repositories
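
The "Combine" step above might look roughly like the sketch below. `combine_files` and its header format are assumptions made for illustration; the project's actual merge logic in `analyzer.py` may differ.

```python
from pathlib import Path
from typing import Iterable


def combine_files(root: str, extensions: Iterable[str] = (".py", ".md", ".txt")) -> str:
    """Concatenate matching files under `root` into one string with per-file headers."""
    allowed = {ext.lower() for ext in extensions}
    parts = []
    # Sorted traversal keeps the combined document stable between runs
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in allowed:
            # errors="replace" avoids crashing on files that are not valid UTF-8
            text = path.read_text(encoding="utf-8", errors="replace")
            parts.append(f"--- {path.relative_to(root).as_posix()} ---\n{text}")
    return "\n\n".join(parts)
```

The per-file header lets the LLM (and a human reader) tell where one file ends and the next begins.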
### Data Flow
```mermaid
graph TD
    A[User Input] --> B{Input Type?}
    B -->|Keywords| C[Repository Search]
    B -->|Repo IDs| D[Direct Processing]
    C --> E[Repository List]
    D --> E
    E --> F[Download & Analyze]
    F --> G[LLM Evaluation]
    G --> H[Ranking & Selection]
    H --> I[Results Display]
    I --> J[Repository Explorer]
```
### File Structure
```
📦 HF-RepoSense/
├── 📄 app.py              # Main application
├── 📄 analyzer.py         # Repository analysis logic
├── 📄 hf_utils.py         # Hugging Face utilities
├── 📄 chatbot_page.py     # AI assistant functionality
├── 📄 repo_explorer.py    # Repository exploration
├── 📄 requirements.txt    # Python dependencies
├── 📄 README.md           # Documentation
├── 📄 repo_ids.csv        # Analysis results storage
└── 📁 repo_files/         # Temporary repository downloads
```
### Dependencies
```
gradio>=4.0.0 # Web interface framework
pandas>=1.5.0 # Data manipulation
regex>=2022.0.0 # Advanced regex operations
openai>=1.0.0 # LLM API access
huggingface_hub>=0.16.0 # HF repository access
requests>=2.28.0 # HTTP requests
```
### Environment Variables
| Variable | Description | Required |
|----------|-------------|----------|
| `modal_api` | OpenAI API key for LLM analysis | ✅ |
| `base_url` | OpenAI API base URL | ✅ |
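
As a minimal sketch, the two variables could be read and validated at startup as shown below. `load_llm_config` and `REQUIRED_VARS` are hypothetical names, not taken from the project. The returned keys match the `api_key` and `base_url` arguments of the `openai` client constructor.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = ("modal_api", "base_url")  # names from the table above


def load_llm_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return LLM connection settings, failing early if a variable is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {"api_key": env["modal_api"], "base_url": env["base_url"]}
```

Failing at startup with the names of the missing variables is usually easier to debug than an authentication error on the first LLM call.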
### LLM Integration
#### Analysis Prompt Structure
```python
ANALYSIS_PROMPT = """
Analyze this repository and provide:
1. Strengths and capabilities
2. Potential weaknesses or limitations
3. Primary speciality/use case
4. Relevance rating for: {user_requirements}
Return valid JSON with: strength, weaknesses, speciality, relevance rating
"""
```
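
LLM replies often wrap the requested JSON in prose or a code fence, so a tolerant parser is useful. The sketch below is one possible approach, not the project's implementation. `parse_analysis` and `EXPECTED_KEYS` are hypothetical names, and the keys are the ones listed in the prompt above.

```python
import json
import re
from typing import Dict

EXPECTED_KEYS = ("strength", "weaknesses", "speciality", "relevance rating")


def parse_analysis(reply: str) -> Dict[str, str]:
    """Extract the JSON object from an LLM reply; missing keys become empty strings."""
    data = {}
    # Grab everything from the first "{" to the last "}" so surrounding text is ignored
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match:
        try:
            loaded = json.loads(match.group(0))
            if isinstance(loaded, dict):
                data = loaded
        except json.JSONDecodeError:
            pass  # fall through to the empty defaults
    return {key: str(data.get(key, "")) for key in EXPECTED_KEYS}
```

Returning a dictionary with a fixed set of keys keeps downstream code (for example, writing a CSV row) free of per-key existence checks.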
#### Repository Ranking System
- **Input**: User requirements + repository analysis data
- **Processing**: LLM evaluates relevance and ranks repositories
- **Output**: Top 3 most relevant repositories in order
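
The selection step can be made robust against malformed or hallucinated output, as in the sketch below. It assumes the LLM reply is a JSON array of repository IDs. That format and the name `select_top_repos` are assumptions for illustration only.

```python
import json
from typing import List


def select_top_repos(llm_reply: str, candidates: List[str], top_n: int = 3) -> List[str]:
    """Return up to `top_n` repo IDs, keeping the LLM's order only for known candidates."""
    try:
        ranked = json.loads(llm_reply)
    except json.JSONDecodeError:
        ranked = []  # unparseable reply: rely on the fallback order below
    if not isinstance(ranked, list):
        ranked = []
    seen, result = set(), []
    # LLM-ranked IDs come first; remaining candidates fill the gaps in their original order
    for repo_id in ranked + candidates:
        if isinstance(repo_id, str) and repo_id in candidates and repo_id not in seen:
            seen.add(repo_id)
            result.append(repo_id)
    return result[:top_n]


candidates = ["a/x", "b/y", "c/z", "d/w"]
print(select_top_repos('["c/z", "zz/unknown", "a/x"]', candidates))  # ['c/z', 'a/x', 'b/y']
```

IDs that were never analyzed (such as `zz/unknown` here) are dropped rather than shown to the user.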
### UI Components
#### Modern Design Features
- **Gradient Backgrounds**: Linear gradients for visual appeal
- **Glassmorphism**: Backdrop blur effects for modern look
- **Responsive Layout**: Adaptive to different screen sizes
- **Interactive Elements**: Hover effects and smooth transitions
- **Modal System**: Repository action selection popups
#### Tab Organization
1. **🤖 AI Assistant**: Conversation-based discovery
2. **📝 Smart Search**: Direct input processing
3. **🔬 Analysis & Results**: Comprehensive analysis display
4. **🔍 Repo Explorer**: Interactive repository exploration
### Advanced Features
#### Auto-Navigation
- Automatic tab switching based on workflow state
- Smooth scrolling to top on tab changes
- Progressive disclosure of information
#### Error Handling
- Graceful fallbacks for LLM failures
- CSV update retry mechanisms
- User-friendly error messages
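
A retry mechanism like the one mentioned for CSV updates could be sketched as follows. `with_retries` is a hypothetical helper, and the choice to retry only on `OSError` with a doubling delay is an assumption, not a description of the project's code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 3, delay: float = 0.5) -> T:
    """Call `action` until it succeeds or `attempts` is exhausted, doubling the delay each time."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError:
            # Covers locked or temporarily unavailable files (PermissionError is a subclass)
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay)
            delay *= 2
    raise ValueError("attempts must be at least 1")
```

With pandas, a CSV write could then be wrapped as `with_retries(lambda: df.to_csv("repo_ids.csv", index=False))`, where `df` is the results DataFrame.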
#### Performance Optimizations
- Parallel processing for multiple repositories
- Progress tracking for long operations
- Efficient file caching and cleanup
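
Parallel analysis with progress tracking can be sketched with the standard library's thread pool. `analyze_all` and the `analyze` callback are hypothetical and stand in for whatever per-repository function the app uses. Threads suit this workload because downloads and LLM calls are I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def analyze_all(
    repo_ids: List[str],
    analyze: Callable[[str], dict],
    max_workers: int = 4,
) -> Dict[str, dict]:
    """Run `analyze` for every repo ID concurrently and report progress as results arrive."""
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze, repo_id): repo_id for repo_id in repo_ids}
        for done, future in enumerate(as_completed(futures), start=1):
            repo_id = futures[future]
            try:
                results[repo_id] = future.result()
            except Exception as exc:
                # One failing repository should not abort the whole batch
                results[repo_id] = {"error": str(exc)}
            print(f"[{done}/{len(futures)}] finished {repo_id}")
    return results
```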
## 🔧 Configuration
### Customizing Analysis
- Modify `CHATBOT_SYSTEM_PROMPT` for different assistant behavior
- Adjust repository search limits in `search_top_spaces()`
- Configure analysis criteria in `get_top_relevant_repos()`
### Adding File Types
```python
# In analyzer.py
download_filtered_space_files(
repo_id,
local_dir="repo_files",
file_extensions=['.py', '.md', '.txt', '.js', '.ts'] # Add more
)
```
## 🀝 Contributing
1. Fork the repository
2. Create a feature branch
3. Implement your changes
4. Add tests if applicable
5. Submit a pull request
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments
- **Gradio**: For the amazing web interface framework
- **Hugging Face**: For the incredible repository ecosystem
- **OpenAI**: For powerful language model capabilities
---
<div align="center">
<p>Built with ❤️ for the open source community</p>
<p>🚀 Happy repository hunting! 🚀</p>
</div>