---
title: HF RepoSense
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
short_description: AI-powered HuggingFace repository intelligence
tags:
- agent-demo-track
---

# HF RepoSense: [Video demo](https://youtu.be/UqSRQy_8t-E)

Hugging Face accounts of all contributors:

- Naman Gupta: naman1102
- Surya Boddu: MLconArtist
- Lakshmi Girija Dhulipati: dlgirija
- Mohamed Ifreen Seyed Ibrahim: Mohamed-Ifreen

**AI-powered Hugging Face repository intelligence**

An intelligent AI system for discovering, analyzing, and evaluating Hugging Face repositories. HF RepoSense uses an LLM to understand your requirements, search for relevant repositories, and provide a detailed analysis with personalized recommendations.

## ✨ Features

- **AI Assistant**: Intelligent conversation-based repository discovery
- **Smart Search**: Auto-detection of repository IDs vs. keywords
- **Automated Analysis**: LLM-powered repository evaluation and ranking
- **Top 3 Selection**: AI-curated most relevant repositories
- **Repository Explorer**: Interactive chat with repository contents
- **Requirements Extraction**: Automatic keyword extraction from conversations
- **Comprehensive Results**: Detailed analysis with strengths, weaknesses, and specialities

## Quick Start

### Prerequisites

- Python 3.8+
- An API key and base URL for an OpenAI-compatible LLM endpoint (for LLM analysis)
- Hugging Face access (for repository downloads)

### Installation

1. **Clone the repository**

   ```bash
   git clone <repository-url>
   cd HF-RepoSense
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables**

   ```bash
   export modal_api="your_openai_api_key"
   export base_url="your_openai_base_url"
   ```

4. **Run the application**

   ```bash
   python app.py
   ```

5. **Open your browser** at `http://localhost:7860`

## User Guide

### Using the AI Assistant (Recommended)

1. **Start a Conversation**
   - Navigate to the "AI Assistant" tab
   - Describe your project, e.g. "I'm building a chatbot for customer service"
   - The AI will ask clarifying questions about your needs

2. **Automatic Discovery**
   - When the AI has enough information, it will automatically:
     - Extract relevant keywords from your conversation
     - Search for matching repositories
     - Analyze and rank them by relevance

3. **Review Results**
   - The interface automatically switches to the "Analysis & Results" tab
   - View the top 3 most relevant repositories
   - Browse all analyzed repositories with detailed insights

### Using Smart Search (Direct Input)

1. **Repository IDs**

   ```
   microsoft/DialoGPT-medium
   openai/whisper
   huggingface/transformers
   ```

2. **Keywords**

   ```
   text generation
   image classification
   sentiment analysis
   ```

3. **Mixed Input**
   - The system detects the input type automatically after splitting the input on newlines and commas
   - If at least half of the entries contain a `/`, the input is treated as repository IDs and processed directly
   - Otherwise the input is treated as keywords and triggers an automatic repository search

### Analyzing Results

- **Top 3 Repositories**: AI-selected most relevant based on your requirements
- **Detailed Analysis**: Strengths, weaknesses, specialities, and relevance ratings
- **Quick Actions**: Click repository names to visit or explore them
- **Repository Explorer**: Deep dive into individual repositories with AI chat

### Repository Explorer

1. **Access Methods**:
   - Click "Open in Repo Explorer" from the repository actions
   - Manually enter a repository ID in the Repo Explorer tab

2. **Features**:
   - Automatic repository loading and analysis
   - Interactive chat about repository contents
   - File structure exploration
   - Code analysis and explanations

## Technical Architecture

### Core Components

```
app.py                  # Main Gradio interface and orchestration
├── analyzer.py         # Repository analysis and LLM processing
├── hf_utils.py         # Hugging Face API interactions
├── chatbot_page.py     # AI assistant conversation logic
└── repo_explorer.py    # Repository exploration interface
```

### Key Features Implementation

#### AI Assistant
- **System Prompt**: Focused on requirements gathering, not recommendations
- **Auto-Extraction**: Detects when the conversation has enough information for keyword extraction
- **Smart Processing**: Converts natural language into actionable search queries

#### Smart Input Detection
```python
import re


def is_repo_id_format(text: str) -> bool:
    # Treat input as repo IDs if at least half of the comma/newline-separated entries contain '/'
    lines = [line.strip() for line in re.split(r"[\n,]+", text) if line.strip()]
    if not lines:
        return False  # empty input is not a list of repo IDs
    slash_count = sum(1 for line in lines if "/" in line)
    return slash_count >= len(lines) * 0.5
```

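A hypothetical dispatcher built on this rule might look like the sketch below. `route_input` and its `(kind, entries)` return shape are illustrative assumptions, not the app's actual API; the detection function is repeated so the snippet runs on its own.

```python
import re
from typing import List, Tuple


def is_repo_id_format(text: str) -> bool:
    # Same majority-of-slashes rule as above, repeated so this sketch is standalone
    lines = [line.strip() for line in re.split(r"[\n,]+", text) if line.strip()]
    if not lines:
        return False
    return sum(1 for line in lines if "/" in line) >= len(lines) * 0.5


def route_input(text: str) -> Tuple[str, List[str]]:
    """Hypothetical router: return ("repo_ids", ids) or ("keywords", terms)."""
    entries = [e.strip() for e in re.split(r"[\n,]+", text) if e.strip()]
    if is_repo_id_format(text):
        # Keep only entries shaped like owner/name; stray keywords are dropped
        return "repo_ids", [e for e in entries if "/" in e]
    return "keywords", entries


print(route_input("microsoft/DialoGPT-medium, openai/whisper"))
# -> ('repo_ids', ['microsoft/DialoGPT-medium', 'openai/whisper'])
print(route_input("text generation\nimage classification"))
# -> ('keywords', ['text generation', 'image classification'])
```

With mixed input such as `a/b, c/d, speech`, two of the three entries contain a slash, so this sketch treats the whole input as repository IDs and drops `speech`; the real app may handle such leftovers differently.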
#### LLM-Powered Repository Ranking
- **Model**: `Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ`
- **Criteria**: Requirements matching, strengths, relevance rating, speciality alignment
- **Output**: JSON-formatted repository rankings

#### Analysis Pipeline
1. **Download**: Fetch the repository's `.py`, `.md`, and `.txt` files
2. **Combine**: Merge the files into a single analyzable document
3. **Analyze**: Use the LLM to evaluate strengths, weaknesses, and specialities
4. **Rank**: Score relevance against the user's requirements
5. **Select**: Keep the top 3 most relevant repositories

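Steps 1 and 2 of the pipeline can be sketched as follows. `combine_repo_files` and the `### File:` header format are hypothetical; they only illustrate one way to merge downloaded files into a single document while keeping file boundaries visible to the LLM.

```python
from pathlib import Path
from typing import Iterable

# Extensions mirror the Download step above; adjust as needed
DEFAULT_EXTENSIONS = (".py", ".md", ".txt")


def combine_repo_files(local_dir: str, extensions: Iterable[str] = DEFAULT_EXTENSIONS) -> str:
    """Hypothetical helper: merge downloaded files into one text document.

    Each file is prefixed with a header naming its relative path so the LLM
    can tell where one file ends and the next begins.
    """
    root = Path(local_dir)
    exts = {e.lower() for e in extensions}
    parts = []
    for path in sorted(root.rglob("*")):  # sorted for a deterministic file order
        if path.is_file() and path.suffix.lower() in exts:
            text = path.read_text(encoding="utf-8", errors="replace")
            parts.append(f"### File: {path.relative_to(root).as_posix()}\n{text}")
    return "\n\n".join(parts)
```

Files with other extensions are skipped, and undecodable bytes are replaced rather than raising, so one binary-ish file does not abort the whole analysis.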
### Data Flow

```mermaid
graph TD
    A[User Input] --> B{Input Type?}
    B -->|Keywords| C[Repository Search]
    B -->|Repo IDs| D[Direct Processing]
    C --> E[Repository List]
    D --> E
    E --> F[Download & Analyze]
    F --> G[LLM Evaluation]
    G --> H[Ranking & Selection]
    H --> I[Results Display]
    I --> J[Repository Explorer]
```

### File Structure

```
HF-RepoSense/
├── app.py              # Main application
├── analyzer.py         # Repository analysis logic
├── hf_utils.py         # Hugging Face utilities
├── chatbot_page.py     # AI assistant functionality
├── repo_explorer.py    # Repository exploration
├── requirements.txt    # Python dependencies
├── README.md           # Documentation
├── repo_ids.csv        # Analysis results storage
└── repo_files/         # Temporary repository downloads
```

### Dependencies

```
gradio>=4.0.0            # Web interface framework
pandas>=1.5.0            # Data manipulation
regex>=2022.0.0          # Advanced regex operations
openai>=1.0.0            # LLM API access
huggingface_hub>=0.16.0  # HF repository access
requests>=2.28.0         # HTTP requests
```

### Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `modal_api` | API key for the OpenAI-compatible LLM endpoint used for analysis | ✅ |
| `base_url` | Base URL of the OpenAI-compatible LLM endpoint | ✅ |

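A minimal sketch of reading these variables and failing early when one is missing. `LLMSettings`, `load_llm_settings`, and `make_client` are hypothetical names; the client construction assumes the app uses the `openai>=1.0` package listed in the dependencies, whose `OpenAI` class accepts `api_key` and `base_url`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    api_key: str
    base_url: str


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> LLMSettings:
    """Read `modal_api` and `base_url`, failing early with a clear message."""
    env = os.environ if env is None else env
    missing = [name for name in ("modal_api", "base_url") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variable(s): {', '.join(missing)}")
    return LLMSettings(api_key=env["modal_api"], base_url=env["base_url"])


def make_client(settings: LLMSettings):
    # Assumption: the endpoint is reached through the openai>=1.0 client
    from openai import OpenAI  # lazy import so the loader works without openai installed

    return OpenAI(api_key=settings.api_key, base_url=settings.base_url)
```

Usage would then be `client = make_client(load_llm_settings())`, followed by `client.chat.completions.create(model="Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ", messages=[...])`. Checking both variables up front turns a confusing authentication error later into an immediate, explicit message.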
### LLM Integration

#### Analysis Prompt Structure
```python
ANALYSIS_PROMPT = """
Analyze this repository and provide:
1. Strengths and capabilities
2. Potential weaknesses or limitations
3. Primary speciality/use case
4. Relevance rating for: {user_requirements}

Return valid JSON with: strength, weaknesses, speciality, relevance rating
"""
```

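One way to fill this template and read the reply back is sketched below. `build_analysis_prompt`, `parse_analysis`, the 20,000-character truncation, and the assumption that the model's JSON keys match the prompt's wording literally are all hypothetical; the app's real handling may differ.

```python
import json
import re

ANALYSIS_PROMPT = """
Analyze this repository and provide:
1. Strengths and capabilities
2. Potential weaknesses or limitations
3. Primary speciality/use case
4. Relevance rating for: {user_requirements}

Return valid JSON with: strength, weaknesses, speciality, relevance rating
"""

# Assumed to match the key names requested in the prompt
EXPECTED_KEYS = ("strength", "weaknesses", "speciality", "relevance rating")


def build_analysis_prompt(user_requirements: str, repo_text: str, max_chars: int = 20_000) -> str:
    # Append the combined repository text, truncated to bound the prompt size
    prompt = ANALYSIS_PROMPT.format(user_requirements=user_requirements)
    return f"{prompt}\nRepository contents:\n{repo_text[:max_chars]}"


def parse_analysis(response: str) -> dict:
    """Extract the outermost JSON object from the reply and fill any missing keys."""
    match = re.search(r"\{.*\}", response, re.DOTALL)
    data = {}
    if match:
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            data = {}  # graceful fallback on malformed JSON
    return {key: data.get(key, "") for key in EXPECTED_KEYS}
```

Searching for the outermost `{...}` span tolerates replies that wrap the JSON in a markdown fence or add a sentence before it, which instruction-tuned models often do.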
#### Repository Ranking System
- **Input**: User requirements + repository analysis data
- **Processing**: LLM evaluates relevance and ranks repositories
- **Output**: Top 3 most relevant repositories in order

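The ranking output could be turned into a safe top-3 list as sketched below. `parse_top_repos` is hypothetical, and it assumes the model replies with a JSON array of repository IDs; it discards IDs the model invented and falls back to the original order if parsing fails.

```python
import json
import re
from typing import List, Sequence


def parse_top_repos(llm_output: str, candidates: Sequence[str], k: int = 3) -> List[str]:
    """Hypothetical: turn the LLM's ranking reply into at most k known repo IDs.

    Assumes a JSON array of repo IDs, possibly wrapped in a markdown fence.
    Unknown or duplicate IDs are skipped; on failure the first k candidates are used.
    """
    ranked: List[str] = []
    match = re.search(r"\[.*\]", llm_output, re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group(0))
        except json.JSONDecodeError:
            parsed = []
        known = set(candidates)
        for item in parsed:
            if isinstance(item, str) and item in known and item not in ranked:
                ranked.append(item)
    return ranked[:k] if ranked else list(candidates[:k])
```

Validating against the candidate list matters because an LLM can return plausible-looking repository IDs that were never analyzed.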
### UI Components

#### Modern Design Features
- **Gradient Backgrounds**: Linear gradients for visual appeal
- **Glassmorphism**: Backdrop blur effects for a modern look
- **Responsive Layout**: Adaptive to different screen sizes
- **Interactive Elements**: Hover effects and smooth transitions
- **Modal System**: Repository action selection popups

#### Tab Organization
1. **AI Assistant**: Conversation-based discovery
2. **Smart Search**: Direct input processing
3. **Analysis & Results**: Comprehensive analysis display
4. **Repo Explorer**: Interactive repository exploration

### Advanced Features

#### Auto-Navigation
- Automatic tab switching based on workflow state
- Smooth scrolling to top on tab changes
- Progressive disclosure of information

#### Error Handling
- Graceful fallbacks for LLM failures
- CSV update retry mechanisms
- User-friendly error messages

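A CSV update with retries might look like the sketch below. `update_csv_row`, the `repo_id` column name, and the linear backoff are assumptions for illustration; the app may use pandas rather than the stdlib `csv` module used here.

```python
import csv
import os
import tempfile
import time
from typing import Dict


def update_csv_row(path: str, repo_id: str, updates: Dict[str, str],
                   retries: int = 3, delay: float = 0.5) -> bool:
    """Hypothetical: update the row whose 'repo_id' matches, retrying on OS errors.

    Rows are written to a temporary file that replaces the original via
    os.replace, so the original CSV is never left half-written.
    """
    for attempt in range(1, retries + 1):
        try:
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.DictReader(f)
                fieldnames = list(reader.fieldnames or [])
                rows = list(reader)
            for key in updates:  # add any new columns, e.g. 'speciality'
                if key not in fieldnames:
                    fieldnames.append(key)
            for row in rows:
                if row.get("repo_id") == repo_id:
                    row.update(updates)
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".csv")
            with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
                writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
                writer.writeheader()
                writer.writerows(rows)
            os.replace(tmp, path)
            return True
        except OSError:
            if attempt == retries:
                return False
            time.sleep(delay * attempt)  # linear backoff between attempts
    return False
```

Retrying on `OSError` covers transient failures such as a file briefly locked by another writer; a missing file also raises `OSError`, so it exhausts the retries and returns `False` instead of crashing the UI.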
#### Performance Optimizations
- Parallel processing for multiple repositories
- Progress tracking for long operations
- Efficient file caching and cleanup

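Parallel analysis with progress reporting could be sketched with a thread pool, as below. `analyze_many` and its `on_progress(done, total)` callback are hypothetical; threads are a reasonable fit on the assumption that each analysis mostly waits on network I/O (downloads and LLM calls).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional


def analyze_many(repo_ids: Iterable[str],
                 analyze_fn: Callable[[str], dict],
                 max_workers: int = 4,
                 on_progress: Optional[Callable[[int, int], None]] = None) -> Dict[str, dict]:
    """Hypothetical: run analyze_fn over repositories in a thread pool.

    A failure is recorded as {"error": ...} instead of aborting the batch,
    and on_progress(done, total) is called as each repository finishes.
    """
    ids = list(repo_ids)
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze_fn, rid): rid for rid in ids}
        for done, future in enumerate(as_completed(futures), start=1):
            rid = futures[future]
            try:
                results[rid] = future.result()
            except Exception as exc:  # keep going if one repository fails
                results[rid] = {"error": str(exc)}
            if on_progress is not None:
                on_progress(done, len(ids))
    return results
```

In a Gradio handler, `on_progress` could forward to a `gr.Progress` object, for example `lambda done, total: progress(done / total)`.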
## Configuration

### Customizing Analysis
- Modify `CHATBOT_SYSTEM_PROMPT` for different assistant behavior
- Adjust repository search limits in `search_top_spaces()`
- Configure analysis criteria in `get_top_relevant_repos()`

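For orientation, a keyword search with a configurable limit might look like the sketch below. This is an illustration that borrows the `search_top_spaces` name, not the project's implementation; it uses `huggingface_hub.HfApi.list_spaces` (with its `search`, `sort`, `direction`, and `limit` parameters), and the injectable `api` argument is an assumption added so the function can be tested offline.

```python
from typing import Any, List, Optional


def search_top_spaces(query: str, limit: int = 5, api: Optional[Any] = None) -> List[str]:
    """Hypothetical keyword search over Hugging Face Spaces.

    `limit` caps how many Spaces are returned; `api` defaults to
    huggingface_hub.HfApi() and can be replaced with a fake in tests.
    """
    if api is None:
        from huggingface_hub import HfApi  # lazy import: only needed for real searches

        api = HfApi()
    # Most-liked Spaces first; list_spaces yields SpaceInfo objects with an .id field
    spaces = api.list_spaces(search=query, sort="likes", direction=-1, limit=limit)
    return [space.id for space in spaces]
```

Raising `limit` gives the ranking step more candidates at the cost of more downloads and LLM calls per query.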
### Adding File Types
```python
# In analyzer.py
download_filtered_space_files(
    repo_id,
    local_dir="repo_files",
    file_extensions=[".py", ".md", ".txt", ".js", ".ts"],  # Add more
)
```

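A hypothetical stand-in for `download_filtered_space_files()` is sketched below to show how extension filtering might work. It is not the project's code; it uses the real `huggingface_hub` calls `HfApi().list_repo_files(repo_id, repo_type=...)` and `hf_hub_download(repo_id, filename, repo_type=..., local_dir=...)`, and assumes the repositories are Spaces, as the function name suggests.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def filter_by_extension(filenames: Iterable[str], file_extensions: Iterable[str]) -> List[str]:
    # Case-insensitive suffix match against the allowed extensions
    allowed = {ext.lower() for ext in file_extensions}
    return [name for name in filenames if PurePosixPath(name).suffix.lower() in allowed]


def download_filtered_files(repo_id: str, local_dir: str, file_extensions: Iterable[str]) -> List[str]:
    """Hypothetical stand-in for download_filtered_space_files(); needs network access."""
    from huggingface_hub import HfApi, hf_hub_download

    files = HfApi().list_repo_files(repo_id, repo_type="space")
    return [
        hf_hub_download(repo_id, filename, repo_type="space", local_dir=local_dir)
        for filename in filter_by_extension(files, file_extensions)
    ]
```

Listing the repository first and downloading only the matching files avoids pulling large model weights or assets that the analysis would ignore anyway.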
## Contributing

1. Fork the repository
2. Create a feature branch
3. Implement your changes
4. Add tests if applicable
5. Submit a pull request

## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Acknowledgments

- **Gradio**: For the amazing web interface framework
- **Hugging Face**: For the incredible repository ecosystem
- **OpenAI**: For powerful language model capabilities

---

<div align="center">
  <p>Built with ❤️ for the open source community</p>
  <p>Happy repository hunting!</p>
</div>