---
title: Agentic HF Analyzer
emoji: 🌍
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
short_description: Recommends users which Repos/Spaces to look at
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🚀 HF Repo Analyzer

An AI-powered Hugging Face repository discovery and analysis tool that helps you find, evaluate, and explore the best repositories for your specific needs.

![HF Repo Analyzer](https://img.shields.io/badge/Powered%20by-Gradio-orange)
![Python](https://img.shields.io/badge/Python-3.8+-blue)
![Hugging Face](https://img.shields.io/badge/Hugging%20Face-Spaces-yellow)

## ✨ Features

- 🤖 **AI Assistant**: Intelligent conversation-based repository discovery
- 🔍 **Smart Search**: Auto-detection of repository IDs vs. keywords
- 📊 **Automated Analysis**: LLM-powered repository evaluation and ranking
- 🏆 **Top 3 Selection**: AI-curated most relevant repositories
- 💬 **Repository Explorer**: Interactive chat with repository contents
- 🎯 **Requirements Extraction**: Automatic keyword extraction from conversations
- 📋 **Comprehensive Results**: Detailed analysis with strengths, weaknesses, and specialities

## 🚦 Quick Start

### Prerequisites

- Python 3.8+
- OpenAI API key (for LLM analysis)
- Hugging Face access (for repository downloads)

### Installation

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd Agentic_HF_Analyzer
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables**
   ```bash
   export modal_api="your_openai_api_key"
   export base_url="your_openai_base_url"
   ```

4. **Run the application**
   ```bash
   python app.py
   ```

5. **Open your browser** to `http://localhost:7860`

## 📖 User Guide

### 🤖 Using the AI Assistant (Recommended)

1. **Start a Conversation**
   - Navigate to the "🤖 AI Assistant" tab
   - Describe your project: "I'm building a chatbot for customer service"
   - The AI will ask clarifying questions about your needs

2. **Automatic Discovery**
   - When the AI has enough information, it will automatically:
     - Extract relevant keywords from your conversation
     - Search for matching repositories
     - Analyze and rank them by relevance

3. **Review Results**
   - The interface automatically switches to "🔬 Analysis & Results"
   - View the top 3 most relevant repositories
   - Browse all analyzed repositories with detailed insights

### 📝 Using Smart Search (Direct Input)

1. **Repository IDs**
   ```
   microsoft/DialoGPT-medium
   openai/whisper
   huggingface/transformers
   ```

2. **Keywords**
   ```
   text generation
   image classification
   sentiment analysis
   ```

3. **Mixed Input**
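   - The system automatically detects the input type
   - If at least half of the entries contain `/`, the whole input is treated as repository IDs and processed directly
   - Otherwise the input is treated as keywords, which trigger an automatic repository search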

### 🔬 Analyzing Results

- **Top 3 Repositories**: AI-selected most relevant based on your requirements
- **Detailed Analysis**: Strengths, weaknesses, specialities, and relevance ratings
- **Quick Actions**: Click repository names to visit or explore them
- **Repository Explorer**: Deep dive into individual repositories with AI chat

### 🔍 Repository Explorer

1. **Access Methods**:
   - Click "🔍 Open in Repo Explorer" from repository actions
   - Manually enter repository ID in the Repo Explorer tab

2. **Features**:
   - Automatic repository loading and analysis
   - Interactive chat about repository contents
   - File structure exploration
   - Code analysis and explanations

## 🛠️ Technical Architecture

### Core Components

```
app.py                 # Main Gradio interface and orchestration
├── analyzer.py        # Repository analysis and LLM processing
├── hf_utils.py        # Hugging Face API interactions
├── chatbot_page.py    # AI assistant conversation logic
└── repo_explorer.py   # Repository exploration interface
```

### Key Features Implementation

#### 🤖 AI Assistant
- **System Prompt**: Focused on requirements gathering, not recommendations
- **Auto-Extraction**: Detects conversation readiness for keyword extraction
- **Smart Processing**: Converts natural language to actionable search queries

#### 🔍 Smart Input Detection
```python
import re

def is_repo_id_format(text: str) -> bool:
    """Return True when at least half of the entries look like repo IDs (contain '/')."""
    # Entries may be separated by newlines or commas
    lines = [line.strip() for line in re.split(r'[\n,]+', text) if line.strip()]
    slash_count = sum(1 for line in lines if '/' in line)
    return slash_count >= len(lines) * 0.5
```
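
As a rough illustration of how this check could drive the two input paths, the sketch below labels an input and returns its cleaned entries. `route_input` and its `"empty"` label are hypothetical names introduced here, not part of the project; the detection rule is the same as above and is repeated only so the snippet runs on its own.

```python
import re
from typing import List, Tuple


def is_repo_id_format(text: str) -> bool:
    # Same rule as above, repeated so this snippet is self-contained.
    lines = [line.strip() for line in re.split(r"[\n,]+", text) if line.strip()]
    slash_count = sum(1 for line in lines if "/" in line)
    return slash_count >= len(lines) * 0.5


def route_input(text: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: label the input and return its cleaned entries."""
    entries = [e.strip() for e in re.split(r"[\n,]+", text) if e.strip()]
    if not entries:
        # Guard against blank input before applying the majority rule.
        return ("empty", [])
    kind = "repo_ids" if is_repo_id_format(text) else "keywords"
    return (kind, entries)


print(route_input("microsoft/DialoGPT-medium\nopenai/whisper"))
print(route_input("text generation, image classification"))
```

Because the rule is a majority vote over all entries, a half-and-half input such as `openai/whisper, sentiment analysis` is labeled as repository IDs as a whole.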

#### 🏆 LLM-Powered Repository Ranking
- **Model**: `Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ`
- **Criteria**: Requirements matching, strengths, relevance rating, speciality alignment
- **Output**: JSON-formatted repository rankings

#### 📊 Analysis Pipeline
1. **Download**: Repository files (`.py`, `.md`, `.txt`)
2. **Combine**: Merge files into single analyzable document
3. **Analyze**: LLM evaluation for strengths, weaknesses, specialities
4. **Rank**: User requirement-based relevance scoring
5. **Select**: Top 3 most relevant repositories
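
The "Combine" step can be pictured with a minimal sketch like the one below. It assumes the downloaded files sit in a local directory and that each file is prefixed with a header line naming its path; `combine_files` and the header format are illustrative choices, not the project's actual implementation.

```python
from pathlib import Path
from typing import Iterable


def combine_files(root: str, extensions: Iterable[str] = (".py", ".md", ".txt")) -> str:
    """Merge all matching files under `root` into one text document."""
    wanted = tuple(extensions)
    parts = []
    # Sort the paths so the combined document is deterministic.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in wanted:
            rel = path.relative_to(root).as_posix()
            # Replace undecodable bytes instead of failing on odd encodings.
            text = path.read_text(encoding="utf-8", errors="replace")
            parts.append(f"--- {rel} ---\n{text}")
    return "\n\n".join(parts)
```

Calling `combine_files("repo_files")` would then yield a single string that can be passed to the analysis prompt.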

### Data Flow

```mermaid
graph TD
    A[User Input] --> B{Input Type?}
    B -->|Keywords| C[Repository Search]
    B -->|Repo IDs| D[Direct Processing]
    C --> E[Repository List]
    D --> E
    E --> F[Download & Analyze]
    F --> G[LLM Evaluation]
    G --> H[Ranking & Selection]
    H --> I[Results Display]
    I --> J[Repository Explorer]
```

### File Structure

```
📦 Agentic_HF_Analyzer/
├── 📄 app.py                    # Main application
├── 📄 analyzer.py               # Repository analysis logic
├── 📄 hf_utils.py               # Hugging Face utilities
├── 📄 chatbot_page.py           # AI assistant functionality
├── 📄 repo_explorer.py          # Repository exploration
├── 📄 requirements.txt          # Python dependencies
├── 📄 README.md                 # Documentation
├── 📄 repo_ids.csv              # Analysis results storage
└── 📁 repo_files/               # Temporary repository downloads
```

### Dependencies

```
gradio>=4.0.0          # Web interface framework
pandas>=1.5.0          # Data manipulation
regex>=2022.0.0        # Advanced regex operations
openai>=1.0.0          # LLM API access
huggingface_hub>=0.16.0 # HF repository access
requests>=2.28.0       # HTTP requests
```

### Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `modal_api` | OpenAI API key for LLM analysis | ✅ |
| `base_url` | OpenAI API base URL | ✅ |
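
Since both variables are required, it can help to fail early with a clear message when one is missing. The following is a minimal sketch under that assumption; `load_llm_settings` is a hypothetical helper, not a function from this project.

```python
import os
from typing import Mapping


def load_llm_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the two required variables and fail early if either is unset."""
    missing = [name for name in ("modal_api", "base_url") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # With the `openai` package (>=1.0) these keys match the client's
    # keyword arguments, e.g. OpenAI(**load_llm_settings()).
    return {"api_key": env["modal_api"], "base_url": env["base_url"]}
```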

### LLM Integration

#### Analysis Prompt Structure
```python
ANALYSIS_PROMPT = """
Analyze this repository and provide:
1. Strengths and capabilities
2. Potential weaknesses or limitations  
3. Primary speciality/use case
4. Relevance rating for: {user_requirements}

Return valid JSON with: strength, weaknesses, speciality, relevance rating
"""
```
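
The prompt asks for JSON, but model replies often wrap it in extra text. A hedged sketch of a tolerant parser is shown below; `parse_analysis` and the empty-string fallback are assumptions made for illustration, while the key names come from the prompt above.

```python
import json

ANALYSIS_KEYS = ("strength", "weaknesses", "speciality", "relevance rating")


def parse_analysis(reply: str) -> dict:
    """Parse the text between the first '{' and the last '}' as JSON.

    Returns a dict with every expected key; missing or unparsable
    fields fall back to an empty string.
    """
    start, end = reply.find("{"), reply.rfind("}")
    data = {}
    if start != -1 and end > start:
        try:
            loaded = json.loads(reply[start:end + 1])
            if isinstance(loaded, dict):
                data = loaded
        except json.JSONDecodeError:
            pass  # keep the empty fallback
    return {key: data.get(key, "") for key in ANALYSIS_KEYS}
```

Returning a complete dict in every case means downstream code can read all four fields without checking whether the LLM call succeeded.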

#### Repository Ranking System
- **Input**: User requirements + repository analysis data
- **Processing**: LLM evaluates relevance and ranks repositories
- **Output**: Top 3 most relevant repositories in order
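
One way to picture the selection step is the sketch below. It assumes the LLM replies with a JSON array of repository IDs and that each analysis record carries a `repo_id` and a numeric `relevance` field; both the record shape and the `select_top_repos` name are hypothetical.

```python
import json
from typing import Dict, List


def select_top_repos(llm_reply: str, analyses: List[Dict], k: int = 3) -> List[str]:
    """Return up to k repo IDs, preferring the LLM's order when it parses."""
    known = [a["repo_id"] for a in analyses]
    picked: List[str] = []
    try:
        ranked = json.loads(llm_reply)
        if isinstance(ranked, list):
            # Keep only IDs that were actually analyzed, in order, without duplicates.
            for repo_id in ranked:
                if repo_id in known and repo_id not in picked:
                    picked.append(repo_id)
    except json.JSONDecodeError:
        pass
    if not picked:
        # Fallback: order by the numeric relevance score instead.
        by_score = sorted(analyses, key=lambda a: a.get("relevance", 0), reverse=True)
        picked = [a["repo_id"] for a in by_score]
    return picked[:k]
```

Filtering against the analyzed IDs guards against the model inventing repository names in its ranking.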

### UI Components

#### Modern Design Features
- **Gradient Backgrounds**: Linear gradients for visual appeal
- **Glassmorphism**: Backdrop blur effects for modern look
- **Responsive Layout**: Adaptive to different screen sizes
- **Interactive Elements**: Hover effects and smooth transitions
- **Modal System**: Repository action selection popups

#### Tab Organization
1. **🤖 AI Assistant**: Conversation-based discovery
2. **📝 Smart Search**: Direct input processing
3. **🔬 Analysis & Results**: Comprehensive analysis display
4. **🔍 Repo Explorer**: Interactive repository exploration

### Advanced Features

#### Auto-Navigation
- Automatic tab switching based on workflow state
- Smooth scrolling to top on tab changes
- Progressive disclosure of information

#### Error Handling
- Graceful fallbacks for LLM failures
- CSV update retry mechanisms
- User-friendly error messages
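
A retry mechanism of this kind might look like the following sketch. The helper name, the choice of `OSError` as the retryable error, and the fixed delay are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 3, delay: float = 0.5) -> T:
    """Run `action`, retrying on OSError up to `attempts` times in total."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError as exc:  # e.g. the CSV file is temporarily locked
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

With pandas, a CSV update could then be wrapped as `with_retries(lambda: df.to_csv("repo_ids.csv", index=False))`.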

#### Performance Optimizations
- Parallel processing for multiple repositories
- Progress tracking for long operations
- Efficient file caching and cleanup
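
Parallel analysis with progress tracking can be sketched with the standard library's thread pool, which suits work dominated by network calls. `analyze_all` and the per-repository `analyze` callback are hypothetical names; the project's own implementation may differ.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def analyze_all(
    repo_ids: List[str],
    analyze: Callable[[str], dict],
    max_workers: int = 4,
) -> Dict[str, dict]:
    """Run `analyze` on each repo concurrently and report progress."""
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze, rid): rid for rid in repo_ids}
        for done, future in enumerate(as_completed(futures), start=1):
            rid = futures[future]
            try:
                results[rid] = future.result()
            except Exception as exc:
                # One failing repository should not stop the whole batch.
                results[rid] = {"error": str(exc)}
            print(f"[{done}/{len(futures)}] finished {rid}")
    return results
```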

## 🔧 Configuration

### Customizing Analysis
- Modify `CHATBOT_SYSTEM_PROMPT` for different assistant behavior
- Adjust repository search limits in `search_top_spaces()`
- Configure analysis criteria in `get_top_relevant_repos()`

### Adding File Types
```python
# In analyzer.py
download_filtered_space_files(
    repo_id, 
    local_dir="repo_files", 
    file_extensions=['.py', '.md', '.txt', '.js', '.ts']  # Add more
)
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Implement your changes
4. Add tests if applicable
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- **Gradio**: For the amazing web interface framework
- **Hugging Face**: For the incredible repository ecosystem
- **OpenAI**: For powerful language model capabilities

---

<div align="center">
  <p>Built with ❤️ for the open source community</p>
  <p>🚀 Happy repository hunting! 🚀</p>
</div>