naman1102 committed on
Commit 9a9c028 · 1 Parent(s): 1dbc8cb
Files changed (2)
  1. README.md +295 -0
  2. app.py +159 -2
README.md CHANGED
@@ -11,3 +11,298 @@ short_description: Recommends users which Repos/Spaces to look at
11
  ---
12
 
13
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
14
+
15
+ # 🚀 HF Repo Analyzer
16
+
17
+ An AI-powered Hugging Face repository discovery and analysis tool that helps you find, evaluate, and explore the best repositories for your specific needs.
18
+
19
+ ![HF Repo Analyzer](https://img.shields.io/badge/Powered%20by-Gradio-orange)
20
+ ![Python](https://img.shields.io/badge/Python-3.8+-blue)
21
+ ![Hugging Face](https://img.shields.io/badge/Hugging%20Face-Spaces-yellow)
22
+
23
+ ## ✨ Features
24
+
25
+ - 🤖 **AI Assistant**: Intelligent conversation-based repository discovery
26
+ - 🔍 **Smart Search**: Auto-detection of repository IDs vs. keywords
27
+ - 📊 **Automated Analysis**: LLM-powered repository evaluation and ranking
28
+ - 🏆 **Top 3 Selection**: AI-curated most relevant repositories
29
+ - 💬 **Repository Explorer**: Interactive chat with repository contents
30
+ - 🎯 **Requirements Extraction**: Automatic keyword extraction from conversations
31
+ - 📋 **Comprehensive Results**: Detailed analysis with strengths, weaknesses, and specialities
32
+
33
+ ## 🚦 Quick Start
34
+
35
+ ### Prerequisites
36
+
37
+ - Python 3.8+
38
+ - OpenAI API key (for LLM analysis)
39
+ - Hugging Face access (for repository downloads)
40
+
41
+ ### Installation
42
+
43
+ 1. **Clone the repository**
44
+ ```bash
45
+ git clone <repository-url>
46
+ cd Agentic_HF_Analyzer
47
+ ```
48
+
49
+ 2. **Install dependencies**
50
+ ```bash
51
+ pip install -r requirements.txt
52
+ ```
53
+
54
+ 3. **Set up environment variables**
55
+ ```bash
56
+ export modal_api="your_openai_api_key"
57
+ export base_url="your_openai_base_url"
58
+ ```
59
+
60
+ 4. **Run the application**
61
+ ```bash
62
+ python app.py
63
+ ```
64
+
65
+ 5. **Open your browser** to `http://localhost:7860`
66
+
67
+ ## 📖 User Guide
68
+
69
+ ### 🤖 Using the AI Assistant (Recommended)
70
+
71
+ 1. **Start a Conversation**
72
+    - Navigate to the "🤖 AI Assistant" tab
73
+    - Describe your project: "I'm building a chatbot for customer service"
74
+    - The AI will ask clarifying questions about your needs
75
+
76
+ 2. **Automatic Discovery**
77
+    - When the AI has enough information, it will automatically:
78
+      - Extract relevant keywords from your conversation
79
+      - Search for matching repositories
80
+      - Analyze and rank them by relevance
81
+
82
+ 3. **Review Results**
83
+    - The interface automatically switches to "🔬 Analysis & Results"
84
+    - View the top 3 most relevant repositories
85
+    - Browse all analyzed repositories with detailed insights
86
+
87
+ ### 📝 Using Smart Search (Direct Input)
88
+
89
+ 1. **Repository IDs**
90
+ ```
91
+ microsoft/DialoGPT-medium
92
+ openai/whisper
93
+ huggingface/transformers
94
+ ```
95
+
96
+ 2. **Keywords**
97
+ ```
98
+ text generation
99
+ image classification
100
+ sentiment analysis
101
+ ```
102
+
103
+ 3. **Mixed Input**
104
+ - The system automatically detects the input type
105
+ - Repository IDs (containing `/`) are processed directly
106
+ - Keywords trigger automatic repository search
107
+
108
+ ### 🔬 Analyzing Results
109
+
110
+ - **Top 3 Repositories**: AI-selected most relevant based on your requirements
111
+ - **Detailed Analysis**: Strengths, weaknesses, specialities, and relevance ratings
112
+ - **Quick Actions**: Click repository names to visit or explore them
113
+ - **Repository Explorer**: Deep dive into individual repositories with AI chat
114
+
115
+ ### 🔍 Repository Explorer
116
+
117
+ 1. **Access Methods**:
118
+    - Click "🔍 Open in Repo Explorer" from repository actions
119
+ - Manually enter repository ID in the Repo Explorer tab
120
+
121
+ 2. **Features**:
122
+ - Automatic repository loading and analysis
123
+ - Interactive chat about repository contents
124
+ - File structure exploration
125
+ - Code analysis and explanations
126
+
127
+ ## 🛠️ Technical Architecture
128
+
129
+ ### Core Components
130
+
131
+ ```
132
+ app.py                   # Main Gradio interface and orchestration
133
+ ├── analyzer.py          # Repository analysis and LLM processing
134
+ ├── hf_utils.py          # Hugging Face API interactions
135
+ ├── chatbot_page.py      # AI assistant conversation logic
136
+ └── repo_explorer.py     # Repository exploration interface
137
+ ```
138
+
139
+ ### Key Features Implementation
140
+
141
+ #### 🤖 AI Assistant
142
+ - **System Prompt**: Focused on requirements gathering, not recommendations
143
+ - **Auto-Extraction**: Detects conversation readiness for keyword extraction
144
+ - **Smart Processing**: Converts natural language to actionable search queries
145
+
146
+ #### 🔍 Smart Input Detection
147
+ ```python
148
+ def is_repo_id_format(text: str) -> bool:
149
+     # Detects if input contains repository IDs (with /) vs keywords
150
+     lines = [line.strip() for line in re.split(r'[\n,]+', text) if line.strip()]
151
+     slash_count = sum(1 for line in lines if '/' in line)
152
+     return slash_count >= len(lines) * 0.5
153
+ ```
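To see how this heuristic behaves, the sketch below restates the function with the `re` import it needs so that it runs standalone, then tries it on a few inputs. The sample strings are illustrative assumptions, not taken from the application.

```python
import re


def is_repo_id_format(text: str) -> bool:
    # Same rule as the README snippet: at least half of the entries contain "/".
    lines = [line.strip() for line in re.split(r'[\n,]+', text) if line.strip()]
    slash_count = sum(1 for line in lines if '/' in line)
    return slash_count >= len(lines) * 0.5


# Illustrative inputs (assumed examples)
repo_input = "microsoft/DialoGPT-medium\nopenai/whisper"
keyword_input = "text generation, image classification"
mixed_input = "openai/whisper, sentiment analysis"

for sample in (repo_input, keyword_input, mixed_input, ""):
    print(repr(sample), "->", is_repo_id_format(sample))
```

Two edge cases are worth noting: a half-and-half input is treated as repository IDs, and an empty input also returns `True` because `0 >= 0` holds. A caller may want to reject empty input before calling the function.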
154
+
155
+ #### 🏆 LLM-Powered Repository Ranking
156
+ - **Model**: `Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ`
157
+ - **Criteria**: Requirements matching, strengths, relevance rating, speciality alignment
158
+ - **Output**: JSON-formatted repository rankings
159
+
160
+ #### 📊 Analysis Pipeline
161
+ 1. **Download**: Repository files (`.py`, `.md`, `.txt`)
162
+ 2. **Combine**: Merge files into single analyzable document
163
+ 3. **Analyze**: LLM evaluation for strengths, weaknesses, specialities
164
+ 4. **Rank**: User requirement-based relevance scoring
165
+ 5. **Select**: Top 3 most relevant repositories
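The last two steps (rank, then select) can be sketched as follows. In the application the ranking is done by the LLM. Here a plain numeric sort stands in for it, and the `repo_id` and `relevance` field names are assumptions made for illustration.

```python
from typing import Dict, List


def select_top_repos(analyses: List[Dict], top_n: int = 3) -> List[str]:
    """Return the IDs of the `top_n` analyses with the highest relevance score."""
    # Stand-in for the LLM ranking step: sort by a numeric relevance score, highest first.
    ranked = sorted(analyses, key=lambda a: a["relevance"], reverse=True)
    return [a["repo_id"] for a in ranked[:top_n]]


# Hypothetical analysis results
analyses = [
    {"repo_id": "org/a", "relevance": 0.4},
    {"repo_id": "org/b", "relevance": 0.9},
    {"repo_id": "org/c", "relevance": 0.7},
    {"repo_id": "org/d", "relevance": 0.2},
]
top = select_top_repos(analyses)
print(top)
```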
166
+
167
+ ### Data Flow
168
+
169
+ ```mermaid
170
+ graph TD
171
+ A[User Input] --> B{Input Type?}
172
+ B -->|Keywords| C[Repository Search]
173
+ B -->|Repo IDs| D[Direct Processing]
174
+ C --> E[Repository List]
175
+ D --> E
176
+ E --> F[Download & Analyze]
177
+ F --> G[LLM Evaluation]
178
+ G --> H[Ranking & Selection]
179
+ H --> I[Results Display]
180
+ I --> J[Repository Explorer]
181
+ ```
182
+
183
+ ### File Structure
184
+
185
+ ```
186
+ 📦 Agentic_HF_Analyzer/
187
+ ├── 📄 app.py              # Main application
188
+ ├── 📄 analyzer.py         # Repository analysis logic
189
+ ├── 📄 hf_utils.py         # Hugging Face utilities
190
+ ├── 📄 chatbot_page.py     # AI assistant functionality
191
+ ├── 📄 repo_explorer.py    # Repository exploration
192
+ ├── 📄 requirements.txt    # Python dependencies
193
+ ├── 📄 README.md           # Documentation
194
+ ├── 📄 repo_ids.csv        # Analysis results storage
195
+ └── 📁 repo_files/         # Temporary repository downloads
196
+ ```
197
+
198
+ ### Dependencies
199
+
200
+ ```
201
+ gradio>=4.0.0 # Web interface framework
202
+ pandas>=1.5.0 # Data manipulation
203
+ regex>=2022.0.0 # Advanced regex operations
204
+ openai>=1.0.0 # LLM API access
205
+ huggingface_hub>=0.16.0 # HF repository access
206
+ requests>=2.28.0 # HTTP requests
207
+ ```
208
+
209
+ ### Environment Variables
210
+
211
+ | Variable | Description | Required |
212
+ |----------|-------------|----------|
213
+ | `modal_api` | OpenAI API key for LLM analysis | ✅ |
214
+ | `base_url` | OpenAI API base URL | ✅ |
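Since both variables are required, it can help to fail early when either is missing. The sketch below is one way to do that. `load_llm_config` is a hypothetical helper, not a function from the repository, and the example values are placeholders.

```python
import os
from typing import Mapping, Tuple


def load_llm_config(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the two required variables, failing early with a clear message."""
    missing = [name for name in ("modal_api", "base_url") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["modal_api"], env["base_url"]


# Example with an explicit mapping instead of the real environment (placeholder values)
api_key, base_url = load_llm_config(
    {"modal_api": "sk-example", "base_url": "https://example.invalid/v1"}
)
print(base_url)
```

Passing the mapping as a parameter keeps the helper easy to test without touching the process environment.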
215
+
216
+ ### LLM Integration
217
+
218
+ #### Analysis Prompt Structure
219
+ ```python
220
+ ANALYSIS_PROMPT = """
221
+ Analyze this repository and provide:
222
+ 1. Strengths and capabilities
223
+ 2. Potential weaknesses or limitations
224
+ 3. Primary speciality/use case
225
+ 4. Relevance rating for: {user_requirements}
226
+
227
+ Return valid JSON with: strength, weaknesses, speciality, relevance rating
228
+ """
229
+ ```
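A minimal sketch of how such a prompt might be filled in and its reply parsed follows. The prompt text is copied from the block above. `build_prompt`, `parse_analysis`, and the empty-field fallback are assumptions for illustration, not the repository's actual code.

```python
import json
from typing import Dict

ANALYSIS_PROMPT = """
Analyze this repository and provide:
1. Strengths and capabilities
2. Potential weaknesses or limitations
3. Primary speciality/use case
4. Relevance rating for: {user_requirements}

Return valid JSON with: strength, weaknesses, speciality, relevance rating
"""

EXPECTED_KEYS = ("strength", "weaknesses", "speciality", "relevance rating")


def build_prompt(user_requirements: str) -> str:
    # The template has a single placeholder, so str.format is sufficient.
    return ANALYSIS_PROMPT.format(user_requirements=user_requirements)


def parse_analysis(reply: str) -> Dict[str, str]:
    """Parse the model reply; fall back to empty fields if it is not a JSON object."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {key: str(data.get(key, "")) for key in EXPECTED_KEYS}


prompt = build_prompt("a customer-service chatbot")
good = parse_analysis(
    '{"strength": "fast", "weaknesses": "few docs", '
    '"speciality": "chat", "relevance rating": "high"}'
)
bad = parse_analysis("not json")
print(good["relevance rating"], bad)
```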
230
+
231
+ #### Repository Ranking System
232
+ - **Input**: User requirements + repository analysis data
233
+ - **Processing**: LLM evaluates relevance and ranks repositories
234
+ - **Output**: Top 3 most relevant repositories in order
235
+
236
+ ### UI Components
237
+
238
+ #### Modern Design Features
239
+ - **Gradient Backgrounds**: Linear gradients for visual appeal
240
+ - **Glassmorphism**: Backdrop blur effects for modern look
241
+ - **Responsive Layout**: Adaptive to different screen sizes
242
+ - **Interactive Elements**: Hover effects and smooth transitions
243
+ - **Modal System**: Repository action selection popups
244
+
245
+ #### Tab Organization
246
+ 1. **🤖 AI Assistant**: Conversation-based discovery
247
+ 2. **📝 Smart Search**: Direct input processing
248
+ 3. **🔬 Analysis & Results**: Comprehensive analysis display
249
+ 4. **🔍 Repo Explorer**: Interactive repository exploration
250
+
251
+ ### Advanced Features
252
+
253
+ #### Auto-Navigation
254
+ - Automatic tab switching based on workflow state
255
+ - Smooth scrolling to top on tab changes
256
+ - Progressive disclosure of information
257
+
258
+ #### Error Handling
259
+ - Graceful fallbacks for LLM failures
260
+ - CSV update retry mechanisms
261
+ - User-friendly error messages
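The README does not show how the retry mechanism is implemented. As a generic sketch of the idea, the hypothetical `with_retries` helper below calls an action until it succeeds or the attempts run out. The flaky function only simulates a failing CSV write.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 3, delay_s: float = 0.0) -> T:
    """Call `action` until it succeeds or `attempts` is exhausted, then re-raise."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as error:  # broad on purpose: this is a generic retry wrapper
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay_s)
    assert last_error is not None
    raise last_error


calls = {"count": 0}


def flaky_write() -> str:
    # Simulated CSV update that fails twice before succeeding.
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("file busy")
    return "written"


result = with_retries(flaky_write)
print(result, calls["count"])
```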
262
+
263
+ #### Performance Optimizations
264
+ - Parallel processing for multiple repositories
265
+ - Progress tracking for long operations
266
+ - Efficient file caching and cleanup
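Parallel processing of several repositories could look like the following sketch, which uses a thread pool from the standard library. `analyze_all` and the stand-in `fake_analyze` function are hypothetical. The README does not say which concurrency mechanism the application uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def analyze_all(
    repo_ids: List[str], analyze: Callable[[str], str], max_workers: int = 4
) -> Dict[str, str]:
    """Run `analyze` on every repository ID concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with repo_ids.
        results = list(pool.map(analyze, repo_ids))
    return dict(zip(repo_ids, results))


def fake_analyze(repo_id: str) -> str:
    # Stand-in for the real download-and-analyze step.
    return "analysis of " + repo_id


summary = analyze_all(["org/a", "org/b", "org/c"], fake_analyze)
print(summary)
```

Threads suit this kind of work because downloads and LLM calls are mostly I/O-bound.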
267
+
268
+ ## 🔧 Configuration
269
+
270
+ ### Customizing Analysis
271
+ - Modify `CHATBOT_SYSTEM_PROMPT` for different assistant behavior
272
+ - Adjust repository search limits in `search_top_spaces()`
273
+ - Configure analysis criteria in `get_top_relevant_repos()`
274
+
275
+ ### Adding File Types
276
+ ```python
277
+ # In analyzer.py
278
+ download_filtered_space_files(
279
+     repo_id,
280
+     local_dir="repo_files",
281
+     file_extensions=['.py', '.md', '.txt', '.js', '.ts']  # Add more
282
+ )
283
+ ```
284
+
285
+ ## 🤝 Contributing
286
+
287
+ 1. Fork the repository
288
+ 2. Create a feature branch
289
+ 3. Implement your changes
290
+ 4. Add tests if applicable
291
+ 5. Submit a pull request
292
+
293
+ ## 📄 License
294
+
295
+ This project is licensed under the MIT License - see the LICENSE file for details.
296
+
297
+ ## 🙏 Acknowledgments
298
+
299
+ - **Gradio**: For the amazing web interface framework
300
+ - **Hugging Face**: For the incredible repository ecosystem
301
+ - **OpenAI**: For powerful language model capabilities
302
+
303
+ ---
304
+
305
+ <div align="center">
306
+ <p>Built with ❤️ for the open source community</p>
307
+ <p>🚀 Happy repository hunting! 🚀</p>
308
+ </div>
app.py CHANGED
@@ -603,15 +603,161 @@ def create_ui() -> gr.Blocks:
603
  """
604
  )
605
 
606
- # Global Reset Button - visible on all tabs
607
  with gr.Row():
608
- with gr.Column(scale=4):
609
  pass
 
 
610
  with gr.Column(scale=1):
611
  reset_all_btn = gr.Button("🔄 Reset Everything", variant="stop", size="lg")
612
  with gr.Column(scale=1):
613
  pass
614
 
615
  with gr.Tabs() as tabs:
616
  # --- AI Assistant Tab (moved to first) ---
617
  with gr.TabItem("🤖 AI Assistant", id="chatbot_tab"):
@@ -1374,6 +1520,17 @@ def create_ui() -> gr.Blocks:
1374
  outputs=[repo_ids_state, current_repo_idx_state, user_requirements_state, df_output, top_repos_df, top_repos_section, chatbot, status_box_input, current_requirements_display, extracted_keywords_output]
1375
  )
1376
 
1377
  return app
1378
 
1379
  if __name__ == "__main__":
 
603
  """
604
  )
605
 
606
+ # Global Reset and Help Buttons - visible on all tabs
607
  with gr.Row():
608
+ with gr.Column(scale=3):
609
  pass
610
+ with gr.Column(scale=1):
611
+ help_btn = gr.Button("❓ Help", variant="secondary", size="lg")
612
  with gr.Column(scale=1):
613
  reset_all_btn = gr.Button("🔄 Reset Everything", variant="stop", size="lg")
614
  with gr.Column(scale=1):
615
  pass
616
 
617
+ # Help Modal - visible when help button is clicked
618
+ with gr.Row():
619
+ with gr.Column():
620
+ help_modal = gr.Column(visible=False)
621
+ with help_modal:
622
+ gr.Markdown(
623
+ """
624
+ <div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 20px; border-radius: 16px; text-align: center; margin-bottom: 20px;">
625
+ <h2 style="color: white; margin: 0; font-size: 2rem;">📚 How to Use HF Repo Analyzer</h2>
626
+ <p style="color: rgba(255,255,255,0.9); margin: 10px 0 0 0;">Step-by-step guide to find and analyze repositories</p>
627
+ </div>
628
+ """
629
+ )
630
+
631
+ with gr.Accordion("🚀 Method 1: AI Assistant (Recommended)", open=True):
632
+ gr.Markdown(
633
+ """
634
+ ### **Step 1: Start Conversation**
635
+ - Go to the **🤖 AI Assistant** tab
636
+ - Describe your project: *"I'm building a sentiment analysis tool"*
637
+ - The AI will ask clarifying questions about your needs
638
+
639
+ ### **Step 2: Let AI Work Its Magic**
640
+ - Answer the AI's questions about your requirements
641
+ - When ready, the AI will automatically:
642
+ - Extract keywords from your conversation
643
+ - Search for matching repositories
644
+ - Analyze and rank them by relevance
645
+
646
+ ### **Step 3: Review Results**
647
+ - Interface automatically switches to **🔬 Analysis & Results**
648
+ - View **Top 3** most relevant repositories
649
+ - Browse detailed analysis with strengths/weaknesses
650
+ - Click repository names to visit or explore them
651
+
652
+ **💡 Tip**: This method gives the best personalized results!
653
+ """
654
+ )
655
+
656
+ with gr.Accordion("📝 Method 2: Smart Search (Direct Input)", open=False):
657
+ gr.Markdown(
658
+ """
659
+ ### **Step 1: Choose Input Type**
660
+ Go to **📝 Smart Search** tab and enter either:
661
+
662
+ **Repository IDs** (with `/`):
663
+ ```
664
+ microsoft/DialoGPT-medium
665
+ openai/whisper
666
+ huggingface/transformers
667
+ ```
668
+
669
+ **Keywords** (no `/`):
670
+ ```
671
+ text generation
672
+ image classification
673
+ sentiment analysis
674
+ ```
675
+
676
+ ### **Step 2: Auto-Detection & Processing**
677
+ - System automatically detects input type
678
+ - Repository IDs → Direct analysis
679
+ - Keywords → Search + analysis
680
+ - Enable **🚀 Auto-analyze** for instant results
681
+
682
+ ### **Step 3: Get Results**
683
+ - Click **🔍 Find & Process Repositories**
684
+ - View results in **🔬 Analysis & Results** tab
685
+ """
686
+ )
687
+
688
+ with gr.Accordion("🔬 Understanding Analysis Results", open=False):
689
+ gr.Markdown(
690
+ """
691
+ ### **🏆 Top 3 Repositories**
692
+ - AI-selected most relevant for your needs
693
+ - Ranked by requirement matching and quality
694
+
695
+ ### **📊 Detailed Analysis Table**
696
+ - **Repository**: Click names to visit/explore
697
+ - **Strengths**: Key capabilities and advantages
698
+ - **Weaknesses**: Limitations and considerations
699
+ - **Speciality**: Primary use case and domain
700
+ - **Relevance**: How well it matches your needs
701
+
702
+ ### **🔗 Quick Actions**
703
+ Click repository names to:
704
+ - **🌐 Visit Hugging Face Space**: See live demo
705
+ - **🔍 Open in Repo Explorer**: Deep dive analysis
706
+ """
707
+ )
708
+
709
+ with gr.Accordion("🔍 Repository Explorer Deep Dive", open=False):
710
+ gr.Markdown(
711
+ """
712
+ ### **Access Repository Explorer**
713
+ - Click **🔍 Open in Repo Explorer** from results
714
+ - Or manually enter repo ID in **🔍 Repo Explorer** tab
715
+
716
+ ### **Features Available**
717
+ - **Auto-loading**: Repository content analysis
718
+ - **AI Chat**: Ask questions about the code
719
+ - **File Exploration**: Browse repository structure
720
+ - **Code Analysis**: Get explanations and insights
721
+
722
+ ### **Sample Questions to Ask**
723
+ - *"How do I use this repository?"*
724
+ - *"What are the main functions?"*
725
+ - *"Show me example usage"*
726
+ - *"Explain the architecture"*
727
+ """
728
+ )
729
+
730
+ with gr.Accordion("🎯 Pro Tips & Best Practices", open=False):
731
+ gr.Markdown(
732
+ """
733
+ ### **🤖 Getting Better AI Results**
734
+ - Be specific about your use case
735
+ - Mention programming language preferences
736
+ - Describe your experience level
737
+ - Include performance requirements
738
+
739
+ ### **🔍 Search Optimization**
740
+ - Use multiple relevant keywords
741
+ - Try different keyword combinations
742
+ - Check both general and specific terms
743
+
744
+ ### **📊 Analyzing Results**
745
+ - Read both strengths AND weaknesses
746
+ - Check speciality alignment with your needs
747
+ - Use Repository Explorer for detailed investigation
748
+ - Compare multiple options before deciding
749
+
750
+ ### **🔄 Workflow Tips**
751
+ - Start with AI Assistant for personalized results
752
+ - Use Smart Search for known repositories
753
+ - Explore multiple repositories before choosing
754
+ - Save interesting repositories for later comparison
755
+ """
756
+ )
757
+
758
+ with gr.Row():
759
+ close_help_btn = gr.Button("✅ Got It, Let's Start!", variant="primary", size="lg")
760
+
761
  with gr.Tabs() as tabs:
762
  # --- AI Assistant Tab (moved to first) ---
763
  with gr.TabItem("🤖 AI Assistant", id="chatbot_tab"):
 
1520
  outputs=[repo_ids_state, current_repo_idx_state, user_requirements_state, df_output, top_repos_df, top_repos_section, chatbot, status_box_input, current_requirements_display, extracted_keywords_output]
1521
  )
1522
 
1523
+ # Help modal events
1524
+ help_btn.click(
1525
+ fn=lambda: gr.update(visible=True),
1526
+ outputs=[help_modal]
1527
+ )
1528
+
1529
+ close_help_btn.click(
1530
+ fn=lambda: gr.update(visible=False),
1531
+ outputs=[help_modal]
1532
+ )
1533
+
1534
  return app
1535
 
1536
  if __name__ == "__main__":