jomasego committed
Commit c8a7e17 · 1 Parent(s): b4c1755

Add MCP Video Analysis application with Claude AI integration

Files changed (3)
  1. README.md +30 -3
  2. app.py +340 -0
  3. requirements.txt +4 -0
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
- title: Mcp Video Frontend
- emoji: 📈
+ title: MCP Video Analysis with Claude AI
+ emoji: 🎥
  colorFrom: purple
  colorTo: blue
  sdk: gradio
@@ -8,7 +8,34 @@ sdk_version: 5.33.1
  app_file: app.py
  pinned: false
  license: mit
- short_description: This is a chat interface to demonstrate the video-MCP
+ short_description: Intelligent video content analysis powered by Modal backend and Anthropic Claude
  ---
 
+ # 🎥 MCP Video Analysis with Claude AI
+
+ This application provides comprehensive video analysis using the Model Context Protocol (MCP) to integrate multiple AI technologies:
+
+ ## 🔧 Technology Stack
+ - **Modal Backend**: Scalable cloud compute for video processing
+ - **Whisper**: Speech-to-text transcription
+ - **Computer Vision Models**: Object detection, action recognition, and captioning
+ - **Anthropic Claude**: Advanced AI for intelligent content analysis
+ - **MCP Protocol**: Model Context Protocol for seamless integration
+
+ ## 🎯 Features
+ - **Transcription**: Extract spoken content from videos
+ - **Visual Analysis**: Identify objects, actions, and scenes
+ - **Content Understanding**: AI-powered insights and summaries
+ - **Custom Queries**: Ask specific questions about video content
+
+ ## 🚀 Usage
+ 1. Enter a video URL (YouTube or direct link)
+ 2. Optionally ask a specific question
+ 3. Click "Analyze Video" to get comprehensive insights
+ 4. Review both Claude's intelligent analysis and raw data
+
+ ## 🔒 Environment Variables Required
+ - `ANTHROPIC_API_KEY`: Your Anthropic API key for Claude integration
+ - `MODAL_VIDEO_ANALYSIS_ENDPOINT_URL`: Modal backend endpoint (optional, has default)
+
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
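Review note: the README's environment-variable section describes one required and one optional setting. A minimal sketch of how that contract could be read in one place, assuming the default endpoint is the one hard-coded in app.py in this commit. `Settings` and `load_settings` are hypothetical names used only for illustration; they are not part of the commit.

```python
import os
from typing import Mapping, NamedTuple

# Default copied from app.py in this commit.
DEFAULT_MODAL_ENDPOINT = (
    "https://jomasego--video-analysis-gradio-pipeline-process-video-analysis.modal.run"
)


class Settings(NamedTuple):
    """Hypothetical container for the two documented variables."""
    anthropic_api_key: str
    modal_endpoint: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables from `env` (hypothetical helper)."""
    api_key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not api_key:
        # Same message app.py raises when the key is missing.
        raise ValueError("ANTHROPIC_API_KEY environment variable is required")
    # An unset or blank endpoint falls back to the default.
    endpoint = env.get("MODAL_VIDEO_ANALYSIS_ENDPOINT_URL", "").strip()
    return Settings(api_key, endpoint or DEFAULT_MODAL_ENDPOINT)
```

Passing a plain dict instead of `os.environ` makes the fallback easy to exercise without touching the real environment.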
app.py ADDED
@@ -0,0 +1,340 @@
+ #!/usr/bin/env python3
+ """
+ MCP Video Analysis Client with Anthropic Integration
+
+ This application serves as an MCP (Model Context Protocol) client that:
+ 1. Connects to video analysis tools via MCP
+ 2. Integrates with Anthropic's Claude for intelligent video understanding
+ 3. Provides a Gradio interface for user interaction
+ """
+
+ import os
+ import json
+ import asyncio
+ import logging
+ from typing import Dict, Any, List, Optional
+ import gradio as gr
+ import httpx
+ from anthropic import Anthropic
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ class MCPVideoAnalysisClient:
+     """MCP Client for video analysis with Anthropic integration."""
+
+     def __init__(self):
+         # Initialize Anthropic client
+         self.anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
+         if not self.anthropic_api_key:
+             raise ValueError("ANTHROPIC_API_KEY environment variable is required")
+
+         self.anthropic_client = Anthropic(api_key=self.anthropic_api_key)
+
+         # Modal backend endpoint
+         self.modal_endpoint = os.getenv(
+             "MODAL_VIDEO_ANALYSIS_ENDPOINT_URL",
+             "https://jomasego--video-analysis-gradio-pipeline-process-video-analysis.modal.run"
+         )
+
+         logger.info(f"Initialized MCP Video Analysis Client with Modal endpoint: {self.modal_endpoint}")
+
+     async def analyze_video_with_modal(self, video_url: str) -> Dict[str, Any]:
+         """Call the Modal backend for comprehensive video analysis."""
+         try:
+             async with httpx.AsyncClient(timeout=300.0) as client:
+                 logger.info(f"Calling Modal backend for video analysis: {video_url}")
+                 response = await client.post(
+                     self.modal_endpoint,
+                     json={"video_url": video_url},
+                     headers={"Content-Type": "application/json"}
+                 )
+                 response.raise_for_status()
+                 return response.json()
+         except Exception as e:
+             logger.error(f"Error calling Modal backend: {e}")
+             return {"error": f"Modal backend error: {str(e)}"}
+
+     def enhance_analysis_with_claude(self, video_analysis: Dict[str, Any], user_query: Optional[str] = None) -> str:
+         """Use Claude to provide intelligent insights about the video analysis."""
+
+         # Prepare the analysis data for Claude
+         analysis_summary = self._format_analysis_for_claude(video_analysis)
+
+         # Create the prompt for Claude
+         system_prompt = """You are an expert video analyst with deep knowledge of multimedia content, storytelling, and visual communication. You excel at interpreting video analysis data and providing meaningful insights.
+
+ Your task is to analyze the provided video analysis data and give intelligent, actionable insights. Focus on:
+ 1. Content understanding and themes
+ 2. Visual storytelling elements
+ 3. Technical quality assessment
+ 4. Audience engagement potential
+ 5. Key moments and highlights
+ 6. Contextual relevance
+
+ Be concise but thorough, and tailor your response to be useful for content creators, marketers, or researchers."""
+
+         if user_query:
+             user_prompt = f"""Here is the video analysis data:
+
+ {analysis_summary}
+
+ User's specific question: {user_query}
+
+ Please provide a comprehensive analysis addressing the user's question while incorporating insights from all the available data."""
+         else:
+             user_prompt = f"""Here is the video analysis data:
+
+ {analysis_summary}
+
+ Please provide a comprehensive analysis of this video, highlighting the most important insights and potential applications."""
+
+         try:
+             response = self.anthropic_client.messages.create(
+                 model="claude-3-5-sonnet-20241022",
+                 max_tokens=2000,
+                 temperature=0.3,
+                 system=system_prompt,
+                 messages=[{"role": "user", "content": user_prompt}]
+             )
+
+             return response.content[0].text
+
+         except Exception as e:
+             logger.error(f"Error calling Anthropic API: {e}")
+             return f"Error generating Claude analysis: {str(e)}"
+
+     def _format_analysis_for_claude(self, analysis: Dict[str, Any]) -> str:
+         """Format the video analysis data for Claude consumption."""
+         formatted = []
+
+         # Handle transcription
+         if "transcription" in analysis:
+             transcription = analysis["transcription"]
+             if isinstance(transcription, str) and not transcription.startswith("Error"):
+                 formatted.append(f"**TRANSCRIPTION:**\n{transcription}\n")
+             else:
+                 formatted.append(f"**TRANSCRIPTION:** {transcription}\n")
+
+         # Handle caption
+         if "caption" in analysis:
+             caption = analysis["caption"]
+             if isinstance(caption, str) and not caption.startswith("Error"):
+                 formatted.append(f"**VIDEO CAPTION:**\n{caption}\n")
+             else:
+                 formatted.append(f"**VIDEO CAPTION:** {caption}\n")
+
+         # Handle actions
+         if "actions" in analysis:
+             actions = analysis["actions"]
+             if isinstance(actions, list) and actions:
+                 action_text = []
+                 for action in actions:
+                     if isinstance(action, dict):
+                         if "error" in action:
+                             action_text.append(f"Error: {action['error']}")
+                         else:
+                             # Format action detection results
+                             action_text.append(str(action))
+                     else:
+                         action_text.append(str(action))
+                 formatted.append(f"**ACTION RECOGNITION:**\n{'; '.join(action_text)}\n")
+             else:
+                 formatted.append(f"**ACTION RECOGNITION:** {actions}\n")
+
+         # Handle objects
+         if "objects" in analysis:
+             objects = analysis["objects"]
+             if isinstance(objects, list) and objects:
+                 object_text = []
+                 for obj in objects:
+                     if isinstance(obj, dict):
+                         if "error" in obj:
+                             object_text.append(f"Error: {obj['error']}")
+                         else:
+                             # Format object detection results
+                             object_text.append(str(obj))
+                     else:
+                         object_text.append(str(obj))
+                 formatted.append(f"**OBJECT DETECTION:**\n{'; '.join(object_text)}\n")
+             else:
+                 formatted.append(f"**OBJECT DETECTION:** {objects}\n")
+
+         # Handle any errors
+         if "error" in analysis:
+             formatted.append(f"**ANALYSIS ERROR:**\n{analysis['error']}\n")
+
+         return "\n".join(formatted) if formatted else "No analysis data available."
+
+     async def process_video_request(self, video_url: str, user_query: Optional[str] = None) -> tuple[str, Optional[str]]:
+         """Process a complete video analysis request with Claude enhancement."""
+         if not video_url or not video_url.strip():
+             # None leaves the gr.JSON output empty; an empty string is not valid JSON
+             return "Please provide a valid video URL.", None
+
+         try:
+             # Step 1: Get video analysis from Modal backend
+             logger.info(f"Starting video analysis for: {video_url}")
+             video_analysis = await self.analyze_video_with_modal(video_url.strip())
+
+             # Step 2: Format the raw analysis for display
+             raw_analysis = json.dumps(video_analysis, indent=2)
+
+             # Step 3: Enhance with Claude insights
+             logger.info("Generating Claude insights...")
+             claude_insights = self.enhance_analysis_with_claude(video_analysis, user_query)
+
+             return claude_insights, raw_analysis
+
+         except Exception as e:
+             error_msg = f"Error processing video request: {str(e)}"
+             logger.error(error_msg)
+             return error_msg, None
+
+ # Initialize the MCP client
+ try:
+     mcp_client = MCPVideoAnalysisClient()
+     logger.info("MCP Video Analysis Client initialized successfully")
+ except Exception as e:
+     logger.error(f"Failed to initialize MCP client: {e}")
+     mcp_client = None
+
+ # Gradio Interface Functions
+ async def analyze_video_interface(video_url: str, user_query: Optional[str] = None) -> tuple[str, Optional[str]]:
+     """Gradio interface function for video analysis."""
+     if not mcp_client:
+         # None leaves the gr.JSON output empty; an empty string is not valid JSON
+         return "MCP Client not initialized. Please check your environment variables.", None
+
+     return await mcp_client.process_video_request(video_url, user_query)
+
+ def create_gradio_interface():
+     """Create and configure the Gradio interface."""
+
+     with gr.Blocks(
+         title="MCP Video Analysis with Claude",
+         theme=gr.themes.Soft(),
+         css="""
+         .gradio-container {
+             max-width: 1200px !important;
+         }
+         .main-header {
+             text-align: center;
+             margin-bottom: 30px;
+         }
+         .analysis-output {
+             max-height: 600px;
+             overflow-y: auto;
+         }
+         """
+     ) as interface:
+
+         gr.HTML("""
+         <div class="main-header">
+             <h1>🎥 MCP Video Analysis with Claude AI</h1>
+             <p>Intelligent video content analysis powered by Modal backend and Anthropic Claude</p>
+         </div>
+         """)
+
+         with gr.Tab("🔍 Video Analysis"):
+             with gr.Row():
+                 with gr.Column(scale=1):
+                     video_url_input = gr.Textbox(
+                         label="Video URL",
+                         placeholder="Enter YouTube URL or direct video link...",
+                         lines=2
+                     )
+                     user_query_input = gr.Textbox(
+                         label="Specific Question (Optional)",
+                         placeholder="Ask a specific question about the video...",
+                         lines=2
+                     )
+
+                     with gr.Row():
+                         analyze_btn = gr.Button("🚀 Analyze Video", variant="primary", size="lg")
+                         clear_btn = gr.Button("🗑️ Clear", variant="secondary")
+
+                 with gr.Column(scale=2):
+                     claude_output = gr.Textbox(
+                         label="🤖 Claude AI Insights",
+                         lines=20,
+                         elem_classes=["analysis-output"],
+                         interactive=False
+                     )
+
+             with gr.Row():
+                 raw_analysis_output = gr.JSON(
+                     label="📊 Raw Analysis Data",
+                     elem_classes=["analysis-output"]
+                 )
+
+             # Example videos
+             gr.HTML("<h3>📝 Example Videos to Try:</h3>")
+             with gr.Row():
+                 example_urls = [
+                     "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
+                     "https://www.youtube.com/watch?v=jNQXAC9IVRw",
+                     "https://www.youtube.com/watch?v=9bZkp7q19f0"
+                 ]
+                 for i, url in enumerate(example_urls, 1):
+                     gr.Button(f"Example {i}", size="sm").click(
+                         lambda url=url: url, outputs=video_url_input
+                     )
+
+         with gr.Tab("ℹ️ About"):
+             gr.Markdown("""
+             ## About MCP Video Analysis
+
+             This application combines multiple AI technologies to provide comprehensive video analysis:
+
+             ### 🔧 Technology Stack
+             - **Modal Backend**: Scalable cloud compute for video processing
+             - **Whisper**: Speech-to-text transcription
+             - **Computer Vision Models**: Object detection, action recognition, and captioning
+             - **Anthropic Claude**: Advanced AI for intelligent content analysis
+             - **MCP Protocol**: Model Context Protocol for seamless integration
+
+             ### 🎯 Features
+             - **Transcription**: Extract spoken content from videos
+             - **Visual Analysis**: Identify objects, actions, and scenes
+             - **Content Understanding**: AI-powered insights and summaries
+             - **Custom Queries**: Ask specific questions about video content
+
+             ### 🚀 Usage
+             1. Enter a video URL (YouTube or direct link)
+             2. Optionally ask a specific question
+             3. Click "Analyze Video" to get comprehensive insights
+             4. Review both Claude's intelligent analysis and raw data
+
+             ### 🔒 Privacy & Security
+             - Video processing is handled securely in the cloud
+             - No video data is stored permanently
+             - API keys are handled securely via environment variables
+             """)
+
+         # Event handlers
+         def clear_all():
+             # The last output is a gr.JSON component; None clears it ("" is not valid JSON)
+             return "", "", "", None
+
+         analyze_btn.click(
+             fn=analyze_video_interface,
+             inputs=[video_url_input, user_query_input],
+             outputs=[claude_output, raw_analysis_output],
+             show_progress=True
+         )
+
+         clear_btn.click(
+             fn=clear_all,
+             outputs=[video_url_input, user_query_input, claude_output, raw_analysis_output]
+         )
+
+     return interface
+
+ # Create and launch the interface
+ if __name__ == "__main__":
+     interface = create_gradio_interface()
+     interface.launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+         share=False,
+         show_error=True
+     )
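Review note: in `_format_analysis_for_claude`, the `actions` and `objects` branches differ only in their section title. A possible refactor is sketched below as a standalone helper that reproduces the same output strings. The name `format_detection_section` is hypothetical and not part of the commit, and the item shapes in the usage lines are assumptions about what the Modal backend returns.

```python
from typing import Any, List


def format_detection_section(title: str, items: Any) -> str:
    """Render one list-valued section the way the commit's formatter does."""
    if isinstance(items, list) and items:
        parts: List[str] = []
        for item in items:
            if isinstance(item, dict) and "error" in item:
                # Per-item errors are surfaced rather than dropped.
                parts.append(f"Error: {item['error']}")
            else:
                # Dicts without "error" and non-dict items are stringified.
                parts.append(str(item))
        return f"**{title}:**\n{'; '.join(parts)}\n"
    # Empty lists and non-list values are echoed inline, as in the original.
    return f"**{title}:** {items}\n"


# Hypothetical payloads, for illustration only.
print(format_detection_section("ACTION RECOGNITION", [{"error": "timeout"}, "running"]))
print(format_detection_section("OBJECT DETECTION", []))
```

With a helper like this, the two branches in the method would each reduce to a single `formatted.append(...)` call.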
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ gradio>=4.0.0
+ anthropic>=0.40.0
+ httpx>=0.25.0
+ asyncio-compat>=0.1.0
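
Review note: in app.py, `process_video_request` is a coroutine and awaits the Modal call, but `enhance_analysis_with_claude` uses the synchronous `Anthropic` client, so the event loop may be held while Claude responds. One way to avoid that, assuming Python 3.9+, is to hand the blocking call to a worker thread with `asyncio.to_thread`. The sketch below uses a hypothetical stand-in (`slow_sync_call`) instead of the real API call; none of these function names come from the commit.

```python
import asyncio
import time
from typing import List


def slow_sync_call(prompt: str) -> str:
    """Hypothetical stand-in for the blocking Anthropic request."""
    time.sleep(0.05)
    return f"insights for: {prompt}"


async def enhance_without_blocking(prompt: str) -> str:
    # asyncio.to_thread runs the blocking function in a worker thread,
    # so other coroutines keep running while it waits.
    return await asyncio.to_thread(slow_sync_call, prompt)


async def demo() -> List[str]:
    # Two requests overlap instead of running back to back;
    # gather returns results in the order the awaitables were passed.
    return await asyncio.gather(
        enhance_without_blocking("video A"),
        enhance_without_blocking("video B"),
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The SDK's asynchronous client would be another option; the thread hand-off is shown only because it leaves the existing synchronous method unchanged.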