jomasego committed on
Commit 3e48648 · 1 Parent(s): 282ce8f

feat: Replace Anthropic with Llama 3 for video analysis

Files changed (3)
  1. README.md +7 -7
  2. app.py +59 -143
  3. requirements.txt +0 -1
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-title: MCP Video Analysis with Claude AI
+title: MCP Video Analysis with Llama 3
 emoji: 🎥
 colorFrom: purple
 colorTo: blue
@@ -8,10 +8,10 @@ sdk_version: 5.33.1
 app_file: app.py
 pinned: false
 license: mit
-short_description: AI-powered video analysis with Claude and Modal
+short_description: AI-powered video analysis with Llama 3 and Modal
 ---
 
-# 🎥 MCP Video Analysis with Claude AI
+# 🎥 MCP Video Analysis with Llama 3
 
 This application provides comprehensive video analysis using the Model Context Protocol (MCP) to integrate multiple AI technologies:
 
@@ -19,7 +19,7 @@ This application provides comprehensive video analysis using the Model Context P
 - **Modal Backend**: Scalable cloud compute for video processing
 - **Whisper**: Speech-to-text transcription
 - **Computer Vision Models**: Object detection, action recognition, and captioning
-- **Anthropic Claude**: Advanced AI for intelligent content analysis
+- **Meta Llama 3**: Advanced AI for intelligent content analysis, hosted on Modal
 - **MCP Protocol**: Model Context Protocol for seamless integration
 
 ## 🎯 Features
@@ -32,10 +32,10 @@ This application provides comprehensive video analysis using the Model Context P
 1. Enter a video URL (YouTube or direct link)
 2. Optionally ask a specific question
 3. Click "Analyze Video" to get comprehensive insights
-4. Review both Claude's intelligent analysis and raw data
+4. Review both Llama 3's intelligent analysis and raw data
 
 ## 🔒 Environment Variables Required
-- `ANTHROPIC_API_KEY`: Your Anthropic API key for Claude integration
-- `MODAL_VIDEO_ANALYSIS_ENDPOINT_URL`: Modal backend endpoint (optional, has default)
+- `MODAL_LLAMA3_ENDPOINT_URL`: The URL for the deployed Llama 3 Modal service.
+- `MODAL_VIDEO_ANALYSIS_ENDPOINT_URL`: The URL for the video processing Modal service (optional, has a default value).
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
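The README now documents two variables: `MODAL_LLAMA3_ENDPOINT_URL`, which has no default, and `MODAL_VIDEO_ANALYSIS_ENDPOINT_URL`, which falls back to the URL hard-coded in app.py. Below is a minimal sketch of resolving them the way `MCPVideoAnalysisClient.__init__` does, assuming you want that lookup isolated so it can be tested without touching the real environment. `EndpointConfig` and `load_endpoint_config` are hypothetical names, not part of this commit. As in app.py, an unset or empty Llama 3 URL disables insights rather than raising, which replaces the old hard failure on a missing `ANTHROPIC_API_KEY`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Default copied from app.py; MODAL_LLAMA3_ENDPOINT_URL deliberately has none.
DEFAULT_VIDEO_ENDPOINT = (
    "https://jomasego--video-analysis-gradio-pipeline-process-video-analysis.modal.run"
)


@dataclass(frozen=True)
class EndpointConfig:
    """Hypothetical container for the two Modal endpoints used by the client."""

    video_analysis_endpoint: str
    llama_endpoint: Optional[str]

    @property
    def llm_insights_available(self) -> bool:
        # Mirrors `if not self.llama_endpoint` in app.py: None or "" means unavailable.
        return bool(self.llama_endpoint)


def load_endpoint_config(env: Mapping[str, str] = os.environ) -> EndpointConfig:
    """Resolve both endpoints; an empty Llama 3 URL is normalized to None."""
    return EndpointConfig(
        video_analysis_endpoint=env.get(
            "MODAL_VIDEO_ANALYSIS_ENDPOINT_URL", DEFAULT_VIDEO_ENDPOINT
        ),
        llama_endpoint=env.get("MODAL_LLAMA3_ENDPOINT_URL") or None,
    )
```

On a Space where only the video endpoint is configured, `load_endpoint_config().llm_insights_available` would be `False`, matching the warning app.py logs at startup.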
app.py CHANGED
@@ -1,174 +1,90 @@
 #!/usr/bin/env python3
 """
-MCP Video Analysis Client with Anthropic Integration
+MCP Video Analysis Client with Llama 3 Integration
 
 This application serves as an MCP (Model Context Protocol) client that:
 1. Connects to video analysis tools via MCP
-2. Integrates with Anthropic's Claude for intelligent video understanding
+2. Integrates with a Llama 3 model hosted on Modal for intelligent video understanding
 3. Provides a Gradio interface for user interaction
 """
 
 import os
 import json
-import asyncio
 import logging
-from typing import Dict, Any, List, Optional
+from typing import Dict, Any, Optional
 import gradio as gr
 import httpx
-from anthropic import Anthropic
 
 # Configure logging
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
 
 class MCPVideoAnalysisClient:
-    """MCP Client for video analysis with Anthropic integration."""
+    """MCP Client for video analysis with Llama 3 integration."""
 
     def __init__(self):
-        # Initialize Anthropic client
-        self.anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
-        if not self.anthropic_api_key:
-            raise ValueError("ANTHROPIC_API_KEY environment variable is required")
-
-        self.anthropic_client = Anthropic(api_key=self.anthropic_api_key)
-
-        # Modal backend endpoint
-        self.modal_endpoint = os.getenv(
+        # Modal backend for video processing
+        self.video_analysis_endpoint = os.getenv(
             "MODAL_VIDEO_ANALYSIS_ENDPOINT_URL",
             "https://jomasego--video-analysis-gradio-pipeline-process-video-analysis.modal.run"
         )
 
-        logger.info(f"Initialized MCP Video Analysis Client with Modal endpoint: {self.modal_endpoint}")
-
+        # Modal backend for Llama 3 insights
+        self.llama_endpoint = os.getenv(
+            "MODAL_LLAMA3_ENDPOINT_URL"
+            # This will be set to the deployed Llama 3 app URL.
+            # e.g., "https://jomasego--llama3-inference-service-summarize.modal.run"
+        )
+
+        logger.info(f"Initialized MCP Client.")
+        logger.info(f"Video Analysis Endpoint: {self.video_analysis_endpoint}")
+        if not self.llama_endpoint:
+            logger.warning("MODAL_LLAMA3_ENDPOINT_URL not set. LLM insights will be unavailable.")
+        else:
+            logger.info(f"Llama 3 Endpoint: {self.llama_endpoint}")
+
     async def analyze_video_with_modal(self, video_url: str) -> Dict[str, Any]:
         """Call the Modal backend for comprehensive video analysis."""
         try:
             async with httpx.AsyncClient(timeout=300.0) as client:
-                logger.info(f"Calling Modal backend for video analysis: {video_url}")
+                logger.info(f"Calling video analysis backend: {video_url}")
                 response = await client.post(
-                    self.modal_endpoint,
+                    self.video_analysis_endpoint,
                     json={"video_url": video_url},
                     headers={"Content-Type": "application/json"}
                 )
                 response.raise_for_status()
                 return response.json()
         except Exception as e:
-            logger.error(f"Error calling Modal backend: {e}")
-            return {"error": f"Modal backend error: {str(e)}"}
+            logger.error(f"Error calling video analysis backend: {e}")
+            return {"error": f"Video analysis backend error: {str(e)}"}
 
-    def enhance_analysis_with_claude(self, video_analysis: Dict[str, Any], user_query: str = None) -> str:
-        """Use Claude to provide intelligent insights about the video analysis."""
-
-        # Prepare the analysis data for Claude
-        analysis_summary = self._format_analysis_for_claude(video_analysis)
-
-        # Create the prompt for Claude
-        system_prompt = """You are an expert video analyst with deep knowledge of multimedia content, storytelling, and visual communication. You excel at interpreting video analysis data and providing meaningful insights.
-
-Your task is to analyze the provided video analysis data and give intelligent, actionable insights. Focus on:
-1. Content understanding and themes
-2. Visual storytelling elements
-3. Technical quality assessment
-4. Audience engagement potential
-5. Key moments and highlights
-6. Contextual relevance
-
-Be concise but thorough, and tailor your response to be useful for content creators, marketers, or researchers."""
-
-        if user_query:
-            user_prompt = f"""Here is the video analysis data:
-
-{analysis_summary}
-
-User's specific question: {user_query}
-
-Please provide a comprehensive analysis addressing the user's question while incorporating insights from all the available data."""
-        else:
-            user_prompt = f"""Here is the video analysis data:
-
-{analysis_summary}
-
-Please provide a comprehensive analysis of this video, highlighting the most important insights and potential applications."""
+    async def get_insights_from_llama3(self, analysis_data: Dict[str, Any], user_query: Optional[str] = None) -> str:
+        """Call the Llama 3 Modal backend for intelligent insights."""
+        if not self.llama_endpoint:
+            return "Llama 3 endpoint is not configured. Cannot generate insights."
 
         try:
-            response = self.anthropic_client.messages.create(
-                model="claude-3-5-sonnet-20241022",
-                max_tokens=2000,
-                temperature=0.3,
-                system=system_prompt,
-                messages=[{"role": "user", "content": user_prompt}]
-            )
-
-            return response.content[0].text
-
+            payload = {
+                "analysis_data": analysis_data,
+                "user_query": user_query
+            }
+            async with httpx.AsyncClient(timeout=300.0) as client:
+                logger.info(f"Calling Llama 3 Modal backend for insights.")
+                response = await client.post(
+                    self.llama_endpoint,
+                    json=payload,
+                    headers={"Content-Type": "application/json"}
+                )
+                response.raise_for_status()
+                result = response.json()
+                return result.get("summary", "No summary returned from Llama 3 service.")
         except Exception as e:
-            logger.error(f"Error calling Anthropic API: {e}")
-            return f"Error generating Claude analysis: {str(e)}"
-
-    def _format_analysis_for_claude(self, analysis: Dict[str, Any]) -> str:
-        """Format the video analysis data for Claude consumption."""
-        formatted = []
-
-        # Handle transcription
-        if "transcription" in analysis:
-            transcription = analysis["transcription"]
-            if isinstance(transcription, str) and not transcription.startswith("Error"):
-                formatted.append(f"**TRANSCRIPTION:**\n{transcription}\n")
-            else:
-                formatted.append(f"**TRANSCRIPTION:** {transcription}\n")
-
-        # Handle caption
-        if "caption" in analysis:
-            caption = analysis["caption"]
-            if isinstance(caption, str) and not caption.startswith("Error"):
-                formatted.append(f"**VIDEO CAPTION:**\n{caption}\n")
-            else:
-                formatted.append(f"**VIDEO CAPTION:** {caption}\n")
-
-        # Handle actions
-        if "actions" in analysis:
-            actions = analysis["actions"]
-            if isinstance(actions, list) and actions:
-                action_text = []
-                for action in actions:
-                    if isinstance(action, dict):
-                        if "error" in action:
-                            action_text.append(f"Error: {action['error']}")
-                        else:
-                            # Format action detection results
-                            action_text.append(str(action))
-                    else:
-                        action_text.append(str(action))
-                formatted.append(f"**ACTION RECOGNITION:**\n{'; '.join(action_text)}\n")
-            else:
-                formatted.append(f"**ACTION RECOGNITION:** {actions}\n")
-
-        # Handle objects
-        if "objects" in analysis:
-            objects = analysis["objects"]
-            if isinstance(objects, list) and objects:
-                object_text = []
-                for obj in objects:
-                    if isinstance(obj, dict):
-                        if "error" in obj:
-                            object_text.append(f"Error: {obj['error']}")
-                        else:
-                            # Format object detection results
-                            object_text.append(str(obj))
-                    else:
-                        object_text.append(str(obj))
-                formatted.append(f"**OBJECT DETECTION:**\n{'; '.join(object_text)}\n")
-            else:
-                formatted.append(f"**OBJECT DETECTION:** {objects}\n")
-
-        # Handle any errors
-        if "error" in analysis:
-            formatted.append(f"**ANALYSIS ERROR:**\n{analysis['error']}\n")
-
-        return "\n".join(formatted) if formatted else "No analysis data available."
+            logger.error(f"Error calling Llama 3 backend: {e}")
+            return f"Error generating Llama 3 insights: {str(e)}"
 
     async def process_video_request(self, video_url: str, user_query: str = None) -> tuple[str, str]:
-        """Process a complete video analysis request with Claude enhancement."""
+        """Process a complete video analysis request with Llama 3 enhancement."""
         if not video_url or not video_url.strip():
             return "Please provide a valid video URL.", ""
 
@@ -180,11 +96,11 @@ Please provide a comprehensive analysis of this video, highlighting the most imp
             # Step 2: Format the raw analysis for display
             raw_analysis = json.dumps(video_analysis, indent=2)
 
-            # Step 3: Enhance with Claude insights
-            logger.info("Generating Claude insights...")
-            claude_insights = self.enhance_analysis_with_claude(video_analysis, user_query)
+            # Step 3: Enhance with Llama 3 insights
+            logger.info("Generating Llama 3 insights...")
+            llama_insights = await self.get_insights_from_llama3(video_analysis, user_query)
 
-            return claude_insights, raw_analysis
+            return llama_insights, raw_analysis
 
         except Exception as e:
             error_msg = f"Error processing video request: {str(e)}"
@@ -211,7 +127,7 @@ def create_gradio_interface():
     """Create and configure the Gradio interface."""
 
     with gr.Blocks(
-        title="MCP Video Analysis with Claude",
+        title="MCP Video Analysis with Llama 3",
         theme=gr.themes.Soft(),
         css="""
         .gradio-container {
@@ -230,8 +146,8 @@ def create_gradio_interface():
 
         gr.HTML("""
         <div class="main-header">
-            <h1>🎥 MCP Video Analysis with Claude AI</h1>
-            <p>Intelligent video content analysis powered by Modal backend and Anthropic Claude</p>
+            <h1>🎥 MCP Video Analysis with Llama 3 AI</h1>
+            <p>Intelligent video content analysis powered by a Modal backend and Llama 3</p>
         </div>
         """)
 
@@ -254,8 +170,8 @@ def create_gradio_interface():
                 clear_btn = gr.Button("🗑️ Clear", variant="secondary")
 
             with gr.Column(scale=2):
-                claude_output = gr.Textbox(
-                    label="🤖 Claude AI Insights",
+                llama_output = gr.Textbox(
+                    label="🤖 Llama 3 AI Insights",
                     lines=20,
                     elem_classes=["analysis-output"],
                     interactive=False
@@ -287,10 +203,10 @@ def create_gradio_interface():
         This application combines multiple AI technologies to provide comprehensive video analysis:
 
         ### 🔧 Technology Stack
-        - **Modal Backend**: Scalable cloud compute for video processing
+        - **Modal Backend**: Scalable cloud compute for video processing and LLM inference
         - **Whisper**: Speech-to-text transcription
         - **Computer Vision Models**: Object detection, action recognition, and captioning
-        - **Anthropic Claude**: Advanced AI for intelligent content analysis
+        - **Meta Llama 3**: Advanced AI for intelligent content analysis
         - **MCP Protocol**: Model Context Protocol for seamless integration
 
         ### 🎯 Features
@@ -303,7 +219,7 @@ def create_gradio_interface():
         1. Enter a video URL (YouTube or direct link)
         2. Optionally ask a specific question
         3. Click "Analyze Video" to get comprehensive insights
-        4. Review both Claude's intelligent analysis and raw data
+        4. Review both Llama 3's intelligent analysis and raw data
 
         ### 🔒 Privacy & Security
         - Video processing is handled securely in the cloud
@@ -318,13 +234,13 @@ def create_gradio_interface():
         analyze_btn.click(
             fn=analyze_video_interface,
             inputs=[video_url_input, user_query_input],
-            outputs=[claude_output, raw_analysis_output],
+            outputs=[llama_output, raw_analysis_output],
             show_progress=True
         )
 
         clear_btn.click(
             fn=clear_all,
-            outputs=[video_url_input, user_query_input, claude_output, raw_analysis_output]
+            outputs=[video_url_input, user_query_input, llama_output, raw_analysis_output]
         )
 
     return interface
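After this change the client no longer builds any prompt: it posts the raw `analysis_data` dict plus `user_query` to `MODAL_LLAMA3_ENDPOINT_URL` and reads back `summary`. The Llama 3 service itself is not part of this diff, so the following is only a sketch of what its prompt-building step could look like, assuming it feeds a Llama 3 Instruct model a raw prompt string rather than going through a chat-completions API. `SYSTEM_PROMPT` and `build_llama3_prompt` are hypothetical names; the system prompt is condensed from the one removed from app.py, and `json.dumps` stands in for the removed `_format_analysis_for_claude` formatter. The special tokens follow Meta's published Llama 3 Instruct chat format; if the service loads the model through a tokenizer with a chat template, that template should produce an equivalent layout.

```python
import json
from typing import Any, Dict, Optional

# Condensed from the system prompt this commit removes from app.py.
SYSTEM_PROMPT = (
    "You are an expert video analyst. Interpret the video analysis data and give "
    "concise, actionable insights on content and themes, visual storytelling, "
    "technical quality, audience engagement, key moments, and contextual relevance."
)


def build_llama3_prompt(analysis_data: Dict[str, Any], user_query: Optional[str] = None) -> str:
    """Render a single-turn Llama 3 Instruct prompt from the client's payload fields."""
    data_text = json.dumps(analysis_data, indent=2)
    if user_query:
        user_msg = (
            f"Here is the video analysis data:\n\n{data_text}\n\n"
            f"User's specific question: {user_query}"
        )
    else:
        user_msg = (
            f"Here is the video analysis data:\n\n{data_text}\n\n"
            "Please provide a comprehensive analysis of this video."
        )
    # Each turn is a role header, a blank line, the message, then <|eot_id|>.
    # The trailing assistant header cues the model to start its reply.
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{SYSTEM_PROMPT}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```

Keeping this step on the service side is what lets the client stay a thin HTTP caller, at the cost of the prompt no longer being visible in app.py.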
requirements.txt CHANGED
@@ -1,4 +1,3 @@
 gradio>=4.0.0
-anthropic>=0.40.0
 httpx>=0.25.0
 asyncio-compat>=0.1.0
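Dropping `anthropic>=0.40.0` works because the Llama 3 integration is plain JSON over HTTP, which `httpx` already covers. The contract `get_insights_from_llama3` relies on (request keys `analysis_data` and `user_query`, response key `summary`) can be pinned down with stdlib-only helpers such as the ones below. They are hypothetical and meant for unit-testing that boundary without network access; `fake_summarize_handler` is an invented stand-in for the Modal service, not its real behavior. The JSON shape matches what the client sends, though `httpx` may serialize with different whitespace.

```python
import json
from typing import Any, Dict, Optional

# Same fallback text that app.py returns when the response has no "summary" key.
FALLBACK_SUMMARY = "No summary returned from Llama 3 service."


def build_payload(analysis_data: Dict[str, Any], user_query: Optional[str] = None) -> str:
    """Serialize the request body with the same keys the client posts."""
    return json.dumps({"analysis_data": analysis_data, "user_query": user_query})


def parse_summary(body: str) -> str:
    """Extract the 'summary' field, falling back the way app.py does."""
    result = json.loads(body)
    return result.get("summary", FALLBACK_SUMMARY)


def fake_summarize_handler(body: str) -> str:
    """Hypothetical stand-in for the Modal service, for offline tests only."""
    request = json.loads(body)
    query = request.get("user_query") or "general overview"
    keys = ", ".join(sorted(request.get("analysis_data", {})))
    return json.dumps({"summary": f"Answering '{query}' using: {keys}"})
```

A round trip through the fake handler exercises both sides of the contract: `parse_summary(fake_summarize_handler(build_payload({"caption": "x", "objects": []}, "Who?"))) == "Answering 'Who?' using: caption, objects"`.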