Spaces: jomasego/mcp-video-frontend

jomasego committed on Jun 10

Commit 3e48648

1 Parent(s): 282ce8f

feat: Replace Anthropic with Llama 3 for video analysis

Files changed (3)

README.md +7 -7
app.py +59 -143
requirements.txt +0 -1

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: MCP Video Analysis with Claude AI
 emoji: 🎥
 colorFrom: purple
 colorTo: blue
@@ -8,10 +8,10 @@ sdk_version: 5.33.1
 app_file: app.py
 pinned: false
 license: mit
-short_description: AI-powered video analysis with Claude and Modal
 ---
-# 🎥 MCP Video Analysis with Claude AI
 This application provides comprehensive video analysis using the Model Context Protocol (MCP) to integrate multiple AI technologies:
@@ -19,7 +19,7 @@ This application provides comprehensive video analysis using the Model Context P
 - **Modal Backend**: Scalable cloud compute for video processing
 - **Whisper**: Speech-to-text transcription
 - **Computer Vision Models**: Object detection, action recognition, and captioning
-- **Anthropic Claude**: Advanced AI for intelligent content analysis
 - **MCP Protocol**: Model Context Protocol for seamless integration
 ## 🎯 Features
@@ -32,10 +32,10 @@ This application provides comprehensive video analysis using the Model Context P
 1. Enter a video URL (YouTube or direct link)
 2. Optionally ask a specific question
 3. Click "Analyze Video" to get comprehensive insights
-4. Review both Claude's intelligent analysis and raw data
 ## 🔒 Environment Variables Required
-- `ANTHROPIC_API_KEY`: Your Anthropic API key for Claude integration
-- `MODAL_VIDEO_ANALYSIS_ENDPOINT_URL`: Modal backend endpoint (optional, has default)
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: MCP Video Analysis with Llama 3
 emoji: 🎥
 colorFrom: purple
 colorTo: blue
 app_file: app.py
 pinned: false
 license: mit
+short_description: AI-powered video analysis with Llama 3 and Modal
 ---
+# 🎥 MCP Video Analysis with Llama 3
 This application provides comprehensive video analysis using the Model Context Protocol (MCP) to integrate multiple AI technologies:
 - **Modal Backend**: Scalable cloud compute for video processing
 - **Whisper**: Speech-to-text transcription
 - **Computer Vision Models**: Object detection, action recognition, and captioning
+- **Meta Llama 3**: Advanced AI for intelligent content analysis, hosted on Modal
 - **MCP Protocol**: Model Context Protocol for seamless integration
 ## 🎯 Features
 1. Enter a video URL (YouTube or direct link)
 2. Optionally ask a specific question
 3. Click "Analyze Video" to get comprehensive insights
+4. Review both Llama 3's intelligent analysis and raw data
 ## 🔒 Environment Variables Required
+- `MODAL_LLAMA3_ENDPOINT_URL`: The URL for the deployed Llama 3 Modal service.
+- `MODAL_VIDEO_ANALYSIS_ENDPOINT_URL`: The URL for the video processing Modal service (optional, has a default value).
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
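
Review note: the two environment variables listed in the updated README could be resolved roughly as follows. This is a minimal sketch, not the Space's actual code: `EndpointConfig` and `load_endpoints` are hypothetical names, and only the variable names and the default video-analysis URL are taken from this commit. Treating a blank value as "not configured" is also an assumption of the sketch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Default URL copied from app.py in this commit.
DEFAULT_VIDEO_ENDPOINT = (
    "https://jomasego--video-analysis-gradio-pipeline-process-video-analysis.modal.run"
)


@dataclass(frozen=True)
class EndpointConfig:
    """Hypothetical container for the two Modal endpoint URLs."""

    video_analysis_url: str
    llama3_url: Optional[str]  # None means LLM insights are unavailable


def load_endpoints(env: Mapping[str, str] = os.environ) -> EndpointConfig:
    """Read both endpoint URLs; only the video-analysis URL has a default."""
    video_url = env.get("MODAL_VIDEO_ANALYSIS_ENDPOINT_URL") or DEFAULT_VIDEO_ENDPOINT
    # Assumption: a missing or blank value both mean "not configured".
    llama_url = (env.get("MODAL_LLAMA3_ENDPOINT_URL") or "").strip() or None
    return EndpointConfig(video_analysis_url=video_url, llama3_url=llama_url)
```

Passing the environment as a plain mapping keeps the function easy to exercise without touching the real process environment, for example `load_endpoints({})`.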

app.py CHANGED

@@ -1,174 +1,90 @@
 #!/usr/bin/env python3
 """
-MCP Video Analysis Client with Anthropic Integration
 This application serves as an MCP (Model Context Protocol) client that:
 1. Connects to video analysis tools via MCP
-2. Integrates with Anthropic's Claude for intelligent video understanding
 3. Provides a Gradio interface for user interaction
 """
 import os
 import json
-import asyncio
 import logging
-from typing import Dict, Any, List, Optional
 import gradio as gr
 import httpx
-from anthropic import Anthropic
 # Configure logging
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
 class MCPVideoAnalysisClient:
-    """MCP Client for video analysis with Anthropic integration."""
     def __init__(self):
-        # Initialize Anthropic client
-        self.anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
-        if not self.anthropic_api_key:
-            raise ValueError("ANTHROPIC_API_KEY environment variable is required")
-        self.anthropic_client = Anthropic(api_key=self.anthropic_api_key)
-        # Modal backend endpoint
-        self.modal_endpoint = os.getenv(
             "MODAL_VIDEO_ANALYSIS_ENDPOINT_URL",
             "https://jomasego--video-analysis-gradio-pipeline-process-video-analysis.modal.run"
         )
-        logger.info(f"Initialized MCP Video Analysis Client with Modal endpoint: {self.modal_endpoint}")
     async def analyze_video_with_modal(self, video_url: str) -> Dict[str, Any]:
         """Call the Modal backend for comprehensive video analysis."""
         try:
             async with httpx.AsyncClient(timeout=300.0) as client:
-                logger.info(f"Calling Modal backend for video analysis: {video_url}")
                 response = await client.post(
-                    self.modal_endpoint,
                     json={"video_url": video_url},
                     headers={"Content-Type": "application/json"}
                 )
                 response.raise_for_status()
                 return response.json()
         except Exception as e:
-            logger.error(f"Error calling Modal backend: {e}")
-            return {"error": f"Modal backend error: {str(e)}"}
-    def enhance_analysis_with_claude(self, video_analysis: Dict[str, Any], user_query: str = None) -> str:
-        """Use Claude to provide intelligent insights about the video analysis."""
-        # Prepare the analysis data for Claude
-        analysis_summary = self._format_analysis_for_claude(video_analysis)
-        # Create the prompt for Claude
-        system_prompt = """You are an expert video analyst with deep knowledge of multimedia content, storytelling, and visual communication. You excel at interpreting video analysis data and providing meaningful insights.
-Your task is to analyze the provided video analysis data and give intelligent, actionable insights. Focus on:
-1. Content understanding and themes
-2. Visual storytelling elements
-3. Technical quality assessment
-4. Audience engagement potential
-5. Key moments and highlights
-6. Contextual relevance
-Be concise but thorough, and tailor your response to be useful for content creators, marketers, or researchers."""
-        if user_query:
-            user_prompt = f"""Here is the video analysis data:
-{analysis_summary}
-User's specific question: {user_query}
-Please provide a comprehensive analysis addressing the user's question while incorporating insights from all the available data."""
-        else:
-            user_prompt = f"""Here is the video analysis data:
-{analysis_summary}
-Please provide a comprehensive analysis of this video, highlighting the most important insights and potential applications."""
         try:
-            response = self.anthropic_client.messages.create(
-                model="claude-3-5-sonnet-20241022",
-                max_tokens=2000,
-                temperature=0.3,
-                system=system_prompt,
-                messages=[{"role": "user", "content": user_prompt}]
-            )
-            return response.content[0].text
         except Exception as e:
-            logger.error(f"Error calling Anthropic API: {e}")
-            return f"Error generating Claude analysis: {str(e)}"
-    def _format_analysis_for_claude(self, analysis: Dict[str, Any]) -> str:
-        """Format the video analysis data for Claude consumption."""
-        formatted = []
-        # Handle transcription
-        if "transcription" in analysis:
-            transcription = analysis["transcription"]
-            if isinstance(transcription, str) and not transcription.startswith("Error"):
-                formatted.append(f"**TRANSCRIPTION:**\n{transcription}\n")
-            else:
-                formatted.append(f"**TRANSCRIPTION:** {transcription}\n")
-        # Handle caption
-        if "caption" in analysis:
-            caption = analysis["caption"]
-            if isinstance(caption, str) and not caption.startswith("Error"):
-                formatted.append(f"**VIDEO CAPTION:**\n{caption}\n")
-            else:
-                formatted.append(f"**VIDEO CAPTION:** {caption}\n")
-        # Handle actions
-        if "actions" in analysis:
-            actions = analysis["actions"]
-            if isinstance(actions, list) and actions:
-                action_text = []
-                for action in actions:
-                    if isinstance(action, dict):
-                        if "error" in action:
-                            action_text.append(f"Error: {action['error']}")
-                        else:
-                            # Format action detection results
-                            action_text.append(str(action))
-                    else:
-                        action_text.append(str(action))
-                formatted.append(f"**ACTION RECOGNITION:**\n{'; '.join(action_text)}\n")
-            else:
-                formatted.append(f"**ACTION RECOGNITION:** {actions}\n")
-        # Handle objects
-        if "objects" in analysis:
-            objects = analysis["objects"]
-            if isinstance(objects, list) and objects:
-                object_text = []
-                for obj in objects:
-                    if isinstance(obj, dict):
-                        if "error" in obj:
-                            object_text.append(f"Error: {obj['error']}")
-                        else:
-                            # Format object detection results
-                            object_text.append(str(obj))
-                    else:
-                        object_text.append(str(obj))
-                formatted.append(f"**OBJECT DETECTION:**\n{'; '.join(object_text)}\n")
-            else:
-                formatted.append(f"**OBJECT DETECTION:** {objects}\n")
-        # Handle any errors
-        if "error" in analysis:
-            formatted.append(f"**ANALYSIS ERROR:**\n{analysis['error']}\n")
-        return "\n".join(formatted) if formatted else "No analysis data available."
     async def process_video_request(self, video_url: str, user_query: str = None) -> tuple[str, str]:
-        """Process a complete video analysis request with Claude enhancement."""
         if not video_url or not video_url.strip():
             return "Please provide a valid video URL.", ""
@@ -180,11 +96,11 @@ Please provide a comprehensive analysis of this video, highlighting the most imp
             # Step 2: Format the raw analysis for display
             raw_analysis = json.dumps(video_analysis, indent=2)
-            # Step 3: Enhance with Claude insights
-            logger.info("Generating Claude insights...")
-            claude_insights = self.enhance_analysis_with_claude(video_analysis, user_query)
-            return claude_insights, raw_analysis
         except Exception as e:
             error_msg = f"Error processing video request: {str(e)}"
@@ -211,7 +127,7 @@ def create_gradio_interface():
     """Create and configure the Gradio interface."""
     with gr.Blocks(
-        title="MCP Video Analysis with Claude",
         theme=gr.themes.Soft(),
         css="""
         .gradio-container {
@@ -230,8 +146,8 @@ def create_gradio_interface():
         gr.HTML("""
         <div class="main-header">
-            <h1>🎥 MCP Video Analysis with Claude AI</h1>
-            <p>Intelligent video content analysis powered by Modal backend and Anthropic Claude</p>
         </div>
         """)
@@ -254,8 +170,8 @@ def create_gradio_interface():
                         clear_btn = gr.Button("🗑️ Clear", variant="secondary")
                 with gr.Column(scale=2):
-                    claude_output = gr.Textbox(
-                        label="🤖 Claude AI Insights",
                         lines=20,
                         elem_classes=["analysis-output"],
                         interactive=False
@@ -287,10 +203,10 @@ def create_gradio_interface():
             This application combines multiple AI technologies to provide comprehensive video analysis:
             ### 🔧 Technology Stack
-            - **Modal Backend**: Scalable cloud compute for video processing
             - **Whisper**: Speech-to-text transcription
             - **Computer Vision Models**: Object detection, action recognition, and captioning
-            - **Anthropic Claude**: Advanced AI for intelligent content analysis
             - **MCP Protocol**: Model Context Protocol for seamless integration
             ### 🎯 Features
@@ -303,7 +219,7 @@ def create_gradio_interface():
             1. Enter a video URL (YouTube or direct link)
             2. Optionally ask a specific question
             3. Click "Analyze Video" to get comprehensive insights
-            4. Review both Claude's intelligent analysis and raw data
             ### 🔒 Privacy & Security
             - Video processing is handled securely in the cloud
@@ -318,13 +234,13 @@ def create_gradio_interface():
         analyze_btn.click(
             fn=analyze_video_interface,
             inputs=[video_url_input, user_query_input],
-            outputs=[claude_output, raw_analysis_output],
             show_progress=True
         )
         clear_btn.click(
             fn=clear_all,
-            outputs=[video_url_input, user_query_input, claude_output, raw_analysis_output]
         )
     return interface
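
Review note: the removed `_format_analysis_for_claude` helper repeats the same branching for each field. If a comparable formatting step is still needed somewhere (for instance inside the Llama 3 service, which is an assumption, since that service's code is not part of this commit), the same output could be produced more compactly. The sketch below is a hypothetical, table-driven condensation of the removed helper; `format_analysis`, `TEXT_FIELDS`, and `LIST_FIELDS` are names introduced here for illustration.

```python
from typing import Any, Dict, List

# (key in the analysis dict, label used in the formatted output)
TEXT_FIELDS = (("transcription", "TRANSCRIPTION"), ("caption", "VIDEO CAPTION"))
LIST_FIELDS = (("actions", "ACTION RECOGNITION"), ("objects", "OBJECT DETECTION"))


def _describe(item: Any) -> str:
    """Render one detection result, surfacing per-item errors explicitly."""
    if isinstance(item, dict) and "error" in item:
        return f"Error: {item['error']}"
    return str(item)


def format_analysis(analysis: Dict[str, Any]) -> str:
    """Format video analysis data as labeled text blocks for an LLM prompt."""
    parts: List[str] = []
    for key, label in TEXT_FIELDS:
        if key in analysis:
            value = analysis[key]
            is_clean_text = isinstance(value, str) and not value.startswith("Error")
            if is_clean_text:
                parts.append(f"**{label}:**\n{value}\n")
            else:
                parts.append(f"**{label}:** {value}\n")
    for key, label in LIST_FIELDS:
        if key in analysis:
            value = analysis[key]
            if isinstance(value, list) and value:
                joined = "; ".join(_describe(item) for item in value)
                parts.append(f"**{label}:**\n{joined}\n")
            else:
                parts.append(f"**{label}:** {value}\n")
    if "error" in analysis:
        parts.append(f"**ANALYSIS ERROR:**\n{analysis['error']}\n")
    return "\n".join(parts) if parts else "No analysis data available."
```

Field order and fallback text follow the removed helper, so swapping one for the other should not change the prompt text.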

 #!/usr/bin/env python3
 """
+MCP Video Analysis Client with Llama 3 Integration
 This application serves as an MCP (Model Context Protocol) client that:
 1. Connects to video analysis tools via MCP
+2. Integrates with a Llama 3 model hosted on Modal for intelligent video understanding
 3. Provides a Gradio interface for user interaction
 """
 import os
 import json
 import logging
+from typing import Dict, Any, Optional
 import gradio as gr
 import httpx
 # Configure logging
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
 class MCPVideoAnalysisClient:
+    """MCP Client for video analysis with Llama 3 integration."""
     def __init__(self):
+        # Modal backend for video processing
+        self.video_analysis_endpoint = os.getenv(
             "MODAL_VIDEO_ANALYSIS_ENDPOINT_URL",
             "https://jomasego--video-analysis-gradio-pipeline-process-video-analysis.modal.run"
         )
+        # Modal backend for Llama 3 insights
+        self.llama_endpoint = os.getenv(
+            "MODAL_LLAMA3_ENDPOINT_URL"
+            # This will be set to the deployed Llama 3 app URL.
+            # e.g., "https://jomasego--llama3-inference-service-summarize.modal.run"
+        )
+        logger.info(f"Initialized MCP Client.")
+        logger.info(f"Video Analysis Endpoint: {self.video_analysis_endpoint}")
+        if not self.llama_endpoint:
+            logger.warning("MODAL_LLAMA3_ENDPOINT_URL not set. LLM insights will be unavailable.")
+        else:
+            logger.info(f"Llama 3 Endpoint: {self.llama_endpoint}")
     async def analyze_video_with_modal(self, video_url: str) -> Dict[str, Any]:
         """Call the Modal backend for comprehensive video analysis."""
         try:
             async with httpx.AsyncClient(timeout=300.0) as client:
+                logger.info(f"Calling video analysis backend: {video_url}")
                 response = await client.post(
+                    self.video_analysis_endpoint,
                     json={"video_url": video_url},
                     headers={"Content-Type": "application/json"}
                 )
                 response.raise_for_status()
                 return response.json()
         except Exception as e:
+            logger.error(f"Error calling video analysis backend: {e}")
+            return {"error": f"Video analysis backend error: {str(e)}"}
+    async def get_insights_from_llama3(self, analysis_data: Dict[str, Any], user_query: Optional[str] = None) -> str:
+        """Call the Llama 3 Modal backend for intelligent insights."""
+        if not self.llama_endpoint:
+            return "Llama 3 endpoint is not configured. Cannot generate insights."
         try:
+            payload = {
+                "analysis_data": analysis_data,
+                "user_query": user_query
+            }
+            async with httpx.AsyncClient(timeout=300.0) as client:
+                logger.info(f"Calling Llama 3 Modal backend for insights.")
+                response = await client.post(
+                    self.llama_endpoint,
+                    json=payload,
+                    headers={"Content-Type": "application/json"}
+                )
+                response.raise_for_status()
+                result = response.json()
+                return result.get("summary", "No summary returned from Llama 3 service.")
         except Exception as e:
+            logger.error(f"Error calling Llama 3 backend: {e}")
+            return f"Error generating Llama 3 insights: {str(e)}"
     async def process_video_request(self, video_url: str, user_query: str = None) -> tuple[str, str]:
+        """Process a complete video analysis request with Llama 3 enhancement."""
         if not video_url or not video_url.strip():
             return "Please provide a valid video URL.", ""
             # Step 2: Format the raw analysis for display
             raw_analysis = json.dumps(video_analysis, indent=2)
+            # Step 3: Enhance with Llama 3 insights
+            logger.info("Generating Llama 3 insights...")
+            llama_insights = await self.get_insights_from_llama3(video_analysis, user_query)
+            return llama_insights, raw_analysis
         except Exception as e:
             error_msg = f"Error processing video request: {str(e)}"
     """Create and configure the Gradio interface."""
     with gr.Blocks(
+        title="MCP Video Analysis with Llama 3",
         theme=gr.themes.Soft(),
         css="""
         .gradio-container {
         gr.HTML("""
         <div class="main-header">
+            <h1>🎥 MCP Video Analysis with Llama 3 AI</h1>
+            <p>Intelligent video content analysis powered by a Modal backend and Llama 3</p>
         </div>
         """)
                         clear_btn = gr.Button("🗑️ Clear", variant="secondary")
                 with gr.Column(scale=2):
+                    llama_output = gr.Textbox(
+                        label="🤖 Llama 3 AI Insights",
                         lines=20,
                         elem_classes=["analysis-output"],
                         interactive=False
             This application combines multiple AI technologies to provide comprehensive video analysis:
             ### 🔧 Technology Stack
+            - **Modal Backend**: Scalable cloud compute for video processing and LLM inference
             - **Whisper**: Speech-to-text transcription
             - **Computer Vision Models**: Object detection, action recognition, and captioning
+            - **Meta Llama 3**: Advanced AI for intelligent content analysis
             - **MCP Protocol**: Model Context Protocol for seamless integration
             ### 🎯 Features
             1. Enter a video URL (YouTube or direct link)
             2. Optionally ask a specific question
             3. Click "Analyze Video" to get comprehensive insights
+            4. Review both Llama 3's intelligent analysis and raw data
             ### 🔒 Privacy & Security
             - Video processing is handled securely in the cloud
         analyze_btn.click(
             fn=analyze_video_interface,
             inputs=[video_url_input, user_query_input],
+            outputs=[llama_output, raw_analysis_output],
             show_progress=True
         )
         clear_btn.click(
             fn=clear_all,
+            outputs=[video_url_input, user_query_input, llama_output, raw_analysis_output]
         )
     return interface
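
Review note: the new `get_insights_from_llama3` method implies a small JSON contract with the Llama 3 service: the request carries `analysis_data` and `user_query`, and the response is expected to contain a `summary` field. The sketch below isolates that contract using only the standard library so it can be checked without network access. `build_llama_payload` and `extract_summary` are hypothetical helper names, and `fake_llama_service` is an invented stand-in; the real Modal service is not part of this commit and may behave differently.

```python
import json
from typing import Any, Dict, Optional

# Fallback text copied from get_insights_from_llama3 in this commit.
NO_SUMMARY = "No summary returned from Llama 3 service."


def build_llama_payload(
    analysis_data: Dict[str, Any], user_query: Optional[str] = None
) -> str:
    """Serialize the request body in the shape the client sends."""
    return json.dumps({"analysis_data": analysis_data, "user_query": user_query})


def extract_summary(response_body: str) -> str:
    """Read the "summary" field from a JSON object response, with the diff's fallback."""
    result = json.loads(response_body)
    return result.get("summary", NO_SUMMARY)


def fake_llama_service(request_body: str) -> str:
    """Hypothetical stand-in for the Modal service, for local testing only."""
    request = json.loads(request_body)
    keys = ", ".join(sorted(request["analysis_data"]))
    question = request["user_query"] or "no specific question"
    return json.dumps({"summary": f"Fields received: {keys}. Question: {question}."})
```

A round trip such as `extract_summary(fake_llama_service(build_llama_payload({"caption": "a person waving"})))` exercises both directions of the contract. Note that `user_query` is serialized as JSON `null` when the user asks no question, so the service has to tolerate that value.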

requirements.txt CHANGED

@@ -1,4 +1,3 @@
 gradio>=4.0.0
-anthropic>=0.40.0
 httpx>=0.25.0
 asyncio-compat>=0.1.0

 gradio>=4.0.0
 httpx>=0.25.0
 asyncio-compat>=0.1.0
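
Review note: dropping `anthropic` from requirements.txt is only safe if no module still imports it. One way to check this, sketched below with the standard-library `ast` module, is to collect the top-level modules a source file imports. `imported_top_level_modules` and `uses_module` are hypothetical helper names, not part of this repository.

```python
import ast
from typing import Set


def imported_top_level_modules(source: str) -> Set[str]:
    """Return the top-level module names imported anywhere in the given source."""
    modules: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b" contributes "a"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" contributes "a"; relative imports are skipped
            modules.add(node.module.split(".")[0])
    return modules


def uses_module(source: str, module: str) -> bool:
    """True if the source imports the given top-level module."""
    return module in imported_top_level_modules(source)
```

Running `uses_module(open("app.py").read(), "anthropic")` against the new app.py would be expected to return `False`, given that the diff removes the `from anthropic import Anthropic` line.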