Ahmud committed
Commit 9c9b3ff · Parent(s): 81917a3
Files changed (8)
  1. .env.example +4 -0
  2. .gitignore +1 -0
  3. README.md +65 -13
  4. agent.py +129 -0
  5. app.py +33 -18
  6. prompt.py +21 -0
  7. requirements.txt +14 -1
  8. tools.py +225 -0
.env.example ADDED
@@ -0,0 +1,4 @@
+ LANGSMITH_API_KEY=""
+ LANGSMITH_TRACING=true
+ OPENROUTER_API_KEY=""
+ BRAVE_SEARCH_API=""
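
The agent reads these keys with `os.getenv`, so they must be in the process environment (on a Hugging Face Space they are set as repository secrets). For local runs, a minimal sketch of loading them from a `.env` file, assuming the `python-dotenv` package (not listed in `requirements.txt`) is installed:

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv installed separately

load_dotenv()  # copies key=value pairs from .env into the process environment

# agent.py and tools.py then read the keys like this:
openrouter_key = os.getenv("OPENROUTER_API_KEY")
```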
.gitignore ADDED
@@ -0,0 +1 @@
+ .env
README.md CHANGED
@@ -1,15 +1,67 @@
- ---
- title: Template Final Assignment
- emoji: 🕵🏻‍♂️
- colorFrom: indigo
- colorTo: indigo
- sdk: gradio
- sdk_version: 5.25.2
- app_file: app.py
- pinned: false
- hf_oauth: true
- # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
- hf_oauth_expiration_minutes: 480
- ---
+ # Hugging Face AI Agents Course - Final Exam Agent
+
+ This project contains an AI agent developed for the final exam of the Hugging Face AI Agents Course. The agent answers a variety of questions by leveraging a suite of tools and a language model.
+
+ ## Overview
+
+ The agent is built with the `LangGraph` library, which makes it robust and stateful. It can perform a variety of tasks, including web searches, calculations, and code execution, and can process different types of media such as audio, images, and documents. The project includes a Gradio application for evaluating the agent's performance on a set of questions provided by the course.
+
+ ## Features
+
+ * **Multi-tool Integration**: The agent can use a wide range of tools to solve complex problems.
+ * **Conversational AI**: Powered by a capable language model from OpenRouter.
+ * **Stateful Execution**: Uses `LangGraph` to manage the conversation flow and tool execution in a structured manner.
+ * **Web Interface**: A Gradio app (`app.py`) is provided to test and evaluate the agent.
+ * **Extensible**: New tools can easily be added to extend the agent's capabilities.
+
+ ## Tools
+
+ The agent has access to the following tools:
+
+ ### Community Tools
+
+ * **Brave Search**: Performs web searches to find up-to-date information.
+ * **Python REPL**: Executes Python code to solve logic and math problems.
+
+ ### Custom Tools
+
+ * **Calculator**:
+   * `add(a, b)`: Adds two numbers.
+   * `subtract(a, b)`: Subtracts two numbers.
+   * `multiply(a, b)`: Multiplies two numbers.
+   * `divide(a, b)`: Divides two numbers.
+   * `power(a, b)`: Raises `a` to the power of `b`.
+ * **Date & Time**:
+   * `current_date()`: Returns the current date.
+   * `day_of_week()`: Returns the current day of the week.
+   * `days_until(date_str)`: Calculates the number of days until a given date.
+ * **Media Processing**:
+   * `transcribe_audio(audio_file, file_extension)`: Transcribes audio files.
+   * `transcribe_youtube(youtube_url)`: Transcribes YouTube videos.
+   * `query_image(query, image_url)`: Answers questions about an image.
+ * **Web & Document Content**:
+   * `webpage_content(url)`: Extracts text from webpages and PDF files.
+   * `read_excel(file_url)`: Reads an Excel file from a URL and returns its content as CSV text.
+
+ ## How It Works
+
+ The agent's logic is defined in `agent.py`. It uses a `StateGraph` from the `LangGraph` library to manage its execution flow. The graph has two main nodes (a self-contained sketch follows this file's diff):
+
+ 1. **`llm_call`**: Calls the language model with the current conversation history and a system prompt (`prompt.py`). The LLM decides whether to respond directly to the user or to use one of the available tools.
+ 2. **`environment`**: If the LLM decides to use a tool, this node executes the tool with the arguments provided by the LLM.
+
+ The agent alternates between these two nodes until the LLM generates a final answer for the user.
+
+ ## Usage
+
+ ### 1. Installation
+
+ Clone the repository and install the required dependencies:
+
+ ```bash
+ git clone https://huggingface.co/spaces/YOUR_SPACE_HERE
+ cd YOUR_REPO
+ pip install -r requirements.txt
+ ```
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
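
To make the two-node loop from "How It Works" concrete, here is a minimal, self-contained sketch. The dummy `echo` tool and the `State` class are illustrative stand-ins; the full implementation follows in `agent.py` below:

```python
import os
from typing import Annotated, Literal, TypedDict

from langchain_core.messages import AnyMessage, HumanMessage, ToolMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

@tool
def echo(text: str) -> str:
    """Repeats the given text (placeholder for the real tool suite)."""
    return text

class State(TypedDict):
    messages: Annotated[list[AnyMessage], add_messages]  # append-only message history

llm = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY"),
    model="qwen/qwen3-coder:free",
).bind_tools([echo])

def llm_call(state: State):                    # node 1: the LLM reasons and may pick tools
    return {"messages": [llm.invoke(state["messages"])]}

def environment(state: State):                 # node 2: execute the requested tool calls
    return {"messages": [
        ToolMessage(content=echo.invoke(call["args"]), tool_call_id=call["id"])
        for call in state["messages"][-1].tool_calls
    ]}

def should_continue(state: State) -> Literal["environment", END]:
    return "environment" if state["messages"][-1].tool_calls else END

builder = StateGraph(State)
builder.add_node("llm_call", llm_call)
builder.add_node("environment", environment)
builder.add_edge(START, "llm_call")
builder.add_conditional_edges("llm_call", should_continue)
builder.add_edge("environment", "llm_call")    # loop back after each round of tool calls
graph = builder.compile()

print(graph.invoke({"messages": [HumanMessage(content="Echo the word test.")]}))
```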
agent.py ADDED
@@ -0,0 +1,129 @@
+ import os
+
+ from langchain_openai import ChatOpenAI
+ from langchain_core.messages import AnyMessage, SystemMessage, HumanMessage, ToolMessage
+ from langgraph.graph.message import add_messages
+ from langgraph.graph import StateGraph, START, END
+ from typing import TypedDict, Annotated, Literal
+
+ from langchain_community.tools import BraveSearch  # web search
+ from langchain_experimental.tools.python.tool import PythonAstREPLTool  # for logic/math problems
+
+ from tools import (calculator_basic, datetime_tools, transcribe_audio, transcribe_youtube, query_image, webpage_content, read_excel)
+ from prompt import system_prompt
+
+ from langchain_core.runnables import RunnableConfig  # for LangSmith tracking
+
+ # LangSmith observes the agent via these environment variables
+ langsmith_api_key = os.getenv("LANGSMITH_API_KEY")
+ langsmith_tracing = os.getenv("LANGSMITH_TRACING")
+
+ llm = ChatOpenAI(
+     base_url="https://openrouter.ai/api/v1",
+     api_key=os.getenv("OPENROUTER_API_KEY"),
+     model="qwen/qwen3-coder:free",  # model must support function calling on OpenRouter
+     temperature=1
+ )
+
+ python_tool = PythonAstREPLTool()
+ search_tool = BraveSearch.from_api_key(
+     api_key=os.getenv("BRAVE_SEARCH_API"),
+     search_kwargs={"count": 4},  # return the 4 best results and their URLs
+     description="Web search using Brave"
+ )
+
+ community_tools = [search_tool, python_tool]
+ custom_tools = calculator_basic + datetime_tools + [transcribe_audio, transcribe_youtube, query_image, webpage_content, read_excel]
+
+ tools = community_tools + custom_tools
+ llm_with_tools = llm.bind_tools(tools)
+
+ # Index the tools by name for lookup during execution
+ tools_by_name = {tool.name: tool for tool in tools}
+
+ class MessagesState(TypedDict):  # the state: the agent's memory at any moment
+     messages: Annotated[list[AnyMessage], add_messages]
+
+ # LLM node
+ def llm_call(state: MessagesState):
+     return {
+         "messages": [
+             llm_with_tools.invoke(
+                 [SystemMessage(content=system_prompt)] + state["messages"]
+             )
+         ]
+     }
+
+ # Tool node
+ def tool_node(state: MessagesState):
+     """Executes the tools"""
+
+     result = []
+     for tool_call in state["messages"][-1].tool_calls:  # the list of tools the LLM decided to call
+         tool = tools_by_name[tool_call["name"]]  # look up the actual tool function by name
+         observation = tool.invoke(tool_call["args"])  # execute the tool
+         result.append(ToolMessage(content=observation, tool_call_id=tool_call["id"]))  # add the tool's result to memory
+     return {"messages": result}  # thanks to add_messages, LangGraph appends the result to the agent's message history
+
+ # Conditional edge: route to the tool node or end, depending on whether the LLM made a tool call
+ def should_continue(state: MessagesState) -> Literal["Action", END]:
+     """Decide whether to continue the loop or stop, based on whether the LLM made a tool call"""
+
+     last_message = state["messages"][-1]  # the last message (usually from the LLM)
+
+     # If the LLM made a tool call, perform an action
+     if last_message.tool_calls:
+         return "Action"
+     # Otherwise, stop (reply to the user)
+     return END
+
+ # Build the workflow
+ builder = StateGraph(MessagesState)
+
+ # Add nodes
+ builder.add_node("llm_call", llm_call)
+ builder.add_node("environment", tool_node)
+
+ # Add edges to connect the nodes
+ builder.add_edge(START, "llm_call")
+ builder.add_conditional_edges(
+     "llm_call",
+     should_continue,
+     {"Action": "environment",  # name returned by should_continue : name of the next node
+      END: END}
+ )
+ # If tool calls    -> "Action" -> environment (executes the tool)
+ # If no tool calls -> END
+
+ builder.add_edge("environment", "llm_call")  # after running the tools, go back to the LLM for another round of reasoning
+
+ gaia_agent = builder.compile()  # turns the builder into a runnable agent, used via gaia_agent.invoke()
+
+ # Wrapper class to initialize and call the LangGraph agent with a user question
+ class LangGraphAgent:
+     def __init__(self):
+         print("LangGraphAgent initialized.")
+
+     def __call__(self, question: str) -> str:
+         input_state = {"messages": [HumanMessage(content=question)]}  # the initial user message
+         print(f"Running LangGraphAgent with input: {question[:150]}...")
+
+         # tracing configuration for LangSmith (RunnableConfig is a TypedDict, so pass the keys directly)
+         config = RunnableConfig(
+             run_name="GAIA Agent",
+             tags=["gaia", "langgraph", "agent"],
+             metadata={"user_input": question},
+             recursion_limit=30,  # prevents infinite looping when the LLM keeps calling tools over and over
+         )
+         result = gaia_agent.invoke(input_state, config)
+         final_response = result["messages"][-1].content
+
+         try:
+             return final_response.split("FINAL ANSWER:")[-1].strip()  # keep only what follows "FINAL ANSWER:"
+         except Exception:
+             print("Could not split on 'FINAL ANSWER:'")
+             return final_response
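
The wrapper can also be exercised outside the Gradio app. A quick sanity check (assumes `OPENROUTER_API_KEY` and the other keys from `.env.example` are set in the environment):

```python
from agent import LangGraphAgent

agent = LangGraphAgent()
# __call__ runs the graph and returns only the text after "FINAL ANSWER:"
print(agent("What is 17 multiplied by 23?"))
```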
app.py CHANGED
@@ -3,22 +3,12 @@ import gradio as gr
 import requests
 import inspect
 import pandas as pd
+ from time import sleep
+ from agent import LangGraphAgent
 
- # (Keep Constants as is)
 # --- Constants ---
 DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
 
- # --- Basic Agent Definition ---
- # ----- THIS IS WERE YOU CAN BUILD WHAT YOU WANT ------
- class BasicAgent:
-     def __init__(self):
-         print("BasicAgent initialized.")
-     def __call__(self, question: str) -> str:
-         print(f"Agent received question (first 50 chars): {question[:50]}...")
-         fixed_answer = "This is a default answer."
-         print(f"Agent returning fixed answer: {fixed_answer}")
-         return fixed_answer
-
 def run_and_submit_all( profile: gr.OAuthProfile | None):
     """
     Fetches all questions, runs the BasicAgent on them, submits all answers,
@@ -40,7 +30,7 @@ def run_and_submit_all( profile: gr.OAuthProfile | None):
 
     # 1. Instantiate Agent ( modify this part to create your agent)
     try:
-         agent = BasicAgent()
+         agent = LangGraphAgent()
     except Exception as e:
         print(f"Error instantiating agent: {e}")
         return f"Error initializing agent: {e}", None
@@ -72,20 +62,44 @@ def run_and_submit_all( profile: gr.OAuthProfile | None):
     # 3. Run your Agent
     results_log = []
     answers_payload = []
+
     print(f"Running agent on {len(questions_data)} questions...")
-     for item in questions_data:
-         task_id = item.get("task_id")
-         question_text = item.get("question")
+
+     for question in questions_data:
+         task_id = question.get("task_id")
+         question_text = question.get("question")
+         file_name = question.get("file_name")
+
         if not task_id or question_text is None:
-             print(f"Skipping item with missing task_id or question: {item}")
+             print(f"Skipping question with missing task_id or question: {question}")
             continue
+
         try:
+             # append the file URL and extension (if available) to the question to help the agent
+             if file_name:
+                 file_url = f"{DEFAULT_API_URL}/files/{task_id}"
+                 question_text += f'\nFile URL: "{file_url}"'
+                 try:
+                     extension = file_name.split('.')[-1]
+                     question_text += f" (.{extension} file)"
+                 except Exception as e:
+                     print(f"Warning: couldn't extract extension from {file_name}: {e}")
+
+             # call the agent
             submitted_answer = agent(question_text)
+
+             # store the result
             answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer})
+
         except Exception as e:
             print(f"Error running agent on task {task_id}: {e}")
             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": f"AGENT ERROR: {e}"})
+
+         finally:
+             # wait 10 seconds between calls to avoid API rate limits
+             print('\n\n-> Waiting 10 seconds to avoid API rate limit')
+             sleep(10)
 
     if not answers_payload:
         print("Agent did not produce any answers to submit.")
@@ -193,4 +207,5 @@ if __name__ == "__main__":
     print("-"*(60 + len(" App Starting ")) + "\n")
 
     print("Launching Gradio Interface for Basic Agent Evaluation...")
-     demo.launch(debug=True, share=False)
+     demo.launch(debug=True, share=False)
+
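
For a task that ships with a file, the loop above rewrites the question before handing it to the agent. Illustratively, for a hypothetical `task_id` of `"abc123"` and `file_name` of `"data.xlsx"`, the agent receives:

```python
# Hypothetical values; shows the result of the file-URL augmentation above.
question_text = (
    'How many rows are in the spreadsheet?\n'
    'File URL: "https://agents-course-unit4-scoring.hf.space/files/abc123" (.xlsx file)'
)
```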
prompt.py ADDED
@@ -0,0 +1,21 @@
+ system_prompt = """\
+ You are an AI assistant.
+
+ When presented with a question, always:
+ - Briefly state your reasoning in natural language.
+ - Conclude your response with this explicit format:
+ FINAL ANSWER: [YOUR FINAL ANSWER]
+
+ Formatting rules for YOUR FINAL ANSWER:
+ - If a number is expected:
+   - Write the number without commas or spaces.
+   - Do not use units or symbols (like $ or %) unless specifically requested.
+ - If a string is expected:
+   - Omit articles (the, a, an).
+   - Do not use abbreviations (write full names, e.g. "Paris" not "Par.").
+   - Write out all digits as numerals.
+ - If a comma-separated list is required:
+   - Apply the corresponding rules for each element (number or string) as above.
+
+ Be precise, succinct, and strictly follow these output rules.
+ """
requirements.txt CHANGED
@@ -1,2 +1,15 @@
 gradio
- requests
+ requests
+ openai
+ pytube
+ openpyxl
+ pypdf2
+ beautifulsoup4
+ youtube-transcript-api
+ langsmith
+ langgraph
+ langchain
+ langchain-core
+ langchain-openai
+ langchain-community
+ langchain-experimental
tools.py ADDED
@@ -0,0 +1,225 @@
+ from langchain_core.tools import tool
+ import datetime
+ import requests
+ import os
+ import tempfile
+ import pandas as pd
+ from urllib.parse import urlparse, parse_qs
+ from openai import OpenAI
+ from youtube_transcript_api import YouTubeTranscriptApi
+ from youtube_transcript_api._errors import TranscriptsDisabled, NoTranscriptFound, VideoUnavailable
+ from bs4 import BeautifulSoup
+ from io import BytesIO
+ from PyPDF2 import PdfReader
+
+ @tool
+ def add(a: float, b: float) -> float:
+     """ Adds two numbers.
+     Args:
+         a (float): first number
+         b (float): second number
+     """
+     return a + b
+
+ @tool
+ def subtract(a: float, b: float) -> float:
+     """ Subtracts two numbers.
+     Args:
+         a (float): first number
+         b (float): second number
+     """
+     return a - b
+
+ @tool
+ def multiply(a: float, b: float) -> float:
+     """ Multiplies two numbers.
+     Args:
+         a (float): first number
+         b (float): second number
+     """
+     return a * b
+
+ @tool
+ def divide(a: float, b: float) -> float:
+     """ Divides two numbers.
+     Args:
+         a (float): first number
+         b (float): second number
+     """
+     if b == 0:
+         raise ValueError("Cannot divide by zero.")
+     return a / b
+
+ @tool
+ def power(a: float, b: float) -> float:
+     """ Raises a to the power of b.
+     Args:
+         a (float): first number
+         b (float): second number
+     """
+     return a**b
+
+ calculator_basic = [add, subtract, multiply, divide, power]
+
+
+ @tool
+ def current_date(_) -> str:  # the underscore is a dummy argument so the tool accepts an (ignored) input
+     """ Returns the current date in YYYY-MM-DD format """
+     return datetime.datetime.now().strftime("%Y-%m-%d")
+
+ @tool
+ def day_of_week(_) -> str:  # dummy argument, as above
+     """ Returns the current day of the week (e.g., Monday, Tuesday) """
+     return datetime.datetime.now().strftime("%A")
+
+ @tool
+ def days_until(date_str: str) -> str:
+     """ Returns the number of days from today until a given date (input format: YYYY-MM-DD) """
+     try:
+         future_date = datetime.datetime.strptime(date_str, "%Y-%m-%d").date()
+         today = datetime.date.today()
+
+         delta_days = (future_date - today).days
+         return f"{delta_days} days until {date_str}"
+     except Exception as e:
+         return f"Error parsing date: {str(e)}"
+
+ datetime_tools = [current_date, day_of_week, days_until]
+
+
+ @tool
+ def transcribe_audio(audio_file: str, file_extension: str) -> str:
+     """ Transcribes an audio file to text
+     Args:
+         audio_file (str): URL of the audio file (.mp3, .m4a, etc.); it is downloaded before transcription
+         file_extension (str): file extension of the audio, e.g. mp3
+     Returns:
+         str: The transcribed text from the audio.
+     """
+     try:
+         response = requests.get(audio_file)  # download the audio file
+         response.raise_for_status()  # check that the HTTP request was successful
+
+         # clean the file extension and save the audio to disk
+         file_extension = file_extension.replace('.', '')
+         filename = f'tmp.{file_extension}'
+         with open(filename, 'wb') as file:  # opens a new file for writing, named e.g. tmp.mp3
+             file.write(response.content)  # write(w) the binary(b) contents (audio file) to disk
+
+         # transcribe the audio with OpenAI Whisper
+         client = OpenAI()
+
+         # read(r) the audio file from disk in binary(b) mode "rb"; the "with" block ensures the file is closed afterward
+         with open(filename, "rb") as audio_content:
+             transcription = client.audio.transcriptions.create(
+                 model="whisper-1",
+                 file=audio_content
+             )
+         return transcription.text
+
+     except Exception as e:
+         return f"transcribe_audio failed: {e}"
+
+ @tool
+ def transcribe_youtube(youtube_url: str) -> str:
+     """ Transcribes a YouTube video
+     Args:
+         youtube_url (str): the YouTube video's URL
+     Returns:
+         str: The transcribed text from the video.
+     """
+     try:
+         query = urlparse(youtube_url).query
+         video_id = parse_qs(query)['v'][0]
+     except Exception:
+         return "invalid YouTube URL"
+
+     try:
+         # note: uses the classic (pre-1.0) youtube-transcript-api interface
+         transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
+         transcript = transcript_list.find_transcript(['en']).fetch()
+         # keep only the text
+         text = '\n'.join([t['text'] for t in transcript])
+         return text
+
+     except (TranscriptsDisabled, NoTranscriptFound, VideoUnavailable) as e:
+         return f"transcript unavailable: {str(e)}"
+
+     except Exception as e:
+         return f"transcribe_youtube failed: {e}"
+
+ @tool
+ def query_image(query: str, image_url: str) -> str:
+     """ Ask anything about an image using a Vision Language Model
+     Args:
+         query (str): the query about the image, e.g. how many animals are in the image?
+         image_url (str): the image's URL
+     """
+     try:
+         client = OpenAI()
+         response = client.responses.create(
+             model="gpt-4o-mini",
+             input=[
+                 {
+                     "role": "user",
+                     "content": [
+                         {"type": "input_text", "text": query},
+                         {"type": "input_image", "image_url": image_url},
+                     ],
+                 }
+             ],
+         )
+         return response.output_text
+
+     except Exception as e:
+         return f"query_image failed: {e}"
+
+ @tool
+ def webpage_content(url: str) -> str:
+     """ Fetch text from a webpage or PDF file.
+     Args:
+         url (str): The URL of the webpage to fetch.
+     Returns:
+         str: Extracted text.
+     """
+     try:
+         response = requests.get(url)
+         response.raise_for_status()
+
+         content_type = response.headers.get("Content-Type", "")
+
+         # PDF file
+         if "pdf" in content_type:
+             pdf_content = BytesIO(response.content)
+             reader = PdfReader(pdf_content)
+             return "\n".join(page.extract_text() or "" for page in reader.pages)
+
+         # HTML file
+         soup = BeautifulSoup(response.text, "html.parser")
+         body = soup.body
+         return body.get_text(separator="\n", strip=True) if body else soup.get_text(strip=True)
+
+     except Exception as e:
+         return f"webpage_content failed: {e}"
+
+ @tool
+ def read_excel(file_url: str) -> str:
+     """ Reads an Excel file from a URL and returns the content as CSV text.
+     Args:
+         file_url (str): URL to the Excel file (.xlsx, .xls)
+     Returns:
+         str: Content of the Excel file as CSV text.
+     """
+     try:
+         response = requests.get(file_url)
+         response.raise_for_status()
+
+         excel_content = BytesIO(response.content)
+         df = pd.read_excel(excel_content)
+
+         return df.to_csv(index=False)  # convert the dataframe to a CSV string for easy processing
+
+     except Exception as e:
+         return f"read_excel failed: {str(e)}"
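
Each `@tool` above can also be exercised on its own with `.invoke` and a dict of arguments keyed by parameter name, which is handy for testing before wiring it into the graph. A short sketch (the example URL is arbitrary, and the `days_until` output depends on today's date):

```python
from tools import add, days_until, webpage_content

# LangChain tools take a dict keyed by argument name
print(add.invoke({"a": 2, "b": 3}))                   # -> 5
print(days_until.invoke({"date_str": "2030-01-01"}))  # e.g. "1784 days until 2030-01-01"
print(webpage_content.invoke({"url": "https://example.com"})[:200])
```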