Ahmud committed
Commit 9c9b3ff · Parent(s): 81917a3
Files changed (8)
  1. .env.example +4 -0
  2. .gitignore +1 -0
  3. README.md +65 -13
  4. agent.py +129 -0
  5. app.py +33 -18
  6. prompt.py +21 -0
  7. requirements.txt +14 -1
  8. tools.py +225 -0
.env.example ADDED
@@ -0,0 +1,4 @@
+ LANGSMITH_API_KEY=""
+ LANGSMITH_TRACING=true
+ OPENROUTER_API_KEY=""
+ BRAVE_SEARCH_API=""
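
The agent reads these keys with `os.getenv`, so they must be in the process environment (on a Hugging Face Space they are set as repository secrets). For local runs, a minimal sketch of loading them from a `.env` file, assuming the `python-dotenv` package (not listed in `requirements.txt`) is installed:

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv installed separately

load_dotenv()  # copies key=value pairs from .env into the process environment

# agent.py and tools.py then read the keys like this:
openrouter_key = os.getenv("OPENROUTER_API_KEY")
```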
.gitignore ADDED
@@ -0,0 +1 @@
+ .env
README.md CHANGED
@@ -1,15 +1,67 @@
- ---
- title: Template Final Assignment
- emoji: 🕵🏻‍♂️
- colorFrom: indigo
- colorTo: indigo
- sdk: gradio
- sdk_version: 5.25.2
- app_file: app.py
- pinned: false
- hf_oauth: true
- # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
- hf_oauth_expiration_minutes: 480
- ---
+ # Hugging Face AI Agents Course - Final Exam Agent
+
+ This project contains an AI agent developed for the final exam of the Hugging Face AI Agents Course. The agent answers a variety of questions by leveraging a suite of tools and a language model.
+
+ ## Overview
+
+ The agent is built with the `LangGraph` library, which makes it robust and stateful. It can perform a variety of tasks, including web searches, calculations, and code execution, and can process different types of media such as audio, images, and documents. The project includes a Gradio application for evaluating the agent's performance on a set of questions provided by the course.
+
+ ## Features
+
+ * **Multi-tool Integration**: The agent can use a wide range of tools to solve complex problems.
+ * **Conversational AI**: Powered by a capable language model from OpenRouter.
+ * **Stateful Execution**: Uses `LangGraph` to manage the conversation flow and tool execution in a structured manner.
+ * **Web Interface**: A Gradio app (`app.py`) is provided to test and evaluate the agent.
+ * **Extensible**: New tools can easily be added to extend the agent's capabilities.
+
+ ## Tools
+
+ The agent has access to the following tools:
+
+ ### Community Tools
+
+ * **Brave Search**: Performs web searches to find up-to-date information.
+ * **Python REPL**: Executes Python code to solve logic and math problems.
+
+ ### Custom Tools
+
+ * **Calculator**:
+   * `add(a, b)`: Adds two numbers.
+   * `subtract(a, b)`: Subtracts two numbers.
+   * `multiply(a, b)`: Multiplies two numbers.
+   * `divide(a, b)`: Divides two numbers.
+   * `power(a, b)`: Raises `a` to the power of `b`.
+ * **Date & Time**:
+   * `current_date()`: Returns the current date.
+   * `day_of_week()`: Returns the current day of the week.
+   * `days_until(date_str)`: Calculates the number of days until a given date.
+ * **Media Processing**:
+   * `transcribe_audio(audio_file, file_extension)`: Transcribes audio files.
+   * `transcribe_youtube(youtube_url)`: Transcribes YouTube videos.
+   * `query_image(query, image_url)`: Answers questions about an image.
+ * **Web & Document Content**:
+   * `webpage_content(url)`: Extracts text from webpages and PDF files.
+   * `read_excel(file_url)`: Reads an Excel file from a URL and returns its content as CSV text.
+
+ ## How It Works
+
+ The agent's logic is defined in `agent.py`. It uses a `StateGraph` from the `LangGraph` library to manage its execution flow. The graph has two main nodes (a self-contained sketch follows this file's diff):
+
+ 1. **`llm_call`**: Calls the language model with the current conversation history and a system prompt (`prompt.py`). The LLM decides whether to respond directly to the user or to use one of the available tools.
+ 2. **`environment`**: If the LLM decides to use a tool, this node executes the tool with the arguments provided by the LLM.
+
+ The agent alternates between these two nodes until the LLM generates a final answer for the user.
+
+ ## Usage
+
+ ### 1. Installation
+
+ Clone the repository and install the required dependencies:
+
+ ```bash
+ git clone https://huggingface.co/spaces/YOUR_SPACE_HERE
+ cd YOUR_REPO
+ pip install -r requirements.txt
+ ```
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
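
To make the two-node loop from "How It Works" concrete, here is a minimal, self-contained sketch. The dummy `echo` tool and the `State` class are illustrative stand-ins; the full implementation follows in `agent.py` below:

```python
import os
from typing import Annotated, Literal, TypedDict

from langchain_core.messages import AnyMessage, HumanMessage, ToolMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

@tool
def echo(text: str) -> str:
    """Repeats the given text (placeholder for the real tool suite)."""
    return text

class State(TypedDict):
    messages: Annotated[list[AnyMessage], add_messages]  # append-only message history

llm = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY"),
    model="qwen/qwen3-coder:free",
).bind_tools([echo])

def llm_call(state: State):                    # node 1: the LLM reasons and may pick tools
    return {"messages": [llm.invoke(state["messages"])]}

def environment(state: State):                 # node 2: execute the requested tool calls
    return {"messages": [
        ToolMessage(content=echo.invoke(call["args"]), tool_call_id=call["id"])
        for call in state["messages"][-1].tool_calls
    ]}

def should_continue(state: State) -> Literal["environment", END]:
    return "environment" if state["messages"][-1].tool_calls else END

builder = StateGraph(State)
builder.add_node("llm_call", llm_call)
builder.add_node("environment", environment)
builder.add_edge(START, "llm_call")
builder.add_conditional_edges("llm_call", should_continue)
builder.add_edge("environment", "llm_call")    # loop back after each round of tool calls
graph = builder.compile()

print(graph.invoke({"messages": [HumanMessage(content="Echo the word test.")]}))
```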
agent.py ADDED
@@ -0,0 +1,129 @@
+ import os
+
+ from langchain_openai import ChatOpenAI
+ from langchain_core.messages import AnyMessage, SystemMessage, HumanMessage, ToolMessage
+ from langgraph.graph.message import add_messages
+ from langgraph.graph import StateGraph, START, END
+ from typing import TypedDict, Annotated, Literal
+
+ from langchain_community.tools import BraveSearch  # web search
+ from langchain_experimental.tools.python.tool import PythonAstREPLTool  # for logic/math problems
+
+ from tools import (calculator_basic, datetime_tools, transcribe_audio, transcribe_youtube, query_image, webpage_content, read_excel)
+ from prompt import system_prompt
+
+ from langchain_core.runnables import RunnableConfig  # for LangSmith tracking
+
+ # LangSmith observes the agent via these environment variables
+ langsmith_api_key = os.getenv("LANGSMITH_API_KEY")
+ langsmith_tracing = os.getenv("LANGSMITH_TRACING")
+
+ llm = ChatOpenAI(
+     base_url="https://openrouter.ai/api/v1",
+     api_key=os.getenv("OPENROUTER_API_KEY"),
+     model="qwen/qwen3-coder:free",  # model must support function calling on OpenRouter
+     temperature=1
+ )
+
+ python_tool = PythonAstREPLTool()
+ search_tool = BraveSearch.from_api_key(
+     api_key=os.getenv("BRAVE_SEARCH_API"),
+     search_kwargs={"count": 4},  # return the 4 best results and their URLs
+     description="Web search using Brave"
+ )
+
+ community_tools = [search_tool, python_tool]
+ custom_tools = calculator_basic + datetime_tools + [transcribe_audio, transcribe_youtube, query_image, webpage_content, read_excel]
+
+ tools = community_tools + custom_tools
+ llm_with_tools = llm.bind_tools(tools)
+
+ # Index the tools by name for lookup during execution
+ tools_by_name = {tool.name: tool for tool in tools}
+
+ class MessagesState(TypedDict):  # the state: the agent's memory at any moment
+     messages: Annotated[list[AnyMessage], add_messages]
+
+ # LLM node
+ def llm_call(state: MessagesState):
+     return {
+         "messages": [
+             llm_with_tools.invoke(
+                 [SystemMessage(content=system_prompt)] + state["messages"]
+             )
+         ]
+     }
+
+ # Tool node
+ def tool_node(state: MessagesState):
+     """Executes the tools"""
+
+     result = []
+     for tool_call in state["messages"][-1].tool_calls:  # the list of tools the LLM decided to call
+         tool = tools_by_name[tool_call["name"]]  # look up the actual tool function by name
+         observation = tool.invoke(tool_call["args"])  # execute the tool
+         result.append(ToolMessage(content=observation, tool_call_id=tool_call["id"]))  # add the tool's result to memory
+     return {"messages": result}  # thanks to add_messages, LangGraph appends the result to the agent's message history
+
+ # Conditional edge: route to the tool node or end, depending on whether the LLM made a tool call
+ def should_continue(state: MessagesState) -> Literal["Action", END]:
+     """Decide whether to continue the loop or stop, based on whether the LLM made a tool call"""
+
+     last_message = state["messages"][-1]  # the last message (usually from the LLM)
+
+     # If the LLM made a tool call, perform an action
+     if last_message.tool_calls:
+         return "Action"
+     # Otherwise, stop (reply to the user)
+     return END
+
+ # Build the workflow
+ builder = StateGraph(MessagesState)
+
+ # Add nodes
+ builder.add_node("llm_call", llm_call)
+ builder.add_node("environment", tool_node)
+
+ # Add edges to connect the nodes
+ builder.add_edge(START, "llm_call")
+ builder.add_conditional_edges(
+     "llm_call",
+     should_continue,
+     {"Action": "environment",  # name returned by should_continue : name of the next node
+      END: END}
+ )
+ # If tool calls    -> "Action" -> environment (executes the tool)
+ # If no tool calls -> END
+
+ builder.add_edge("environment", "llm_call")  # after running the tools, go back to the LLM for another round of reasoning
+
+ gaia_agent = builder.compile()  # turns the builder into a runnable agent, used via gaia_agent.invoke()
+
+ # Wrapper class to initialize and call the LangGraph agent with a user question
+ class LangGraphAgent:
+     def __init__(self):
+         print("LangGraphAgent initialized.")
+
+     def __call__(self, question: str) -> str:
+         input_state = {"messages": [HumanMessage(content=question)]}  # the initial user message
+         print(f"Running LangGraphAgent with input: {question[:150]}...")
+
+         # tracing configuration for LangSmith (RunnableConfig is a TypedDict, so pass the keys directly)
+         config = RunnableConfig(
+             run_name="GAIA Agent",
+             tags=["gaia", "langgraph", "agent"],
+             metadata={"user_input": question},
+             recursion_limit=30,  # prevents infinite looping when the LLM keeps calling tools over and over
+         )
+         result = gaia_agent.invoke(input_state, config)
+         final_response = result["messages"][-1].content
+
+         try:
+             return final_response.split("FINAL ANSWER:")[-1].strip()  # keep only what follows "FINAL ANSWER:"
+         except Exception:
+             print("Could not split on 'FINAL ANSWER:'")
+             return final_response
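
The wrapper can also be exercised outside the Gradio app. A quick sanity check (assumes `OPENROUTER_API_KEY` and the other keys from `.env.example` are set in the environment):

```python
from agent import LangGraphAgent

agent = LangGraphAgent()
# __call__ runs the graph and returns only the text after "FINAL ANSWER:"
print(agent("What is 17 multiplied by 23?"))
```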
app.py CHANGED
@@ -3,22 +3,12 @@ import gradio as gr
 import requests
 import inspect
 import pandas as pd
+ from time import sleep
+ from agent import LangGraphAgent
 
- # (Keep Constants as is)
 # --- Constants ---
 DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
 
- # --- Basic Agent Definition ---
- # ----- THIS IS WERE YOU CAN BUILD WHAT YOU WANT ------
- class BasicAgent:
-     def __init__(self):
-         print("BasicAgent initialized.")
-     def __call__(self, question: str) -> str:
-         print(f"Agent received question (first 50 chars): {question[:50]}...")
-         fixed_answer = "This is a default answer."
-         print(f"Agent returning fixed answer: {fixed_answer}")
-         return fixed_answer
-
 def run_and_submit_all( profile: gr.OAuthProfile | None):
     """
     Fetches all questions, runs the BasicAgent on them, submits all answers,
@@ -40,7 +30,7 @@ def run_and_submit_all( profile: gr.OAuthProfile | None):
 
     # 1. Instantiate Agent ( modify this part to create your agent)
     try:
-         agent = BasicAgent()
+         agent = LangGraphAgent()
     except Exception as e:
         print(f"Error instantiating agent: {e}")
         return f"Error initializing agent: {e}", None
@@ -72,20 +62,44 @@ def run_and_submit_all( profile: gr.OAuthProfile | None):
     # 3. Run your Agent
     results_log = []
     answers_payload = []
+
     print(f"Running agent on {len(questions_data)} questions...")
-     for item in questions_data:
-         task_id = item.get("task_id")
-         question_text = item.get("question")
+
+     for question in questions_data:
+         task_id = question.get("task_id")
+         question_text = question.get("question")
+         file_name = question.get("file_name")
+
         if not task_id or question_text is None:
-             print(f"Skipping item with missing task_id or question: {item}")
+             print(f"Skipping question with missing task_id or question: {question}")
             continue
+
         try:
+             # append the file URL and extension (if available) to the question to help the agent
+             if file_name:
+                 file_url = f"{DEFAULT_API_URL}/files/{task_id}"
+                 question_text += f'\nFile URL: "{file_url}"'
+                 try:
+                     extension = file_name.split('.')[-1]
+                     question_text += f" (.{extension} file)"
+                 except Exception as e:
+                     print(f"Warning: couldn't extract extension from {file_name}: {e}")
+
+             # call the agent
             submitted_answer = agent(question_text)
+
+             # store the result
             answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer})
+
         except Exception as e:
             print(f"Error running agent on task {task_id}: {e}")
             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": f"AGENT ERROR: {e}"})
+
+         finally:
+             # wait 10 seconds between calls to avoid API rate limits
+             print('\n\n-> Waiting 10 seconds to avoid API rate limit')
+             sleep(10)
 
     if not answers_payload:
         print("Agent did not produce any answers to submit.")
@@ -193,4 +207,5 @@ if __name__ == "__main__":
     print("-"*(60 + len(" App Starting ")) + "\n")
 
     print("Launching Gradio Interface for Basic Agent Evaluation...")
-     demo.launch(debug=True, share=False)
+     demo.launch(debug=True, share=False)
+
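
For a task that ships with a file, the loop above rewrites the question before handing it to the agent. Illustratively, for a hypothetical `task_id` of `"abc123"` and `file_name` of `"data.xlsx"`, the agent receives:

```python
# Hypothetical values; shows the result of the file-URL augmentation above.
question_text = (
    'How many rows are in the spreadsheet?\n'
    'File URL: "https://agents-course-unit4-scoring.hf.space/files/abc123" (.xlsx file)'
)
```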
prompt.py ADDED
@@ -0,0 +1,21 @@
+ system_prompt = """\
+ You are an AI assistant.
+
+ When presented with a question, always:
+ - Briefly state your reasoning in natural language.
+ - Conclude your response with this explicit format:
+ FINAL ANSWER: [YOUR FINAL ANSWER]
+
+ Formatting rules for YOUR FINAL ANSWER:
+ - If a number is expected:
+   - Write the number without commas or spaces.
+   - Do not use units or symbols (like $ or %) unless specifically requested.
+ - If a string is expected:
+   - Omit articles (the, a, an).
+   - Do not use abbreviations (write full names, e.g. "Paris" not "Par.").
+   - Write out all digits as numerals.
+ - If a comma-separated list is required:
+   - Apply the corresponding rules for each element (number or string) as above.
+
+ Be precise, succinct, and strictly follow these output rules.
+ """
requirements.txt CHANGED
@@ -1,2 +1,15 @@
 gradio
- requests
+ requests
+ openai
+ pytube
+ openpyxl
+ pypdf2
+ beautifulsoup4
+ youtube-transcript-api
+ langsmith
+ langgraph
+ langchain
+ langchain-core
+ langchain-openai
+ langchain-community
+ langchain-experimental
tools.py ADDED
@@ -0,0 +1,225 @@
+ from langchain_core.tools import tool
+ import datetime
+ import requests
+ import os
+ import tempfile
+ import pandas as pd
+ from urllib.parse import urlparse, parse_qs
+ from openai import OpenAI
+ from youtube_transcript_api import YouTubeTranscriptApi
+ from youtube_transcript_api._errors import TranscriptsDisabled, NoTranscriptFound, VideoUnavailable
+ from bs4 import BeautifulSoup
+ from io import BytesIO
+ from PyPDF2 import PdfReader
+
+ @tool
+ def add(a: float, b: float) -> float:
+     """ Adds two numbers.
+     Args:
+         a (float): first number
+         b (float): second number
+     """
+     return a + b
+
+ @tool
+ def subtract(a: float, b: float) -> float:
+     """ Subtracts two numbers.
+     Args:
+         a (float): first number
+         b (float): second number
+     """
+     return a - b
+
+ @tool
+ def multiply(a: float, b: float) -> float:
+     """ Multiplies two numbers.
+     Args:
+         a (float): first number
+         b (float): second number
+     """
+     return a * b
+
+ @tool
+ def divide(a: float, b: float) -> float:
+     """ Divides two numbers.
+     Args:
+         a (float): first number
+         b (float): second number
+     """
+     if b == 0:
+         raise ValueError("Cannot divide by zero.")
+     return a / b
+
+ @tool
+ def power(a: float, b: float) -> float:
+     """ Raises a to the power of b.
+     Args:
+         a (float): first number
+         b (float): second number
+     """
+     return a**b
+
+ calculator_basic = [add, subtract, multiply, divide, power]
+
+
+ @tool
+ def current_date(_) -> str:  # the underscore is a dummy argument so the tool accepts an (ignored) input
+     """ Returns the current date in YYYY-MM-DD format """
+     return datetime.datetime.now().strftime("%Y-%m-%d")
+
+ @tool
+ def day_of_week(_) -> str:  # dummy argument, as above
+     """ Returns the current day of the week (e.g., Monday, Tuesday) """
+     return datetime.datetime.now().strftime("%A")
+
+ @tool
+ def days_until(date_str: str) -> str:
+     """ Returns the number of days from today until a given date (input format: YYYY-MM-DD) """
+     try:
+         future_date = datetime.datetime.strptime(date_str, "%Y-%m-%d").date()
+         today = datetime.date.today()
+
+         delta_days = (future_date - today).days
+         return f"{delta_days} days until {date_str}"
+     except Exception as e:
+         return f"Error parsing date: {str(e)}"
+
+ datetime_tools = [current_date, day_of_week, days_until]
+
+
+ @tool
+ def transcribe_audio(audio_file: str, file_extension: str) -> str:
+     """ Transcribes an audio file to text
+     Args:
+         audio_file (str): URL of the audio file (.mp3, .m4a, etc.); it is downloaded before transcription
+         file_extension (str): file extension of the audio, e.g. mp3
+     Returns:
+         str: The transcribed text from the audio.
+     """
+     try:
+         response = requests.get(audio_file)  # download the audio file
+         response.raise_for_status()  # check that the HTTP request was successful
+
+         # clean the file extension and save the audio to disk
+         file_extension = file_extension.replace('.', '')
+         filename = f'tmp.{file_extension}'
+         with open(filename, 'wb') as file:  # opens a new file for writing, named e.g. tmp.mp3
+             file.write(response.content)  # write(w) the binary(b) contents (audio file) to disk
+
+         # transcribe the audio with OpenAI Whisper
+         client = OpenAI()
+
+         # read(r) the audio file from disk in binary(b) mode "rb"; the "with" block ensures the file is closed afterward
+         with open(filename, "rb") as audio_content:
+             transcription = client.audio.transcriptions.create(
+                 model="whisper-1",
+                 file=audio_content
+             )
+         return transcription.text
+
+     except Exception as e:
+         return f"transcribe_audio failed: {e}"
+
+ @tool
+ def transcribe_youtube(youtube_url: str) -> str:
+     """ Transcribes a YouTube video
+     Args:
+         youtube_url (str): the YouTube video's URL
+     Returns:
+         str: The transcribed text from the video.
+     """
+     try:
+         query = urlparse(youtube_url).query
+         video_id = parse_qs(query)['v'][0]
+     except Exception:
+         return "invalid YouTube URL"
+
+     try:
+         # note: uses the classic (pre-1.0) youtube-transcript-api interface
+         transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
+         transcript = transcript_list.find_transcript(['en']).fetch()
+         # keep only the text
+         text = '\n'.join([t['text'] for t in transcript])
+         return text
+
+     except (TranscriptsDisabled, NoTranscriptFound, VideoUnavailable) as e:
+         return f"transcript unavailable: {str(e)}"
+
+     except Exception as e:
+         return f"transcribe_youtube failed: {e}"
+
+ @tool
+ def query_image(query: str, image_url: str) -> str:
+     """ Ask anything about an image using a Vision Language Model
+     Args:
+         query (str): the query about the image, e.g. how many animals are in the image?
+         image_url (str): the image's URL
+     """
+     try:
+         client = OpenAI()
+         response = client.responses.create(
+             model="gpt-4o-mini",
+             input=[
+                 {
+                     "role": "user",
+                     "content": [
+                         {"type": "input_text", "text": query},
+                         {"type": "input_image", "image_url": image_url},
+                     ],
+                 }
+             ],
+         )
+         return response.output_text
+
+     except Exception as e:
+         return f"query_image failed: {e}"
+
+ @tool
+ def webpage_content(url: str) -> str:
+     """ Fetch text from a webpage or PDF file.
+     Args:
+         url (str): The URL of the webpage to fetch.
+     Returns:
+         str: Extracted text.
+     """
+     try:
+         response = requests.get(url)
+         response.raise_for_status()
+
+         content_type = response.headers.get("Content-Type", "")
+
+         # PDF file
+         if "pdf" in content_type:
+             pdf_content = BytesIO(response.content)
+             reader = PdfReader(pdf_content)
+             return "\n".join(page.extract_text() or "" for page in reader.pages)
+
+         # HTML file
+         soup = BeautifulSoup(response.text, "html.parser")
+         body = soup.body
+         return body.get_text(separator="\n", strip=True) if body else soup.get_text(strip=True)
+
+     except Exception as e:
+         return f"webpage_content failed: {e}"
+
+ @tool
+ def read_excel(file_url: str) -> str:
+     """ Reads an Excel file from a URL and returns the content as CSV text.
+     Args:
+         file_url (str): URL to the Excel file (.xlsx, .xls)
+     Returns:
+         str: Content of the Excel file as CSV text.
+     """
+     try:
+         response = requests.get(file_url)
+         response.raise_for_status()
+
+         excel_content = BytesIO(response.content)
+         df = pd.read_excel(excel_content)
+
+         return df.to_csv(index=False)  # convert the dataframe to a CSV string for easy processing
+
+     except Exception as e:
+         return f"read_excel failed: {str(e)}"
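
Each `@tool` above can also be exercised on its own with `.invoke` and a dict of arguments keyed by parameter name, which is handy for testing before wiring it into the graph. A short sketch (the example URL is arbitrary, and the `days_until` output depends on today's date):

```python
from tools import add, days_until, webpage_content

# LangChain tools take a dict keyed by argument name
print(add.invoke({"a": 2, "b": 3}))                   # -> 5
print(days_until.invoke({"date_str": "2030-01-01"}))  # e.g. "1784 days until 2030-01-01"
print(webpage_content.invoke({"url": "https://example.com"})[:200])
```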