Changed files:

- .env.example +4 -0
- .gitignore +1 -0
- README.md +65 -13
- agent.py +129 -0
- app.py +33 -18
- prompt.py +21 -0
- requirements.txt +14 -1
- tools.py +225 -0
.env.example
ADDED
@@ -0,0 +1,4 @@
+LANGSMITH_API_KEY=""
+LANGSMITH_TRACING=true
+OPENROUTER_API_KEY=""
+BRAVE_SEARCH_API=""
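`agent.py` and `tools.py` read these keys with `os.getenv` at import time, so they must be in the process environment before the agent is imported. A minimal sketch of one way to load them for local runs, assuming the `python-dotenv` package (not listed in `requirements.txt`) is installed:

```python
# hypothetical local helper, not part of this commit
import os
from dotenv import load_dotenv  # assumed extra dependency: pip install python-dotenv

load_dotenv()  # copies key=value pairs from .env into os.environ

# fail fast if a required key is missing
for key in ("LANGSMITH_API_KEY", "OPENROUTER_API_KEY", "BRAVE_SEARCH_API"):
    if not os.getenv(key):
        raise RuntimeError(f"missing environment variable: {key}")
```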
.gitignore
ADDED
@@ -0,0 +1 @@
+.env
README.md
CHANGED
@@ -1,15 +1,67 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
+# Hugging Face AI Agents Course - Final Exam Agent
+
+This project contains an AI agent developed for the final exam of the Hugging Face AI Agents Course. The agent is designed to answer a variety of questions by leveraging a suite of tools and a language model.
+
+## Overview
+
+This agent is built with the `LangGraph` library to create a robust, stateful agent. It can perform a variety of tasks, including web searches, calculations, code execution, and processing different types of media such as audio, images, and documents. The project includes a Gradio application for evaluating the agent's performance on a set of questions provided by the course.
+
+## Features
+
+* **Multi-tool Integration**: The agent can use a wide range of tools to solve complex problems.
+* **Conversational AI**: Powered by a function-calling language model served through OpenRouter.
+* **Stateful Execution**: Uses `LangGraph` to manage the conversation flow and tool execution in a structured manner.
+* **Web Interface**: A Gradio app (`app.py`) is provided to test and evaluate the agent.
+* **Extensible**: New tools can easily be added to extend the agent's capabilities.
+
+## Tools
+
+The agent has access to the following tools:
+
+### Community Tools
+
+* **Brave Search**: Performs web searches to find up-to-date information.
+* **Python REPL**: Executes Python code to solve logic and math problems.
+
+### Custom Tools
+
+* **Calculator**:
+    * `add(a, b)`: Adds two numbers.
+    * `subtract(a, b)`: Subtracts two numbers.
+    * `multiply(a, b)`: Multiplies two numbers.
+    * `divide(a, b)`: Divides two numbers.
+    * `power(a, b)`: Calculates `a` to the power of `b`.
+* **Date & Time**:
+    * `current_date()`: Returns the current date.
+    * `day_of_week()`: Returns the current day of the week.
+    * `days_until(date_str)`: Calculates the number of days until a given date.
+* **Media Processing**:
+    * `transcribe_audio(audio_file, file_extension)`: Transcribes audio files.
+    * `transcribe_youtube(youtube_url)`: Transcribes YouTube videos.
+    * `query_image(query, image_url)`: Answers questions about an image.
+* **Web & Document Content**:
+    * `webpage_content(url)`: Extracts text from webpages and PDF files.
+    * `read_excel(file_url)`: Reads an Excel file from a URL and returns its content as CSV text.
+
+## How It Works
+
+The agent's logic is defined in `agent.py`. It uses a `StateGraph` from the `LangGraph` library to manage its execution flow. The graph has two main nodes:
+
+1. **`llm_call`**: Calls the language model with the current conversation history and a system prompt (`prompt.py`). The LLM decides whether to respond directly to the user or to use one of the available tools.
+2. **`environment`**: If the LLM decides to use a tool, this node executes the tool with the arguments provided by the LLM.
+
+The agent alternates between these two nodes until the LLM generates a final answer for the user.
+
+## Usage
+
+### 1. Installation
+
+Clone the repository and install the required dependencies:
+
+```bash
+git clone https://huggingface.co/spaces/YOUR_SPACE_HERE
+cd YOUR_REPO
+pip install -r requirements.txt
+```
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
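To sanity-check the setup after installation, one possible smoke test (assuming the API keys from `.env.example` are exported) is to call the wrapper class directly:

```python
# hypothetical smoke test, assuming the required API keys are set in the environment
from agent import LangGraphAgent

agent = LangGraphAgent()
print(agent("What is 2 to the power of 10?"))  # expected: 1024
```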
agent.py
ADDED
@@ -0,0 +1,129 @@
+import os
+
+from langchain_openai import ChatOpenAI
+from langchain_core.messages import AnyMessage, SystemMessage, HumanMessage, ToolMessage
+from langgraph.graph.message import add_messages
+from langgraph.graph import StateGraph, START, END
+from typing import TypedDict, Annotated, Literal
+
+from langchain_community.tools import BraveSearch  # web search
+from langchain_experimental.tools.python.tool import PythonAstREPLTool  # for logic/math problems
+
+from tools import (calculator_basic, datetime_tools, transcribe_audio, transcribe_youtube, query_image, webpage_content, read_excel)
+from prompt import system_prompt
+
+from langchain_core.runnables import RunnableConfig  # for LangSmith tracking
+
+# LangSmith to observe the agent
+langsmith_api_key = os.getenv("LANGSMITH_API_KEY")
+langsmith_tracing = os.getenv("LANGSMITH_TRACING")
+
+llm = ChatOpenAI(
+    base_url="https://openrouter.ai/api/v1",
+    api_key=os.getenv("OPENROUTER_API_KEY"),
+    model="qwen/qwen3-coder:free",  # model must support function calling on OpenRouter
+    temperature=1
+)
+
+python_tool = PythonAstREPLTool()
+search_tool = BraveSearch.from_api_key(
+    api_key=os.getenv("BRAVE_SEARCH_API"),
+    search_kwargs={"count": 4},  # return the 4 best results and their URLs
+    description="Web search using Brave"
+)
+
+community_tools = [search_tool, python_tool]
+custom_tools = calculator_basic + datetime_tools + [transcribe_audio, transcribe_youtube, query_image, webpage_content, read_excel]
+
+tools = community_tools + custom_tools
+llm_with_tools = llm.bind_tools(tools)
+
+# Prepare tools by name
+tools_by_name = {tool.name: tool for tool in tools}
+
+class MessagesState(TypedDict):  # the state: the agent's memory at any moment
+    messages: Annotated[list[AnyMessage], add_messages]
+
+# LLM node
+def llm_call(state: MessagesState):
+    return {
+        "messages": [
+            llm_with_tools.invoke(
+                [SystemMessage(content=system_prompt)] + state["messages"]
+            )
+        ]
+    }
+
+# Tool node
+def tool_node(state: MessagesState):
+    """Executes the tools"""
+
+    result = []
+    for tool_call in state["messages"][-1].tool_calls:  # the list of tools the LLM decided to call
+        tool = tools_by_name[tool_call["name"]]  # look up the actual tool function by name
+        observation = tool.invoke(tool_call["args"])  # execute the tool
+        result.append(ToolMessage(content=observation, tool_call_id=tool_call["id"]))  # add the tool's result to memory
+    return {"messages": result}  # thanks to add_messages, LangGraph automatically appends the result to the agent's message history
+
+# Conditional edge: route to the tool node or end, based on whether the LLM made a tool call
+def should_continue(state: MessagesState) -> Literal["Action", END]:
+    """Decide whether to continue the loop or stop, based on whether the LLM made a tool call"""
+
+    last_message = state["messages"][-1]  # the last message (usually from the LLM)
+
+    # If the LLM made a tool call, perform an action
+    if last_message.tool_calls:
+        return "Action"
+    # Otherwise, stop (reply to the user)
+    return END
+
+# Build workflow
+builder = StateGraph(MessagesState)
+
+# Add nodes
+builder.add_node("llm_call", llm_call)
+builder.add_node("environment", tool_node)
+
+# Add edges to connect nodes
+builder.add_edge(START, "llm_call")
+builder.add_conditional_edges(
+    "llm_call",
+    should_continue,
+    {"Action": "environment",  # name returned by should_continue : name of the next node
+     END: END}
+)
+# If tool calls -> "Action" -> environment (executes the tool)
+# If no tool calls -> END
+
+builder.add_edge("environment", "llm_call")  # after running the tools, go back to the LLM for another round of reasoning
+
+gaia_agent = builder.compile()  # converts the builder into a runnable agent, used via gaia_agent.invoke()
+
+# Wrapper class to initialize and call the LangGraph agent with a user question
+class LangGraphAgent:
+    def __init__(self):
+        print("LangGraphAgent initialized.")
+
+    def __call__(self, question: str) -> str:
+        input_state = {"messages": [HumanMessage(content=question)]}  # prepare the initial user message
+        print(f"Running LangGraphAgent with input: {question[:150]}...")
+
+        # tracing configuration for LangSmith
+        config = RunnableConfig(
+            run_name="GAIA Agent",
+            tags=["gaia", "langgraph", "agent"],
+            metadata={"user_input": question},
+            recursion_limit=30,  # prevents infinite looping when the LLM keeps calling tools over and over
+        )
+        result = gaia_agent.invoke(input_state, config)
+        final_response = result["messages"][-1].content
+
+        try:
+            return final_response.split("FINAL ANSWER:")[-1].strip()  # keep only what follows "FINAL ANSWER:"
+        except Exception:
+            print("Could not split on 'FINAL ANSWER:'")
+            return final_response
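Because `gaia_agent` is a compiled LangGraph graph, it can also be invoked directly with a message state, which is handy for inspecting intermediate tool calls rather than just the parsed final answer. A sketch under the same environment assumptions:

```python
# hypothetical debugging snippet: inspect the full message trace
from langchain_core.messages import HumanMessage
from agent import gaia_agent

state = gaia_agent.invoke({"messages": [HumanMessage(content="What day of the week is it?")]})
for message in state["messages"]:
    # HumanMessage -> AIMessage (possibly with tool_calls) -> ToolMessage -> ... -> final AIMessage
    print(type(message).__name__, "->", str(message.content)[:100])
```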
app.py
CHANGED
@@ -3,22 +3,12 @@ import gradio as gr
 import requests
 import inspect
 import pandas as pd
+from time import sleep
+from agent import LangGraphAgent
 
-# (Keep Constants as is)
 # --- Constants ---
 DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
 
-# --- Basic Agent Definition ---
-# ----- THIS IS WERE YOU CAN BUILD WHAT YOU WANT ------
-class BasicAgent:
-    def __init__(self):
-        print("BasicAgent initialized.")
-    def __call__(self, question: str) -> str:
-        print(f"Agent received question (first 50 chars): {question[:50]}...")
-        fixed_answer = "This is a default answer."
-        print(f"Agent returning fixed answer: {fixed_answer}")
-        return fixed_answer
-
 def run_and_submit_all( profile: gr.OAuthProfile | None):
     """
     Fetches all questions, runs the BasicAgent on them, submits all answers,
@@ -40,7 +30,7 @@ def run_and_submit_all( profile: gr.OAuthProfile | None):
 
     # 1. Instantiate Agent ( modify this part to create your agent)
     try:
-        agent = BasicAgent()
+        agent = LangGraphAgent()
     except Exception as e:
         print(f"Error instantiating agent: {e}")
         return f"Error initializing agent: {e}", None
@@ -72,20 +62,44 @@ def run_and_submit_all( profile: gr.OAuthProfile | None):
     # 3. Run your Agent
     results_log = []
     answers_payload = []
+
     print(f"Running agent on {len(questions_data)} questions...")
-    for item in questions_data:
-        task_id = item.get("task_id")
-        question_text = item.get("question")
+
+    for question in questions_data:
+        task_id = question.get("task_id")
+        question_text = question.get("question")
+        file_name = question.get("file_name")
+
         if not task_id or question_text is None:
-            print(f"Skipping item with missing task_id or question: {item}")
+            print(f"Skipping question with missing task_id or question: {question}")
             continue
+
         try:
+            # append file URL and extension (if available) to the question to help the agent
+            if file_name:
+                file_url = f"{DEFAULT_API_URL}/files/{task_id}"
+                question_text += f'\nFile URL: "{file_url}"'
+                try:
+                    extension = file_name.split('.')[-1]
+                    question_text += f" (.{extension} file)"
+                except Exception as e:
+                    print(f"Warning: couldn't extract extension from {file_name}: {e}")
+
+            # call the agent
             submitted_answer = agent(question_text)
+
+            # store result
             answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer})
+
         except Exception as e:
             print(f"Error running agent on task {task_id}: {e}")
             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": f"AGENT ERROR: {e}"})
+
+        finally:
+            # wait 10 seconds between calls to avoid API rate limit
+            print('\n\n-> Waiting 10 seconds to avoid API rate limit')
+            sleep(10)
 
     if not answers_payload:
         print("Agent did not produce any answers to submit.")
@@ -193,4 +207,5 @@ if __name__ == "__main__":
     print("-"*(60 + len(" App Starting ")) + "\n")
 
     print("Launching Gradio Interface for Basic Agent Evaluation...")
-    demo.launch(debug=True, share=False)
+    demo.launch(debug=True, share=False)
+
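The file-handling branch above only appends a download URL and an extension hint to the question text; the LLM then picks a tool based on that hint (an `.xlsx` hint presumably steers it toward `read_excel`, an `.mp3` hint toward `transcribe_audio`). A standalone illustration of the augmentation step, with hypothetical values:

```python
# hypothetical values illustrating the question-augmentation step in app.py
DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
task_id = "8e867cd7"          # hypothetical task id
file_name = "sales.xlsx"      # hypothetical attachment name
question_text = "What is the total of the sales column?"

if file_name:
    file_url = f"{DEFAULT_API_URL}/files/{task_id}"
    question_text += f'\nFile URL: "{file_url}"'
    extension = file_name.split('.')[-1]
    question_text += f" (.{extension} file)"

print(question_text)
# What is the total of the sales column?
# File URL: "https://agents-course-unit4-scoring.hf.space/files/8e867cd7" (.xlsx file)
```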
prompt.py
ADDED
@@ -0,0 +1,21 @@
+system_prompt = """\
+You are an AI assistant.
+
+When presented with a question, always:
+- Briefly state your reasoning in natural language.
+- Conclude your response with this explicit format:
+  FINAL ANSWER: [YOUR FINAL ANSWER]
+
+Formatting rules for YOUR FINAL ANSWER:
+- If a number is expected:
+  - Write the number without commas or spaces.
+  - Do not use units or symbols (like $ or %) unless specifically requested.
+- If a string is expected:
+  - Omit articles (the, a, an).
+  - Do not use abbreviations (write full names, e.g. "Paris" not "Par.").
+  - Write out all digits as numerals.
+- If a comma-separated list is required:
+  - Apply the corresponding rules for each element (number or string) as above.
+
+Be precise, succinct, and strictly follow these output rules.
+"""
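The `FINAL ANSWER:` marker is the contract between this prompt and the parsing step in `LangGraphAgent.__call__`: anything before the marker is treated as discardable reasoning. For example:

```python
# the parsing used in agent.py, applied to a response that follows the prompt format
response = "The 1928 Summer Olympics were held in Amsterdam.\nFINAL ANSWER: Amsterdam"
print(response.split("FINAL ANSWER:")[-1].strip())  # -> "Amsterdam"
```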
requirements.txt
CHANGED
@@ -1,2 +1,15 @@
 gradio
-requests
+requests
+openai
+pytube
+openpyxl
+pypdf2
+beautifulsoup4
+youtube-transcript-api
+langsmith
+langgraph
+langchain
+langchain-core
+langchain-openai
+langchain-community
+langchain-experimental
tools.py
ADDED
@@ -0,0 +1,225 @@
+from langchain_core.tools import tool
+import datetime
+import requests
+import openai
+import os
+import tempfile
+import pandas as pd
+from urllib.parse import urlparse, parse_qs
+from openai import OpenAI
+from youtube_transcript_api import YouTubeTranscriptApi
+from youtube_transcript_api._errors import TranscriptsDisabled, NoTranscriptFound, VideoUnavailable
+from pytube import extract
+from bs4 import BeautifulSoup
+from io import BytesIO
+from PyPDF2 import PdfReader
+
+@tool
+def add(a: float, b: float) -> float:
+    """ Adds two numbers.
+    Args:
+        a (float): first number
+        b (float): second number
+    """
+    return a + b
+
+@tool
+def subtract(a: float, b: float) -> float:
+    """ Subtracts two numbers.
+    Args:
+        a (float): first number
+        b (float): second number
+    """
+    return a - b
+
+@tool
+def multiply(a: float, b: float) -> float:
+    """ Multiplies two numbers.
+    Args:
+        a (float): first number
+        b (float): second number
+    """
+    return a * b
+
+@tool
+def divide(a: float, b: float) -> float:
+    """ Divides two numbers.
+    Args:
+        a (float): first number
+        b (float): second number
+    """
+    if b == 0:
+        raise ValueError("Cannot divide by zero.")
+    return a / b
+
+@tool
+def power(a: float, b: float) -> float:
+    """ Calculates the power of two numbers.
+    Args:
+        a (float): first number
+        b (float): second number
+    """
+    return a**b
+
+calculator_basic = [add, subtract, multiply, divide, power]
+
+
+@tool
+def current_date(_) -> str:
+    """ Returns the current date in YYYY-MM-DD format """
+    return datetime.datetime.now().strftime("%Y-%m-%d")
+
+@tool
+def day_of_week(_) -> str:
+    """ Returns the current day of the week (e.g., Monday, Tuesday) """
+    return datetime.datetime.now().strftime("%A")
+
+@tool
+def days_until(date_str: str) -> str:
+    """ Returns the number of days from today until a given date (input format: YYYY-MM-DD) """
+    try:
+        future_date = datetime.datetime.strptime(date_str, "%Y-%m-%d").date()
+        today = datetime.date.today()
+
+        delta_days = (future_date - today).days
+        return f"{delta_days} days until {date_str}"
+    except Exception as e:
+        return f"Error parsing date: {str(e)}"
+
+datetime_tools = [current_date, day_of_week, days_until]
+
+
+@tool
+def transcribe_audio(audio_file: str, file_extension: str) -> str:
+    """ Transcribes an audio file to text
+    Args:
+        audio_file (str): URL of the audio file (.mp3, .m4a, etc.)
+        file_extension (str): file extension of the audio, e.g. mp3
+    Returns:
+        str: The transcribed text from the audio.
+    """
+    try:
+        response = requests.get(audio_file)  # download the audio file
+        response.raise_for_status()  # check that the HTTP request was successful
+
+        # clean the file extension and save the audio to disk
+        file_extension = file_extension.replace('.', '')
+        filename = f'tmp.{file_extension}'
+        with open(filename, 'wb') as file:  # open a new file for writing, named e.g. tmp.mp3
+            file.write(response.content)  # write (w) the binary (b) contents (audio file) to disk
+
+        # transcribe audio with OpenAI Whisper
+        client = OpenAI()
+
+        # read (r) the audio file from disk in binary (b) mode "rb"; the "with" block ensures the file is closed afterward
+        with open(filename, "rb") as audio_content:
+            transcription = client.audio.transcriptions.create(
+                model="whisper-1",
+                file=audio_content
+            )
+        return transcription.text
+
+    except Exception as e:
+        return f"transcribe_audio failed: {e}"
+
+@tool
+def transcribe_youtube(youtube_url: str) -> str:
+    """ Transcribes a YouTube video
+    Args:
+        youtube_url (str): the YouTube video's URL
+    Returns:
+        str: The transcribed text from the video.
+    """
+    try:
+        query = urlparse(youtube_url).query
+        video_id = parse_qs(query)['v'][0]
+    except Exception:
+        return "invalid YouTube URL"
+
+    try:
+        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
+        transcript = transcript_list.find_transcript(['en']).fetch()
+        # keep only the text
+        text = '\n'.join([t['text'] for t in transcript])
+        return text
+
+    except (TranscriptsDisabled, NoTranscriptFound, VideoUnavailable) as e:
+        return f"transcript unavailable: {str(e)}"
+
+    except Exception as e:
+        return f"transcribe_youtube failed: {e}"
+
+@tool
+def query_image(query: str, image_url: str) -> str:
+    """ Ask anything about an image using a Vision Language Model
+    Args:
+        query (str): the query about the image, e.g. how many animals are in the image?
+        image_url (str): the image's URL
+    """
+    try:
+        client = OpenAI()
+        response = client.responses.create(
+            model="gpt-4o-mini",
+            input=[
+                {
+                    "role": "user",
+                    "content": [
+                        {"type": "input_text", "text": query},
+                        {"type": "input_image", "image_url": image_url},
+                    ],
+                }
+            ],
+        )
+        return response.output_text
+
+    except Exception as e:
+        return f"query_image failed: {e}"
+
+@tool
+def webpage_content(url: str) -> str:
+    """ Fetch text from a webpage or PDF file.
+    Args:
+        url (str): The URL of the webpage to fetch.
+    Returns:
+        str: Extracted text.
+    """
+    try:
+        response = requests.get(url)
+        response.raise_for_status()
+
+        content_type = response.headers.get("Content-Type", "")
+
+        # PDF file
+        if "pdf" in content_type:
+            pdf_content = BytesIO(response.content)
+            reader = PdfReader(pdf_content)
+            return "\n".join(page.extract_text() or "" for page in reader.pages)
+
+        # HTML page
+        soup = BeautifulSoup(response.text, "html.parser")
+        body = soup.body
+        return body.get_text(separator="\n", strip=True) if body else soup.get_text(strip=True)
+
+    except Exception as e:
+        return f"webpage_content failed: {e}"
+
+@tool
+def read_excel(file_url: str) -> str:
+    """ Reads an Excel file from a URL and returns the content as CSV text.
+    Args:
+        file_url (str): URL to the Excel file (.xlsx, .xls)
+    Returns:
+        str: Content of the Excel file as CSV text.
+    """
+    try:
+        response = requests.get(file_url)
+        response.raise_for_status()
+
+        excel_content = BytesIO(response.content)
+        df = pd.read_excel(excel_content)
+
+        return df.to_csv(index=False)  # convert the dataframe to a CSV string for easy processing
+
+    except Exception as e:
+        return f"read_excel failed: {str(e)}"