Daniil Bogdanov committed
Commit · a225ae4 · 1 Parent(s): 932fded

Release v5
Files changed:
- README.md (+7 -8)
- agent.py (+4 -1)
- app.py (+24 -8)
- requirements.txt (+3 -1)
- tools.py (+0 -100)
- tools/describe_image_tool.py (+112 -0)
- tools/openai_speech_to_text_tool.py (+35 -0)
- tools/read_file_tool.py (+27 -0)
- tools/tools.py (+34 -0)
- tools/youtube_transcription_tool.py (+26 -0)
README.md CHANGED
@@ -16,9 +16,9 @@ hf_oauth_expiration_minutes: 480
 
 **Final assessment for the Hugging Face AI Agents course**
 
-This repository contains a fully implemented autonomous agent designed to solve the [GAIA benchmark](https://arxiv.org/abs/
+This repository contains a fully implemented autonomous agent designed to solve the [GAIA benchmark](https://arxiv.org/abs/2311.12983) - level 1. The agent leverages large language models and a suite of external tools to tackle complex, real-world, multi-modal tasks. It is ready to run and submit answers to the GAIA evaluation server, and is deployable as a HuggingFace Space with a Gradio interface.
 
-##
+## Project Summary
 - **Purpose:** Automatically solve and submit answers for the GAIA benchmark, which evaluates generalist AI agents on tasks requiring reasoning, code execution, web search, data analysis, and more.
 - **Features:**
   - Uses LLMs (OpenAI, HuggingFace, etc.) for reasoning and planning
@@ -26,7 +26,7 @@ This repository contains a fully implemented autonomous agent designed to solve
   - Handles file-based and multi-modal tasks
   - Submits results and displays scores in a user-friendly Gradio interface
 
-##
+## How to Run
 
 **On HuggingFace Spaces:**
 - Log in with your HuggingFace account.
@@ -38,21 +38,20 @@ pip install -r requirements.txt
 python app.py
 ```
 
-##
+## About GAIA
 GAIA is a challenging benchmark for evaluating the capabilities of generalist AI agents on real-world, multi-step, and multi-modal tasks. Each task may require code execution, web search, data analysis, or other tool use. This agent is designed to autonomously solve such tasks and submit answers for evaluation.
 
-##
+## Architecture
 - `app.py` – Gradio app and evaluation logic. Fetches questions, runs the agent, and submits answers
 - `agent.py` – Main `Agent` class. Implements reasoning, tool use, and answer formatting
 - `model.py` – Loads and manages LLM backends (OpenAI, HuggingFace, LiteLLM, etc.)
 - `tools.py` – Implements external tools
 - `utils/logger.py` – Logging utility
-- `requirements.txt` – All dependencies for local and Spaces deployment
 
-##
+## Environment Variables
 Some models require API keys. Set these in your Space or local environment:
 - `OPENAI_API_KEY` and `OPENAI_API_BASE` (for OpenAI models)
 - `HUGGINGFACEHUB_API_TOKEN` (for HuggingFace Hub models)
 
-##
+## Dependencies
 All required packages are listed in `requirements.txt`
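The "Environment Variables" section above names the keys the agent reads. A minimal sketch of providing them for a local run; the variable names come from the README, while the values (and the base URL) are placeholders, not real credentials:

```
import os

# Key names taken from the README's "Environment Variables" section.
os.environ["OPENAI_API_KEY"] = "sk-..."                        # placeholder
os.environ["OPENAI_API_BASE"] = "https://api.openai.com/v1"    # assumed default base URL
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."              # placeholder
```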
agent.py CHANGED
@@ -38,6 +38,7 @@ class Agent:
             "re",
             "openpyxl",
             "pathlib",
+            "sys",
         ]
         self.agent = CodeAgent(
             model=self.model,
@@ -55,12 +56,14 @@ class Agent:
 - Reason step-by-step. Think through the solution logically and plan your actions carefully before answering.
 - Validate information. Always verify facts when possible instead of guessing.
 - Use code if needed. For calculations, parsing, or transformations, generate Python code and execute it. But be careful: some questions contain time-consuming tasks, so be careful with the code you run. Better to analyze the question first and think about the best way to solve it.
+- Don't forget to use `final_answer` to give the final answer.
+- Use the name of the file ONLY FROM the "FILE:" section. THIS IS ALWAYS A FILE.
 
 IMPORTANT: When giving the final answer, output only the direct required result without any extra text like "Final Answer:" or explanations. YOU MUST RESPOND IN THE EXACT FORMAT AS THE QUESTION.
 
 QUESTION: {question}
 
-
+FILE: {context}
 
 ANSWER:
 """
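The new `FILE: {context}` slot implies the prompt is filled with both the question and a local file path. A hypothetical illustration of that step; `PROMPT_TEMPLATE`, the question text, and the path are made up, standing in for the triple-quoted string inside `Agent`:

```
# Hypothetical: PROMPT_TEMPLATE stands in for the template string in agent.py.
prompt = PROMPT_TEMPLATE.format(
    question="How many rows does the attached spreadsheet contain?",
    context="/tmp/tmpab12cd/file_3f2a.xlsx",  # path produced by app.py's download step
)
```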
app.py CHANGED
@@ -9,7 +9,7 @@ import requests
 
 from agent import Agent
 from model import get_model
-from tools import get_tools
+from tools.tools import get_tools
 
 # (Keep Constants as is)
 # --- Constants ---
@@ -33,7 +33,7 @@ def run_and_submit_all(
         Tuple[str, Optional[pd.DataFrame]]: Status message and DataFrame of results.
     """
     # --- Determine HF Space Runtime URL and Repo URL ---
-    space_id = "
+    space_id = os.getenv("SPACE_ID")  # Get the SPACE_ID for sending link to the code
 
     if profile:
         username = f"{profile.username}"
@@ -95,10 +95,26 @@ def run_and_submit_all(
         try:
             file_response = requests.get(f"{files_url}/{task_id}", timeout=15)
             if file_response.status_code == 200 and file_response.content:
-
-
-
-
+                # Get filename from Content-Disposition header or URL
+                filename = None
+                content_disposition = file_response.headers.get(
+                    "Content-Disposition"
+                )
+                if content_disposition and "filename=" in content_disposition:
+                    filename = content_disposition.split("filename=")[-1].strip('"')
+                else:
+                    # Try to get filename from URL
+                    url = file_response.url
+                    filename = url.split("/")[-1]
+                    if not filename or filename == str(task_id):
+                        filename = f"file_{task_id}"
+
+                # Create temp directory and save file with original name
+                temp_dir = tempfile.mkdtemp()
+                file_path = os.path.join(temp_dir, filename)
+                with open(file_path, "wb") as f:
+                    f.write(file_response.content)
+                print(f"Downloaded file for task {task_id} to {file_path}")
             else:
                 print(f"No file for task {task_id} or file is empty.")
         except Exception as e:
@@ -213,8 +229,8 @@ with gr.Blocks() as demo:
 if __name__ == "__main__":
     print("\n" + "-" * 30 + " App Starting " + "-" * 30)
     # Check for SPACE_HOST and SPACE_ID at startup for information
-    space_host_startup =
-    space_id_startup = "
+    space_host_startup = os.getenv("SPACE_HOST")
+    space_id_startup = os.getenv("SPACE_ID")
 
     if space_host_startup:
        print(f"✅ SPACE_HOST found: {space_host_startup}")
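The filename-resolution logic added above reads cleanly as a small pure function. A sketch that mirrors the inline code; `resolve_filename` is a hypothetical helper, not a function in the repo (note the added block also depends on `tempfile`, which app.py must already import, since the diff does not add it):

```
def resolve_filename(content_disposition, url, task_id):
    # Mirror of the inline logic: prefer the Content-Disposition header,
    # fall back to the last URL segment, then to a synthetic name.
    if content_disposition and "filename=" in content_disposition:
        return content_disposition.split("filename=")[-1].strip('"')
    filename = url.split("/")[-1]
    if not filename or filename == str(task_id):
        filename = f"file_{task_id}"
    return filename

assert resolve_filename('attachment; filename="data.xlsx"', "https://host/files/42", "42") == "data.xlsx"
assert resolve_filename(None, "https://host/files/42", "42") == "file_42"
```

One limitation of the naive header split: a header like `attachment; filename=x.csv; size=10` would return `x.csv; size=10`, since everything after `filename=` is kept.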
requirements.txt CHANGED
@@ -11,4 +11,6 @@ smolagents[openai]
 smolagents[transformers]
 transformers
 wikipedia-api
-youtube-transcript-api
+youtube-transcript-api
+openai-whisper
+openai
tools.py DELETED
@@ -1,100 +0,0 @@
-from typing import Any, List
-
-import pytesseract
-from PIL import Image
-from smolagents import (
-    DuckDuckGoSearchTool,
-    PythonInterpreterTool,
-    SpeechToTextTool,
-    Tool,
-    VisitWebpageTool,
-    WikipediaSearchTool,
-)
-from youtube_transcript_api import YouTubeTranscriptApi
-
-
-class YouTubeTranscriptionTool(Tool):
-    """
-    Tool to fetch the transcript of a YouTube video given its URL.
-
-    Args:
-        video_url (str): YouTube video URL.
-
-    Returns:
-        str: Transcript of the video as a single string.
-    """
-
-    name = "youtube_transcription"
-    description = "Fetches the transcript of a YouTube video given its URL"
-    inputs = {
-        "video_url": {"type": "string", "description": "YouTube video URL"},
-    }
-    output_type = "string"
-
-    def forward(self, video_url: str) -> str:
-        video_id = video_url.strip().split("v=")[-1]
-        transcript = YouTubeTranscriptApi.get_transcript(video_id)
-        return " ".join([entry["text"] for entry in transcript])
-
-
-class ReadFileTool(Tool):
-    """
-    Tool to read a file and return its content.
-
-    Args:
-        file_path (str): Path to the file to read.
-
-    Returns:
-        str: Content of the file or error message.
-    """
-
-    name = "read_file"
-    description = "Reads a file and returns its content"
-    inputs = {
-        "file_path": {"type": "string", "description": "Path to the file to read"},
-    }
-    output_type = "string"
-
-    def forward(self, file_path: str) -> str:
-        try:
-            with open(file_path, "r") as file:
-                return file.read()
-        except Exception as e:
-            return f"Error reading file: {str(e)}"
-
-
-class ExtractTextFromImageTool(Tool):
-    name = "extract_text_from_image"
-    description = "Extracts text from an image using pytesseract"
-    inputs = {
-        "image_path": {"type": "string", "description": "Path to the image file"},
-    }
-    output_type = "string"
-
-    def forward(self, image_path: str) -> str:
-        try:
-            image = Image.open(image_path)
-            text = pytesseract.image_to_string(image)
-            return text
-        except Exception as e:
-            return f"Error extracting text from image: {str(e)}"
-
-
-def get_tools() -> List[Tool]:
-    """
-    Returns a list of available tools for the agent.
-
-    Returns:
-        List[Tool]: List of initialized tool instances.
-    """
-    tools = [
-        DuckDuckGoSearchTool(),
-        PythonInterpreterTool(),
-        WikipediaSearchTool(),
-        VisitWebpageTool(),
-        SpeechToTextTool(),
-        YouTubeTranscriptionTool(),
-        ReadFileTool(),
-        ExtractTextFromImageTool(),
-    ]
-    return tools
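The monolithic tools.py above is replaced by the tools/ package added below; the resulting layout, inferred from this commit's file list:

```
tools/
├── describe_image_tool.py
├── openai_speech_to_text_tool.py
├── read_file_tool.py
├── tools.py                      # aggregates the tools via get_tools()
└── youtube_transcription_tool.py
```

No `__init__.py` appears in the diff, which is consistent with app.py now importing `from tools.tools import get_tools` (Python 3 treats tools/ as a namespace package).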
tools/describe_image_tool.py ADDED
@@ -0,0 +1,112 @@
+import base64
+import os
+
+from openai import OpenAI
+from smolagents import Tool
+
+client = OpenAI()
+
+
+class DescribeImageTool(Tool):
+    """
+    Tool to analyze and describe any image using GPT-4 Vision API.
+
+    Args:
+        image_path (str): Path to the image file.
+        description_type (str): Type of description to generate. Options:
+            - "general": General description of the image
+            - "detailed": Detailed analysis of the image
+            - "chess": Analysis of a chess position
+            - "text": Extract and describe text from the image
+            - "custom": Custom description based on user prompt
+
+    Returns:
+        str: Description of the image based on the requested type.
+    """
+
+    name = "describe_image"
+    description = "Analyzes and describes images using GPT-4 Vision API"
+    inputs = {
+        "image_path": {"type": "string", "description": "Path to the image file"},
+        "description_type": {
+            "type": "string",
+            "description": "Type of description to generate (general, detailed, chess, text, custom)",
+            "nullable": True,
+        },
+        "custom_prompt": {
+            "type": "string",
+            "description": "Custom prompt for description (only used when description_type is 'custom')",
+            "nullable": True,
+        },
+    }
+    output_type = "string"
+
+    def encode_image(self, image_path: str) -> str:
+        """Encode image to base64 string."""
+        with open(image_path, "rb") as image_file:
+            return base64.b64encode(image_file.read()).decode("utf-8")
+
+    def get_prompt(self, description_type: str, custom_prompt: str = None) -> str:
+        """Get appropriate prompt based on description type."""
+        prompts = {
+            "general": "Provide a general description of this image. Focus on the main subjects, colors, and overall scene.",
+            "detailed": """Analyze this image in detail. Include:
+1. Main subjects and their relationships
+2. Colors, lighting, and composition
+3. Any text or symbols present
+4. Context or possible meaning
+5. Notable details or interesting elements""",
+            "chess": """Analyze this chess position and provide a detailed description including:
+1. List of pieces on the board for both white and black
+2. Whose turn it is to move
+3. Basic evaluation of the position
+4. Any immediate tactical opportunities or threats
+5. Suggested next moves with brief explanations""",
+            "text": "Extract and describe any text present in this image. If there are multiple pieces of text, organize them clearly.",
+        }
+        return (
+            custom_prompt
+            if description_type == "custom"
+            else prompts.get(description_type, prompts["general"])
+        )
+
+    def forward(
+        self,
+        image_path: str,
+        description_type: str = "general",
+        custom_prompt: str = None,
+    ) -> str:
+        try:
+            if not os.path.exists(image_path):
+                return f"Error: Image file not found at {image_path}"
+
+            # Encode the image
+            base64_image = self.encode_image(image_path)
+
+            # Get appropriate prompt
+            prompt = self.get_prompt(description_type, custom_prompt)
+
+            # Make the API call
+            response = client.chat.completions.create(
+                model="gpt-4.1",
+                messages=[
+                    {
+                        "role": "user",
+                        "content": [
+                            {"type": "text", "text": prompt},
+                            {
+                                "type": "image_url",
+                                "image_url": {
+                                    "url": f"data:image/jpeg;base64,{base64_image}"
+                                },
+                            },
+                        ],
+                    }
+                ],
+                max_tokens=1000,
+            )
+
+            return response.choices[0].message.content
+
+        except Exception as e:
+            return f"Error analyzing image: {str(e)}"
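A usage sketch for the new tool; the file path is hypothetical, and `OPENAI_API_KEY` must be set, since the module constructs `OpenAI()` at import time and fails without credentials:

```
from tools.describe_image_tool import DescribeImageTool

tool = DescribeImageTool()
# "chess" selects the chess-position prompt defined in get_prompt()
print(tool.forward("position.png", description_type="chess"))
```

Note the data URL is always labeled `image/jpeg` regardless of the actual file type; vision endpoints generally tolerate this, but it is a simplification.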
tools/openai_speech_to_text_tool.py ADDED
@@ -0,0 +1,35 @@
+import os
+
+import whisper
+from smolagents import Tool
+
+
+class OpenAISpeechToTextTool(Tool):
+    """
+    Tool to convert speech to text using OpenAI's Whisper model.
+
+    Args:
+        audio_path (str): Path to the audio file.
+
+    Returns:
+        str: Transcribed text from the audio file.
+    """
+
+    name = "transcribe_audio"
+    description = "Transcribes audio to text and returns the text"
+    inputs = {
+        "audio_path": {"type": "string", "description": "Path to the audio file"},
+    }
+    output_type = "string"
+
+    def forward(self, audio_path: str) -> str:
+        try:
+            model = whisper.load_model("small")
+
+            if not os.path.exists(audio_path):
+                return f"Error: Audio file not found at {audio_path}"
+
+            result = model.transcribe(audio_path)
+            return result["text"]
+        except Exception as e:
+            return f"Error transcribing audio: {str(e)}"
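Usage sketch; the audio file name is hypothetical, and openai-whisper needs the ffmpeg binary on PATH to decode audio:

```
from tools.openai_speech_to_text_tool import OpenAISpeechToTextTool

stt = OpenAISpeechToTextTool()
print(stt.forward("recording.mp3"))  # hypothetical file
```

Since `whisper.load_model("small")` runs inside `forward`, the weights are downloaded on first use and the model is re-loaded on every call; hoisting the load into the constructor would amortize that cost.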
tools/read_file_tool.py ADDED
@@ -0,0 +1,27 @@
+from smolagents import Tool
+
+
+class ReadFileTool(Tool):
+    """
+    Tool to read a file and return its content.
+
+    Args:
+        file_path (str): Path to the file to read.
+
+    Returns:
+        str: Content of the file or error message.
+    """
+
+    name = "read_file"
+    description = "Reads a file and returns its content"
+    inputs = {
+        "file_path": {"type": "string", "description": "Path to the file to read"},
+    }
+    output_type = "string"
+
+    def forward(self, file_path: str) -> str:
+        try:
+            with open(file_path, "r") as file:
+                return file.read()
+        except Exception as e:
+            return f"Error reading file: {str(e)}"
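Usage sketch (hypothetical path). Errors come back as strings rather than exceptions, so the agent can read the failure message directly:

```
from tools.read_file_tool import ReadFileTool

reader = ReadFileTool()
content = reader.forward("data/table.csv")  # file content, or "Error reading file: ..."
```

The file is opened in text mode, so binary inputs (images, spreadsheets) are handled by the other tools rather than this one.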
tools/tools.py ADDED
@@ -0,0 +1,34 @@
+from typing import List
+
+from smolagents import (
+    DuckDuckGoSearchTool,
+    PythonInterpreterTool,
+    Tool,
+    VisitWebpageTool,
+    WikipediaSearchTool,
+)
+
+from .describe_image_tool import DescribeImageTool
+from .openai_speech_to_text_tool import OpenAISpeechToTextTool
+from .read_file_tool import ReadFileTool
+from .youtube_transcription_tool import YouTubeTranscriptionTool
+
+
+def get_tools() -> List[Tool]:
+    """
+    Returns a list of available tools for the agent.
+
+    Returns:
+        List[Tool]: List of initialized tool instances.
+    """
+    tools = [
+        DuckDuckGoSearchTool(),
+        PythonInterpreterTool(),
+        WikipediaSearchTool(),
+        VisitWebpageTool(),
+        OpenAISpeechToTextTool(),
+        YouTubeTranscriptionTool(),
+        ReadFileTool(),
+        DescribeImageTool(),
+    ]
+    return tools
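A sketch of how this registry presumably plugs into the agent, mirroring the imports shown in app.py's diff; the exact `CodeAgent` arguments live in agent.py and are not fully shown in this commit:

```
from smolagents import CodeAgent

from model import get_model          # repo's model loader (see README, Architecture)
from tools.tools import get_tools

agent = CodeAgent(model=get_model(), tools=get_tools())
answer = agent.run("What is 2 + 2?")
```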
tools/youtube_transcription_tool.py ADDED
@@ -0,0 +1,26 @@
+from smolagents import Tool
+from youtube_transcript_api import YouTubeTranscriptApi
+
+
+class YouTubeTranscriptionTool(Tool):
+    """
+    Tool to fetch the transcript of a YouTube video given its URL.
+
+    Args:
+        video_url (str): YouTube video URL.
+
+    Returns:
+        str: Transcript of the video as a single string.
+    """
+
+    name = "youtube_transcription"
+    description = "Fetches the transcript of a YouTube video given its URL"
+    inputs = {
+        "video_url": {"type": "string", "description": "YouTube video URL"},
+    }
+    output_type = "string"
+
+    def forward(self, video_url: str) -> str:
+        video_id = video_url.strip().split("v=")[-1]
+        transcript = YouTubeTranscriptApi.get_transcript(video_id)
+        return " ".join([entry["text"] for entry in transcript])
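Usage sketch (the video URL is illustrative):

```
from tools.youtube_transcription_tool import YouTubeTranscriptionTool

yt = YouTubeTranscriptionTool()
transcript = yt.forward("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
```

The naive `split("v=")[-1]` handles `watch?v=` URLs but keeps trailing query parameters (for example `&t=30s`) and does not extract IDs from `youtu.be/` short links, so URL choice matters here.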