Open Deep Research

Welcome to this open replication of OpenAI's Deep Research! This agent attempts to replicate OpenAI's model and achieve similar performance on research tasks.

Read more about this implementation's goal and methods in our blog post.

This agent achieves 55% pass@1 on the GAIA validation set, compared to 67% for the original Deep Research.

Setup

To get started, follow the steps below:

Clone the repository

git clone https://github.com/huggingface/smolagents.git
cd smolagents/examples/open_deep_research

Install dependencies

Run the following command to install the required dependencies from the requirements.txt file:

pip install -r requirements.txt

Install the development version of smolagents

pip install -e "../../.[dev]"

Set up environment variables

The agent uses the GoogleSearchTool for web search, which requires an API key for the selected search provider, set as an environment variable.

Depending on the model you want to use, you may also need to set other environment variables. For example, the default o1 model requires the OPENAI_API_KEY environment variable. Sign up on the OpenAI platform to get a key.

Using the default o1 model requires tier-3 API access: https://help.openai.com/en/articles/10362446-api-access-to-o1-and-o3-mini
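Before launching a long research run, it can help to check that the required keys are set. The snippet below is a minimal sketch of such a pre-flight check. missing_env_vars is a hypothetical helper, not part of run.py, and the provider-to-variable mapping (SERPAPI_API_KEY for provider "serpapi", SERPER_API_KEY for "serper") is an assumption based on smolagents' GoogleSearchTool; verify it against your installed version.

```python
import os

# Assumed mapping from GoogleSearchTool provider name to the env var it reads.
SEARCH_PROVIDER_KEYS = {"serpapi": "SERPAPI_API_KEY", "serper": "SERPER_API_KEY"}


def missing_env_vars(provider="serpapi", uses_openai_model=True, environ=None):
    """Return the names of required environment variables that are unset or empty.

    Hypothetical pre-flight helper; run.py does not ship this function.
    """
    env = os.environ if environ is None else environ
    required = [SEARCH_PROVIDER_KEYS[provider]]
    if uses_openai_model:  # e.g. the default o1 model
        required.append("OPENAI_API_KEY")
    return [name for name in required if not env.get(name)]


# Usage: pass a dict to test without touching the real environment.
print(missing_env_vars("serper", environ={"OPENAI_API_KEY": "sk-test"}))
# ['SERPER_API_KEY']
```

Passing an explicit environ mapping keeps the check testable; calling missing_env_vars() with no arguments inspects the real process environment instead.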

Usage

Then you're good to go! Run the run.py script, for example:

python run.py --model-id "o1" "Your question here!"

Full reproducibility of results

The data used in our submissions to GAIA was augmented in this way:

  • Each single-page .pdf or .xls file was opened in a file reader (macOS Sonoma Numbers or Preview), and a .png screenshot of it was added to the same folder.
  • Then, for any file attached to a question, the file-loading system checks whether a .png version of that file exists and, if so, loads it instead of the original.
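The substitution rule in the second step can be sketched as follows. This is an illustration, not the loader used for the GAIA submissions: resolve_attachment is a hypothetical helper, and it assumes the screenshot shares the original's name with the extension replaced by .png (for example, table.xls becomes table.png).

```python
import tempfile
from pathlib import Path


def resolve_attachment(file_path):
    """Return the screenshot for `file_path` if one exists, else the original path.

    Hypothetical helper: assumes the screenshot sits next to the original with the
    same stem and a .png suffix (e.g. table.xls -> table.png).
    """
    original = Path(file_path)
    screenshot = original.with_suffix(".png")
    return screenshot if screenshot.exists() else original


# Usage: in a scratch folder, a screenshot shadows chart.pdf but not table.xls.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("chart.pdf", "chart.png", "table.xls"):
        (folder / name).touch()
    chosen_pdf = resolve_attachment(folder / "chart.pdf").name
    chosen_xls = resolve_attachment(folder / "table.xls").name

print(chosen_pdf, chosen_xls)  # chart.png table.xls
```

If your screenshots are instead named by appending .png to the full filename (chart.pdf.png), replace with_suffix(".png") with original.with_name(original.name + ".png").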

This process was done manually but could be automated.
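A first step toward automating it is listing which files still lack a screenshot. The sketch below does only that; producing the screenshots themselves would need a PDF or spreadsheet renderer, which is out of scope here. files_missing_screenshot is a hypothetical helper, it assumes the same-stem .png naming convention, and it does not check page count, so multi-page files must still be filtered out separately.

```python
import tempfile
from pathlib import Path

# File types the README says were screenshotted by hand.
SCREENSHOT_SOURCES = {".pdf", ".xls"}


def files_missing_screenshot(folder):
    """List .pdf/.xls files in `folder` that have no sibling .png yet.

    Hypothetical helper; it does not check page count.
    """
    return [
        path
        for path in sorted(Path(folder).iterdir())
        if path.is_file()
        and path.suffix.lower() in SCREENSHOT_SOURCES
        and not path.with_suffix(".png").exists()
    ]


# Usage: a.pdf already has a.png, so only b.xls still needs a screenshot.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("a.pdf", "a.png", "b.xls", "c.txt"):
        (folder / name).touch()
    todo = [p.name for p in files_missing_screenshot(folder)]

print(todo)  # ['b.xls']
```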

After processing, the annotated data was uploaded to a new dataset. You need to request access to it (access is granted instantly).