# Open Deep Research

Welcome to this open replication of [OpenAI's Deep Research](https://openai.com/index/introducing-deep-research/)! This agent attempts to replicate OpenAI's model and achieve similar performance on research tasks.

Read more about this implementation's goal and methods in our [blog post](https://huggingface.co/blog/open-deep-research).

This agent achieves **55% pass@1** on the GAIA validation set, compared to **67%** for the original Deep Research.

## Setup

To get started, follow the steps below:

### Clone the repository

```bash
git clone https://github.com/huggingface/smolagents.git
cd smolagents/examples/open_deep_research
```

### Install dependencies

Run the following command to install the required dependencies from the `requirements.txt` file:

```bash
pip install -r requirements.txt
```

### Install the development version of `smolagents`

```bash
pip install -e ../../.[dev]
```

### Set up environment variables

The agent uses the `GoogleSearchTool` for web search, which requires an environment variable with the corresponding API key, based on the selected provider:
- `SERPAPI_API_KEY` for SerpApi: [Sign up here to get a key](https://serpapi.com/users/sign_up)
- `SERPER_API_KEY` for Serper: [Sign up here to get a key](https://serper.dev/signup)

Depending on the model you want to use, you may need to set environment variables.
For example, to use the default `o1` model, you need to set the `OPENAI_API_KEY` environment variable.
[Sign up here to get a key](https://platform.openai.com/signup).

> [!WARNING]
> The use of the default `o1` model is restricted to tier-3 access: https://help.openai.com/en/articles/10362446-api-access-to-o1-and-o3-mini

## Usage

Then you're good to go! Run the `run.py` script, as in:

```bash
python run.py --model-id "o1" "Your question here!"
```

## Full reproducibility of results

The data used in our submissions to GAIA was augmented in this way:
- Each single-page .pdf or .xls file was opened in a file reader (macOS Sonoma Numbers or Preview), and a ".png" screenshot was taken and added to the folder.
- Then, for any file used in a question, the file loading system checks whether a ".png" version of the file exists, and loads it instead of the original if it does.

This process was done manually but could be automated.

After processing, the annotated data was uploaded to a [new dataset](https://huggingface.co/datasets/smolagents/GAIA-annotated). You need to request access (granted instantly).
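
The ".png" fallback described above could look roughly like the sketch below. This is an illustration, not the loader shipped in this example: the function name `resolve_question_file` is hypothetical, and it assumes the screenshot shares the original file's name with only the extension swapped (for example `report.pdf` and `report.png` in the same folder).

```python
from pathlib import Path


def resolve_question_file(file_path: str) -> Path:
    """Return the ".png" sibling of file_path if it exists, else file_path itself.

    Hypothetical helper: the name and the naming rule are assumptions,
    not taken from the repository's code.
    """
    original = Path(file_path)
    # Assumption: the screenshot keeps the same stem, e.g. report.pdf -> report.png.
    screenshot = original.with_suffix(".png")
    if screenshot.is_file():
        # A screenshot was added during annotation, so prefer it.
        return screenshot
    # No screenshot available: fall back to the original attachment.
    return original
```

Calling `resolve_question_file("data/report.pdf")` returns `data/report.png` when that screenshot exists and `data/report.pdf` otherwise, so files that were never screenshotted load unchanged.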