# Open Deep Research

Welcome to this open replication of [OpenAI's Deep Research](https://openai.com/index/introducing-deep-research/)! This agent attempts to replicate OpenAI's model and achieve similar performance on research tasks.

Read more about this implementation's goal and methods in our [blog post](https://huggingface.co/blog/open-deep-research).

This agent achieves **55% pass@1** on the GAIA validation set, compared to **67%** for the original Deep Research.

## Setup

To get started, follow the steps below:

### Clone the repository

```bash
git clone https://github.com/huggingface/smolagents.git
cd smolagents/examples/open_deep_research
```

### Install dependencies

Run the following command to install the required dependencies from the `requirements.txt` file:

```bash
pip install -r requirements.txt
```

### Install the development version of `smolagents`

```bash
# Quoted so that shells such as zsh do not treat [dev] as a glob pattern
pip install -e "../../.[dev]"
```

### Set up environment variables

The agent uses the `GoogleSearchTool` for web search, which requires an environment variable holding the API key of the selected provider:

- `SERPAPI_API_KEY` for SerpApi: [Sign up here to get a key](https://serpapi.com/users/sign_up)
- `SERPER_API_KEY` for Serper: [Sign up here to get a key](https://serper.dev/signup)

Depending on the model you want to use, you may need to set additional environment variables.
For example, to use the default `o1` model, you need to set the `OPENAI_API_KEY` environment variable.
[Sign up here to get a key](https://platform.openai.com/signup).

> [!WARNING]
> The use of the default `o1` model is restricted to tier-3 access: https://help.openai.com/en/articles/10362446-api-access-to-o1-and-o3-mini

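
A quick pre-flight check can catch a missing key before the agent starts. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_env_vars` is hypothetical, and it assumes that one search-provider key plus `OPENAI_API_KEY` (for the default `o1` model) is what your setup needs.

```python
import os

# Assumption: either search provider key is acceptable.
SEARCH_KEYS = ("SERPAPI_API_KEY", "SERPER_API_KEY")
# Assumption: the default `o1` model is used, so the OpenAI key is required.
MODEL_KEYS = ("OPENAI_API_KEY",)


def missing_env_vars(env=None):
    """Return a list of problems; the list is empty if the setup looks complete."""
    env = os.environ if env is None else env
    problems = []
    # At least one search provider key must be set and non-empty.
    if not any(env.get(key) for key in SEARCH_KEYS):
        problems.append("set one of: " + ", ".join(SEARCH_KEYS))
    # Every model key must be set and non-empty.
    for key in MODEL_KEYS:
        if not env.get(key):
            problems.append(f"set {key}")
    return problems
```

Calling `missing_env_vars()` with no argument inspects the current process environment. If you use a different model provider, adjust `MODEL_KEYS` accordingly.
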
## Usage

Then you're good to go! Run the `run.py` script, for example:

```bash
python run.py --model-id "o1" "Your question here!"
```

## Full reproducibility of results

The data used in our submissions to GAIA was augmented in this way:

- Each single-page `.pdf` or `.xls` file was opened in a file reader (Numbers or Preview on macOS Sonoma), and a `.png` screenshot was taken and added to the folder.
- Then, for any file used in a question, the file loading system checks whether a version of the file with a `.png` extension exists, and if so loads it instead of the original.
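
The fallback described in the second step can be sketched as follows. This is an illustration under stated assumptions, not the repository's actual loader: the helper name `resolve_attachment` is hypothetical, and it assumes the screenshot sits next to the original file with the same base name.

```python
from pathlib import Path


def resolve_attachment(path):
    """Return the `.png` sibling of `path` if it exists, otherwise `path` itself."""
    original = Path(path)
    # Assumption: the screenshot shares the original's folder and base name.
    screenshot = original.with_suffix(".png")
    return screenshot if screenshot.exists() else original
```

For example, `resolve_attachment("data/report.pdf")` would return `data/report.png` when that screenshot is present, and `data/report.pdf` otherwise.
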

This process was done manually but could be automated.

After processing, the annotated data was uploaded to a [new dataset](https://huggingface.co/datasets/smolagents/GAIA-annotated). You need to request access (granted instantly).