---
license: mit
pipeline_tag: robotics
---

# ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

This repository contains the VMware virtual machine files of our pre-made Ubuntu image, which serves as the environment for **ScienceBoard**.

*   **Paper**: [ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows](https://huggingface.co/papers/2505.19897)
*   **Project Page**: [https://qiushisun.github.io/ScienceBoard-Home/](https://qiushisun.github.io/ScienceBoard-Home/)
*   **Code (main repository)**: [https://github.com/OS-Copilot/ScienceBoard](https://github.com/OS-Copilot/ScienceBoard)

## Overview

ScienceBoard introduces a realistic, multi-domain environment and a challenging benchmark for evaluating multimodal autonomous agents. It features dynamic and visually rich scientific workflows with integrated professional software, enabling agents to autonomously interact via different interfaces to accelerate complex research tasks and experiments.

The benchmark consists of 169 high-quality, rigorously validated real-world tasks curated by humans, spanning scientific-discovery workflows in domains such as biochemistry, astronomy, and geoinformatics. The environment is designed for computer-using agents, which interact with operating systems as humans do, to solve complex scientific problems.

<div align="center">
  <img src="https://github.com/OS-Copilot/ScienceBoard/blob/main/static/scienceboard_badge_v0.png" alt="ScienceBoard Overview" style="zoom:80%;" />
</div>

## Usage

### Installation

The infrastructure of the framework is based on [OSWorld](https://github.com/xlang-ai/OSWorld) together with VMware Workstation Pro (which has been free for personal use since May 2024) on Ubuntu or Windows. Please make sure that your device meets the minimum requirements of these prerequisites.

1.  Download [VMware Workstation Pro 17](https://support.broadcom.com/group/ecx/productdownloads?subfamily=VMware%20Workstation%20Pro&freeDownloads=true) and our pre-made images from [Hugging Face](https://huggingface.co/OS-Copilot/ScienceBoard-Env/blob/main/VM.zip).
2.  Clone the main ScienceBoard repository and install the required packages:

    ```shell
    git clone https://github.com/OS-Copilot/ScienceBoard
    cd ScienceBoard
    conda create -n sci python=3.11
    conda activate sci
    pip install -r requirements.txt
    ```

3.  We recommend editing the evaluation process in [`main.py`](https://github.com/OS-Copilot/ScienceBoard/blob/main/main.py) directly and keeping sensitive information in environment variables, especially for complicated configurations involving [`community`](https://github.com/OS-Copilot/ScienceBoard/blob/main/sci/base/community.py?plain=1#L232).

> [!NOTE]  
> [`Community`](https://github.com/OS-Copilot/ScienceBoard/blob/main/sci/base/community.py?plain=1#L16) specifies the form of cooperation in which one or more models complete the tasks. You can build your own multi-agent system by creating a new class that inherits from `Community` and implements the `__call__()` method.
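
The subclassing pattern described in the note can be sketched as follows. This is a minimal illustration only: the `Community` class below is a local stand-in, and the `__call__()` signature, the `Agent` type, and the `PlannerExecutor` class are all hypothetical. The real base class in `sci/base/community.py` may expect different arguments and return values, so check it before adapting this.

```python
from typing import Callable

# Hypothetical agent type: any callable that maps a text prompt to a reply.
Agent = Callable[[str], str]


class Community:
    """Local stand-in for illustration; NOT the real ScienceBoard base class."""

    def __call__(self, observation: str) -> str:
        raise NotImplementedError


class PlannerExecutor(Community):
    """Hypothetical two-role community: one agent plans, another acts."""

    def __init__(self, planner: Agent, executor: Agent) -> None:
        self.planner = planner
        self.executor = executor

    def __call__(self, observation: str) -> str:
        plan = self.planner(observation)
        # The executor receives the raw observation followed by the plan.
        return self.executor(f"{observation}\n\nPlan:\n{plan}")


def demo() -> str:
    """Wire two toy agents together and run a single step."""

    def planner(obs: str) -> str:
        return f"open the menu shown in {obs}"

    def executor(prompt: str) -> str:
        # Act on the last line of the prompt, which holds the plan.
        return f"ACTION: {prompt.splitlines()[-1]}"

    return PlannerExecutor(planner, executor)("screenshot-1")
```

Keeping the cooperation logic inside `__call__()` means the rest of the evaluation loop can treat a single-agent and a multi-agent setup the same way.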

### Environment Configuration

Sensitive information and configuration are stored in the `env.sh` file or in environment variables.
For example:
-   `VM_PATH`: path to the VMware `.vmx` file; if set to the path of `VM.zip`, the archive will be extracted automatically (repeatedly).
-   `HTTPX_PROXY`: proxy URL if needed; avoid clashes with `HTTP_PROXY` and `HTTPS_PROXY` on Linux.
-   `OPENAI_API_KEY`: API key for OpenAI GPT.
-   `GOOGLE_API_KEY`: API key for Google Gemini.
-   `ANTHROPIC_API_KEY`: API key for Anthropic Claude.

Variables for open-source models:

| Model       | Base URL        | Name             |
| :----------: | :---------------: | :----------------: |
| QwenVL       | `QWEN_VL_URL`   | `QWEN_VL_NAME`   |
| InternVL     | `INTERN_VL_URL` | `INTERN_VL_NAME` |
| QVQ          | `QVQ_VL_URL`    | `QVQ_VL_NAME`    |
| OS-Atlas     | `OS_ACT_URL`    | `OS_ACT_NAME`    |
| GUI-Actor    | `GUI_ACTOR_URL` | `GUI_ACTOR_NAME` |
| UI-Tars      | `TARS_DPO_URL`  | `TARS_DPO_NAME`  |
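
As a rough way to check which of these variables are set before starting an evaluation, the sketch below reads them from a mapping such as `os.environ`. The variable names come from the list and table above; the helper functions `vm_path_kind()` and `configured_models()` are hypothetical and are not part of ScienceBoard.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Tuple

# Prefixes from the table above: each model uses <PREFIX>_URL and <PREFIX>_NAME.
MODEL_PREFIXES = {
    "QwenVL": "QWEN_VL",
    "InternVL": "INTERN_VL",
    "QVQ": "QVQ_VL",
    "OS-Atlas": "OS_ACT",
    "GUI-Actor": "GUI_ACTOR",
    "UI-Tars": "TARS_DPO",
}


def vm_path_kind(vm_path: str) -> str:
    """Classify VM_PATH as a .vmx file, a .zip archive, or unknown."""
    suffix = Path(vm_path).suffix.lower()
    return {".vmx": "vmx", ".zip": "zip"}.get(suffix, "unknown")


def configured_models(env: Mapping[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {model: (base_url, name)} for models with both variables set."""
    found: Dict[str, Tuple[str, str]] = {}
    for model, prefix in MODEL_PREFIXES.items():
        url = env.get(f"{prefix}_URL")
        name = env.get(f"{prefix}_NAME")
        if url and name:
            found[model] = (url, name)
    return found


if __name__ == "__main__":
    print("VM_PATH kind:", vm_path_kind(os.environ.get("VM_PATH", "")))
    print("Configured models:", sorted(configured_models(os.environ)))
```

Passing the environment in as a mapping keeps the helpers easy to test without touching the real process environment.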

Detailed configurations for specific applications such as Lean 4 REPL, Qt6, KAlgebra, Celestia, and GRASS GIS are defined in `sci/Presets.py` in the main code repository.

### Parameter Configuration

1.  [`Automata`](https://github.com/OS-Copilot/ScienceBoard/blob/main/sci/Tester.py?plain=1#L87): a simple encapsulation of [`Model`](https://github.com/OS-Copilot/ScienceBoard/blob/main/sci/base/model.py?plain=1#L144) and [`Agent`](https://github.com/OS-Copilot/ScienceBoard/blob/main/sci/base/agent.py?plain=1#L51)
    *   `model_style`: affects the request format and response processing of model calls; you can add your own style by defining `_request_{style}()` and `_access_{style}()` under [`Model`](https://github.com/OS-Copilot/ScienceBoard/blob/main/sci/base/model.py?plain=1#L144);
    *   `overflow_style`: affects how token overflow is detected; you can add your own style by defining `{style}()` under [`Overflow`](https://github.com/OS-Copilot/ScienceBoard/blob/main/sci/base/agent.py?plain=1#L24);
    *   `code_style`: affects how code blocks are processed when communicating with models; you can add your own style by defining `wrap_{style}()` and `extract_{style}()` under [`CodeLike`](https://github.com/OS-Copilot/ScienceBoard/blob/main/sci/base/prompt.py?plain=1#L84).
2.  [`Tester`](https://github.com/OS-Copilot/ScienceBoard/blob/main/sci/Tester.py?plain=1#L225): `__init__()` only registers a new config; use `__call__()` for the actual evaluation after initialization.
    *   `tasks_path`: the directory or file path of the task JSON file(s); when a directory is given, all `*.json` files under it are loaded recursively;
    *   `logs_path`: the directory path for log files, created automatically if it does not exist; its structure mirrors that of `tasks_path`;
    *   `community`: the way multiple agents cooperate; use [`AllInOne`](https://github.com/OS-Copilot/ScienceBoard/blob/main/sci/base/community.py?plain=1#L52) for the standard setting inherited from OSWorld;
    *   `ignore`: if set to `True`, tasks whose logs indicate that they are finished (determined by the existence of `result.out`) are skipped, so you can re-run the same program to retry only the failed cases;
    *   `debug`: complete the tasks manually instead of calling models;
    *   `relative`: allow the VM to execute `pyautogui` code with relative coordinates; mainly used by InternVL-3.
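
The interplay of `tasks_path`, `logs_path`, and `ignore` described above can be sketched as follows. This is not the `Tester` implementation: the function names are hypothetical, and the log layout (one directory per task, named after the JSON file without its extension) is an assumption made for illustration.

```python
from pathlib import Path
from typing import List


def discover_tasks(tasks_path: str) -> List[Path]:
    """Return a single JSON file, or every *.json found recursively in a directory."""
    root = Path(tasks_path)
    if root.is_file():
        return [root]
    return sorted(root.rglob("*.json"))


def log_dir_for(task: Path, tasks_path: str, logs_path: str) -> Path:
    """Mirror the task's position under tasks_path inside logs_path (assumed layout)."""
    root = Path(tasks_path)
    base = root.parent if root.is_file() else root
    return Path(logs_path) / task.relative_to(base).with_suffix("")


def pending_tasks(tasks_path: str, logs_path: str, ignore: bool = True) -> List[Path]:
    """Skip tasks whose log directory already contains result.out when ignore is True."""
    tasks = discover_tasks(tasks_path)
    if not ignore:
        return tasks
    return [
        task
        for task in tasks
        if not (log_dir_for(task, tasks_path, logs_path) / "result.out").exists()
    ]
```

Under these assumptions, calling `pending_tasks()` again after an interrupted run returns only the tasks that have not yet produced a `result.out`.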

## Recommended Configuration

It is recommended to run this project with at least the following configuration:

-   **CPU**: Intel Core i7-11700
-   **GPU**: Integrated graphics is sufficient
-   **Memory**: 32 GB RAM
-   **Storage**: > 100 GB available disk space
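
A quick preflight check for the storage recommendation can be sketched as below. The function names are hypothetical and not part of ScienceBoard, and the sketch assumes the 100 GB figure means decimal gigabytes (10^9 bytes).

```python
import shutil

REQUIRED_FREE_GB = 100.0  # storage recommendation from the list above


def free_gb(path: str = ".") -> float:
    """Free space on the filesystem that holds `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def meets_storage_recommendation(
    available_gb: float, required_gb: float = REQUIRED_FREE_GB
) -> bool:
    """True when more than `required_gb` gigabytes are available."""
    return available_gb > required_gb


if __name__ == "__main__":
    available = free_gb(".")
    status = "OK" if meets_storage_recommendation(available) else "below recommendation"
    print(f"Free disk space: {available:.1f} GB ({status})")
```

Run it from the directory where you plan to store the extracted VM, since free space is reported per filesystem.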

## Citation

If you are interested in our work or find this repository / our data helpful, please consider using the following citation format when referencing our paper:

```bibtex
@article{sun2025scienceboard,
  title={ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows},
  author={Sun, Qiushi and Liu, Zhoumianze and Ma, Chang and Ding, Zichen and Xu, Fangzhi and Yin, Zhangyue and Zhao, Haiteng and Wu, Zhenyu and Cheng, Kanzhi and Liu, Zhaoyang and others},
  journal={arXiv preprint arXiv:2505.19897},
  year={2025}
}
```