---
license: mit
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
- Qwen/Qwen2.5-VL-7B-Instruct
library_name: peft
---

<p align="center">
<h1 align="center"> ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World </h1>
</p>

<p align="center">
  <a href="https://arxiv.org/abs/2505.19095">
    <img src="https://img.shields.io/badge/arXiv-2505.19095-b31b1b.svg" alt="arXiv">
  </a>
  <a href="https://github.com/niuzaisheng/ScreenExplorer">
    <img src="https://img.shields.io/badge/GitHub-ScreenExplorer-blue?logo=github&link=https://github.com/niuzaisheng/ScreenExplorer" alt="GitHub">
  </a>
</p>

We introduce ScreenExplorer, a VLM trained via Group Relative Policy Optimization (GRPO) in real, dynamic, and open-ended GUI environments for diverse exploration. Given screenshots and a fixed instruction that encourages exploration, ScreenExplorer learns to explore and interact effectively with the screen environment.

This repo contains the LoRA checkpoints saved during the training of `ScreenExplorer-3B-E1` and `ScreenExplorer-7B-E1`, as well as the LoRA checkpoints of `ScreenExplorer-3B-Distill`.
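
Since the metadata lists `peft` as the library and two Qwen2.5-VL base models, a checkpoint can probably be loaded along the lines below. This is a minimal sketch, not the authors' documented procedure: the helper names `base_model_for` and `load_checkpoint` are hypothetical, `adapter_path` is assumed to be a local directory that holds one downloaded LoRA checkpoint (with its `adapter_config.json`), and the mapping from the size tag in a checkpoint name to a base model is an assumption, including for `ScreenExplorer-3B-Distill`.

```python
# Base models listed in this card's metadata.
BASE_MODELS = {
    "3B": "Qwen/Qwen2.5-VL-3B-Instruct",
    "7B": "Qwen/Qwen2.5-VL-7B-Instruct",
}


def base_model_for(checkpoint_name: str) -> str:
    """Pick a base model from the size tag in a checkpoint name.

    Assumption: "-3B-" / "-7B-" in the name identifies the base model.
    """
    for size, base_id in BASE_MODELS.items():
        if f"-{size}-" in checkpoint_name:
            return base_id
    raise ValueError(f"No known base model for {checkpoint_name!r}")


def load_checkpoint(adapter_path: str, checkpoint_name: str):
    """Attach a LoRA adapter to its base model (hypothetical helper).

    adapter_path: local directory of one downloaded LoRA checkpoint (assumed layout).
    """
    # Heavy imports stay local so the name mapping above works without them.
    from peft import PeftModel
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    base_id = base_model_for(checkpoint_name)
    base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        base_id, torch_dtype="auto"
    )
    processor = AutoProcessor.from_pretrained(base_id)
    # Wrap the base model with the LoRA weights stored at adapter_path.
    model = PeftModel.from_pretrained(base, adapter_path)
    return model, processor
```

For example, `load_checkpoint("path/to/checkpoint", "ScreenExplorer-7B-E1")` would load `Qwen/Qwen2.5-VL-7B-Instruct` and apply the adapter found at that path; the path here is a placeholder, so check the repository's file listing for the actual checkpoint directories.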

## Citation

```bibtex
@misc{niu2025screenexplorertrainingvisionlanguagemodel,
      title={ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World}, 
      author={Runliang Niu and Jinglong Ji and Yi Chang and Qi Wang},
      year={2025},
      eprint={2505.19095},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.19095}, 
}
```