---
license: mit
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
- Qwen/Qwen2.5-VL-7B-Instruct
library_name: peft
---

<p align="center">
<h1 align="center"> ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World </h1>
</p>

<p align="center">
<a href="https://arxiv.org/abs/2505.19095">
<img src="https://img.shields.io/badge/arXiv-2505.19095-b31b1b.svg" alt="arXiv">
</a>
<a href="https://github.com/niuzaisheng/ScreenExplorer">
<img src="https://img.shields.io/badge/GitHub-ScreenExplorer-blue?logo=github&link=https://github.com/niuzaisheng/ScreenExplorer" alt="GitHub">
</a>
</p>

We introduce ScreenExplorer, a VLM trained via Group Relative Policy Optimization (GRPO) in real, dynamic, and open-ended GUI environments for diverse exploration. Given screenshots and a fixed instruction that encourages exploration, ScreenExplorer learns to explore and interact effectively with the screen environment.

This repo contains the LoRA checkpoints saved during the training of `ScreenExplorer-3B-E1` and `ScreenExplorer-7B-E1`, as well as the LoRA checkpoints of `ScreenExplorer-3B-Distill`.
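
Since these are LoRA checkpoints (`library_name: peft`), one plausible way to use them is to load the matching Qwen2.5-VL base model and attach the adapter with PEFT. The sketch below is a minimal illustration, not the authors' documented procedure: the helper names `base_model_for` and `load_with_adapter` are hypothetical, the mapping from the size token in a variant name to a base model is an assumption, and `adapter_path` and `subfolder` are placeholders because the checkpoint layout of this repo is not described here.

```python
from typing import Optional


def base_model_for(variant: str) -> str:
    """Map a variant name to a base model listed in this card's metadata.

    Assumption: the size token in the name ("3B" or "7B") identifies the base.
    """
    bases = {
        "3B": "Qwen/Qwen2.5-VL-3B-Instruct",
        "7B": "Qwen/Qwen2.5-VL-7B-Instruct",
    }
    for size, base in bases.items():
        if f"-{size}-" in variant:
            return base
    raise ValueError(f"Cannot infer base model from variant name: {variant!r}")


def load_with_adapter(variant: str, adapter_path: str, subfolder: Optional[str] = None):
    """Load the base VLM and attach a LoRA checkpoint with PEFT.

    `adapter_path` (a local directory or Hub repo id) and `subfolder` are
    placeholders; point them at the checkpoint you actually want to load.
    """
    # Imported lazily so base_model_for works without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    base_id = base_model_for(variant)
    base = Qwen2_5_VLForConditionalGeneration.from_pretrained(base_id, torch_dtype="auto")
    processor = AutoProcessor.from_pretrained(base_id)

    # Only pass `subfolder` when the adapter lives below the repo root.
    kwargs = {"subfolder": subfolder} if subfolder else {}
    model = PeftModel.from_pretrained(base, adapter_path, **kwargs)
    return model.eval(), processor
```

A call would look like `model, processor = load_with_adapter("ScreenExplorer-3B-E1", "path/to/lora-checkpoint")`, where the path is a stand-in for a real checkpoint directory. Qwen2.5-VL support requires a recent `transformers` release.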

## Citation

```bibtex
@misc{niu2025screenexplorertrainingvisionlanguagemodel,
      title={ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World},
      author={Runliang Niu and Jinglong Ji and Yi Chang and Qi Wang},
      year={2025},
      eprint={2505.19095},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.19095},
}
```