metadata
license: mit
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
- Qwen/Qwen2.5-VL-7B-Instruct
library_name: peft
ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World
We introduce ScreenExplorer, a VLM trained via Group Relative Policy Optimization (GRPO) in real, dynamic, and open-ended GUI environments for diverse exploration. The model learns to explore and interact with the screen environment from screenshots and a fixed instruction that encourages exploration.
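For readers unfamiliar with GRPO, the snippet below is a minimal sketch of the group-relative advantage that gives the method its name: several rollouts are sampled for the same prompt, and each reward is normalized against the mean and standard deviation of its group. This is the generic formulation, not necessarily the exact variant used to train ScreenExplorer; the function name, the `eps` term, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    `rewards` holds one scalar reward per rollout sampled for the same
    prompt. `eps` (an assumed stabilizer) avoids division by zero when
    every rollout in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Rollouts that score above the group mean get a positive advantage,
# those below get a negative one.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
```

Because the baseline is computed from the group itself, no separate value network is needed to estimate it.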
This repo contains the LoRA checkpoints saved during the training of ScreenExplorer-3B-E1 and ScreenExplorer-7B-E1, as well as the LoRA checkpoints of ScreenExplorer-3B-Distill.
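Since the checkpoints are LoRA adapters for the `peft` library, they are applied on top of the matching Qwen2.5-VL base model. The snippet below is a minimal sketch of how that might look, assuming a recent `transformers` release that includes Qwen2.5-VL support. The helper `load_screenexplorer` and the `adapter_path` argument are hypothetical: the directory layout of this repo is not described here, so point `adapter_path` at whichever checkpoint folder contains the adapter files.

```python
# Base models listed in this card's metadata.
BASE_MODELS = {
    "3B": "Qwen/Qwen2.5-VL-3B-Instruct",
    "7B": "Qwen/Qwen2.5-VL-7B-Instruct",
}


def load_screenexplorer(adapter_path: str, size: str = "3B"):
    """Load a base Qwen2.5-VL model and attach a LoRA checkpoint.

    `adapter_path` is a hypothetical local path (or hub location) of one
    checkpoint folder; `size` selects the matching base model.
    """
    # Imported lazily so the heavy dependencies are only needed at load time.
    from peft import PeftModel
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    base_id = BASE_MODELS[size]
    processor = AutoProcessor.from_pretrained(base_id)
    base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        base_id, torch_dtype="auto"
    )
    # Wrap the base model with the LoRA weights stored at adapter_path.
    model = PeftModel.from_pretrained(base, adapter_path)
    return model, processor


# Example call (the path is a placeholder, not a real folder in this repo):
# model, processor = load_screenexplorer("path/to/lora_checkpoint", size="3B")
```

A 3B adapter must be paired with the 3B base model and a 7B adapter with the 7B base model; mixing them will fail because the weight shapes differ.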
Citation
@misc{niu2025screenexplorertrainingvisionlanguagemodel,
      title={ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World},
      author={Runliang Niu and Jinglong Ji and Yi Chang and Qi Wang},
      year={2025},
      eprint={2505.19095},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.19095},
}