---
license: mit
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
- Qwen/Qwen2.5-VL-7B-Instruct
library_name: peft
---

<p align="center">
<h1 align="center"> ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World </h1>
</p>

<p align="center">
<a href="https://arxiv.org/abs/2505.19095">
<img src="https://img.shields.io/badge/arXiv-2505.19095-b31b1b.svg" alt="arXiv">
</a>
<a href="https://github.com/niuzaisheng/ScreenExplorer">
<img src="https://img.shields.io/badge/GitHub-ScreenExplorer-blue?logo=github&link=https://github.com/niuzaisheng/ScreenExplorer" alt="GitHub">
</a>
</p>

We introduce ScreenExplorer, a vision-language model (VLM) trained via Group Relative Policy Optimization (GRPO) in real, dynamic, and open-ended GUI environments for diverse exploration. Given screenshots and a fixed instruction that encourages exploration, ScreenExplorer learns to explore and interact effectively with the screen environment.

This repo contains the LoRA checkpoints saved during the training of `ScreenExplorer-3B-E1` and `ScreenExplorer-7B-E1`, as well as the LoRA checkpoints of `ScreenExplorer-3B-Distill`.
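
Since the checkpoints are LoRA adapters rather than full models, they have to be attached to the matching Qwen2.5-VL base model before use. The following is a minimal sketch of how that could look with `peft` and `transformers`, not an official loading script. It assumes the adapter repo id is `niurl/ScreenExplorer`, that each checkpoint lives in its own subfolder of the repo, and that `ScreenExplorer-3B-Distill` sits on the 3B base model; `base_model_for` and `load_checkpoint` are hypothetical helper names.

```python
# Base models listed in this model card's metadata, keyed by size tag.
BASE_MODELS = {
    "3B": "Qwen/Qwen2.5-VL-3B-Instruct",
    "7B": "Qwen/Qwen2.5-VL-7B-Instruct",
}

# Assumption: the Hub repo id that hosts the LoRA checkpoints.
ADAPTER_REPO = "niurl/ScreenExplorer"


def base_model_for(variant: str) -> str:
    """Pick the base model from the size tag in a variant name.

    Assumption: the size tag in the name (e.g. "-3B-") identifies the base,
    so "ScreenExplorer-3B-Distill" maps to the 3B base model.
    """
    for size, base_id in BASE_MODELS.items():
        if f"-{size}-" in variant:
            return base_id
    raise ValueError(f"Cannot infer a base model from {variant!r}")


def load_checkpoint(variant: str, subfolder: str):
    """Load the base model and attach one LoRA checkpoint to it.

    `subfolder` is the checkpoint directory inside the adapter repo; the
    actual directory names are not stated here, so the caller supplies them.
    """
    # Imported lazily so the helper above works without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    base_id = base_model_for(variant)
    processor = AutoProcessor.from_pretrained(base_id)
    base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        base_id, torch_dtype="auto"
    )
    # Wrap the base model with the LoRA weights stored under `subfolder`.
    model = PeftModel.from_pretrained(base, ADAPTER_REPO, subfolder=subfolder)
    return model, processor
```

A call would then look like `model, processor = load_checkpoint("ScreenExplorer-3B-E1", subfolder)`, where `subfolder` is whichever checkpoint directory you pick from the repo's file listing. Loading requires a `transformers` release that includes Qwen2.5-VL support.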

## Citation

```bibtex
@misc{niu2025screenexplorertrainingvisionlanguagemodel,
  title={ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World},
  author={Runliang Niu and Jinglong Ji and Yi Chang and Qi Wang},
  year={2025},
  eprint={2505.19095},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2505.19095},
}
```