One Vision-Language-Action Model for GUI Agent
Qinghong (Kevin) Lin
KevinQHLin
AI & ML interests
Vision-Language Model, Video Understanding, Human-AI Interaction
Recent Activity
upvoted an article about 22 hours ago: When Vision Meets Code
authored a paper 6 days ago: Learning Video Context as Interleaved Multimodal Sequences
published an article 7 days ago: When Vision Meets Code