arxiv:2505.07062

Seed1.5-VL Technical Report

Published on May 11
Submitted by wondervictor on May 13
#1 Paper of the day
Abstract

We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM with 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluation suites, achieving state-of-the-art performance on 38 out of 60 public benchmarks. Moreover, in agent-centric tasks such as GUI control and gameplay, Seed1.5-VL outperforms leading multimodal systems, including OpenAI CUA and Claude 3.7. Beyond visual and video understanding, it also demonstrates strong reasoning abilities, making it particularly effective for multimodal reasoning challenges such as visual puzzles. We believe these capabilities will empower broader applications across diverse tasks. In this report, we provide a comprehensive review of our experience building Seed1.5-VL across model design, data construction, and training at various stages, in the hope that this report can inspire further research. Seed1.5-VL is now accessible at https://www.volcengine.com/ (Volcano Engine Model ID: doubao-1-5-thinking-vision-pro-250428).
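
The abstract describes a common two-part composition: a vision encoder whose features feed an MoE language model. The sketch below illustrates that generic wiring only; the class, the linear projector, and all shapes are assumptions for illustration, not Seed1.5-VL's actual implementation.

```python
# Illustrative sketch of a generic vision-language composition as the
# abstract describes it (vision encoder -> MoE LLM). All names, shapes,
# and the projection layer are assumptions, not Seed1.5-VL's code.
import torch
import torch.nn as nn

class ToyVLM(nn.Module):
    def __init__(self, vision_encoder: nn.Module, llm: nn.Module,
                 d_vision: int, d_model: int):
        super().__init__()
        self.vision_encoder = vision_encoder           # ~532M params in Seed1.5-VL
        self.projector = nn.Linear(d_vision, d_model)  # map visual features into LLM space
        self.llm = llm                                 # MoE LLM, ~20B *active* params

    def forward(self, pixels: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # Encode the image and project its features to the LLM's hidden size.
        visual = self.projector(self.vision_encoder(pixels))  # (B, n_img_tokens, d_model)
        # Prepend image tokens to the text embeddings and decode over both.
        tokens = torch.cat([visual, text_embeds], dim=1)
        return self.llm(tokens)
```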

Community

Paper author · Paper submitter

Seed1.5-VL, a powerful and efficient vision-language foundation model designed for advanced general-purpose multimodal understanding and reasoning, achieves state-of-the-art performance on 38 out of 60 public benchmarks.

GitHub: https://github.com/ByteDance-Seed/Seed1.5-VL
API: https://www.volcengine.com/product/doubao
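
For readers who want to try the API, here is a minimal sketch of querying the model through Volcano Engine's Ark endpoint, assuming it exposes an OpenAI-compatible chat-completions interface; the base URL, message format, and placeholder image URL are assumptions to verify against the official docs (the model ID comes from the abstract):

```python
# Minimal sketch: querying Seed1.5-VL via the Volcano Engine Ark API.
# Assumes an OpenAI-compatible chat endpoint; the base_url and the vision
# message format are assumptions -- check the Volcano Engine docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://ark.cn-beijing.volces.com/api/v3",  # assumed Ark endpoint
    api_key="YOUR_ARK_API_KEY",
)

response = client.chat.completions.create(
    model="doubao-1-5-thinking-vision-pro-250428",  # model ID from the abstract
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```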


Will this model be open-sourced in the future?

An audio overview for learning on the go: https://youtu.be/h-l7jqKs-Xg


Models citing this paper: 0

No models link this paper yet.

Datasets citing this paper: 0

No datasets link this paper yet.

Spaces citing this paper: 1

Collections including this paper: 4