---
title: README
emoji: 🌍
colorFrom: blue
colorTo: blue
sdk: static
pinned: false
---


<div align="center">
<img src='https://cdn-uploads.huggingface.co/production/uploads/647773a1168cb428e00e9a8f/N8lP93rB6lL3iqzML4SKZ.png'  width=100px>

<h1 align="center"><b>On Path to Multimodal Generalist: General-Level and General-Bench</b></h1>
<p align="center">
<a href="https://generalist.top/">[πŸ“– Project]</a>
<a href="https://generalist.top/leaderboard">[πŸ† Leaderboard]</a>
<a href="https://arxiv.org/abs/2505.04620">[πŸ“„ Paper]</a>
<a href="https://huggingface.co/papers/2505.04620">[πŸ€— Paper-HF]</a>
<a href="https://huggingface.co/datasets/General-Level/General-Bench-Closeset">[πŸ€— Dataset-HF (Close-Set)]</a>
<a href="https://huggingface.co/datasets/General-Level/General-Bench-Openset">[πŸ€— Dataset-HF (Open-Set)]</a>
<a href="https://github.com/path2generalist/General-Level">[πŸ“ Github]</a>
</p>

---
</div>



<h1 align="center" style="color:#F27E7E"><em>
Does higher performance across tasks indicate a stronger capability of MLLM, and closer to AGI?
<br>
NO! But <b style="color:red">synergy</b> does.
</em></h1>


Most current MLLMs build predominantly on the language intelligence of an underlying LLM and merely extend it to aid multimodal understanding, simulating multimodal intelligence only indirectly. While LLMs (e.g., ChatGPT) have already demonstrated such synergy within NLP, reflecting genuine language intelligence, the vast majority of MLLMs unfortunately do not achieve it across modalities and tasks.

We argue that the key to advancing towards AGI lies in the synergy effect: the ability of knowledge learned in one modality or task to generalize to, and improve mastery of, other modalities and tasks, so that different modalities and tasks reinforce one another through interconnected learning.


<div align="center">
<img src='https://cdn-uploads.huggingface.co/production/uploads/647773a1168cb428e00e9a8f/-Asn68kJGjgqbGqZMrk4E.png'  width=950px>
</div>


---

This project introduces **General-Level** and **General-Bench**.

---

## 🌐🌐🌐 Keypoints

- [🏆 Overall Leaderboard](#leaderboard)
- [🚀 General-Level](#level)
- [🍕 General-Bench](#bench)
- [📌 Citation](#cite)

---

<h1 style="font-weight: bold; text-decoration: none;"> πŸ†πŸ†πŸ† Overall Leaderboard <a name="leaderboard" /> </a> </h1>


<div align="center">
<img src='https://cdn-uploads.huggingface.co/production/uploads/647773a1168cb428e00e9a8f/s1Q7t6Nmtnmv3bSvkquT0.png'  width=1200px>
</div>


---

<h1 style="font-weight: bold; text-decoration: none;"> πŸš€πŸš€πŸš€ General-Level <a name="level" /> </a> </h1>

**A 5-level evaluation framework with a new norm for assessing multimodal generalists (multimodal LLMs/agents).  
The core is the use of <b style="color:red">synergy</b> as the evaluative criterion, grading capability by whether an MLLM preserves synergy across comprehension and generation, as well as across different modalities.**


<div align="center">
<img src='https://cdn-uploads.huggingface.co/production/uploads/647773a1168cb428e00e9a8f/lnvh5Qri9O23uk3BYiedX.jpeg'>
</div>



<div align="center">
<img src='https://cdn-uploads.huggingface.co/production/uploads/647773a1168cb428e00e9a8f/BPqs-3UODQWvjFzvZYkI4.png'  width=1000px>
</div>
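
To make the synergy criterion more concrete, here is a minimal, illustrative sketch of how a synergy-oriented score could be computed: a generalist is credited on a task only when it keeps up with a strong specialist on that same task, so a high score cannot come from a few isolated wins. This is **not** the official General-Level scoring formulation (please refer to the paper and leaderboard for that); all names, thresholds, and demo numbers below are hypothetical.

```python
# Illustrative sketch only -- NOT the official General-Level scoring formulas.
# It captures the idea stated above: a generalist is credited on a task only
# when it keeps up with a strong specialist on that task, so a high score
# requires synergy across tasks rather than a few isolated wins.
# All names (TaskResult, synergy_score) and the demo numbers are hypothetical.

from dataclasses import dataclass
from collections import defaultdict


@dataclass
class TaskResult:
    task: str            # e.g. "image-captioning"
    modality: str        # e.g. "image", "video", "audio", "3d", "language"
    generalist: float    # generalist model's metric on this task (0-100)
    specialist: float    # best specialist / SoTA metric on the same task


def synergy_score(results: list[TaskResult]) -> dict[str, float]:
    """Per-modality average where a task counts only if the generalist
    is at least as good as the specialist baseline on that task."""
    per_modality = defaultdict(list)
    for r in results:
        # Tasks where the generalist falls behind the specialist contribute 0.
        credit = r.generalist if r.generalist >= r.specialist else 0.0
        per_modality[r.modality].append(credit)
    return {m: sum(v) / len(v) for m, v in per_modality.items()}


if __name__ == "__main__":
    demo = [
        TaskResult("image-captioning", "image", 82.0, 80.0),  # counted: 82.0
        TaskResult("image-vqa",        "image", 60.0, 75.0),  # behind -> 0.0
        TaskResult("video-qa",         "video", 71.0, 68.0),  # counted: 71.0
    ]
    print(synergy_score(demo))  # {'image': 41.0, 'video': 71.0}
```

The per-modality breakdown makes it easy to see when a model relies on a single strong modality rather than genuine cross-modal synergy.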



---

<h1 style="font-weight: bold; text-decoration: none;"> πŸ•πŸ•πŸ• General-Bench <a name="bench" /> </a> </h1>

  
**A companion massive multimodal benchmark that encompasses a broad spectrum of skills, modalities, formats, and capabilities, comprising over 700 tasks and 325K instances.**


We provide two dataset versions, depending on the intended use:
- [**General-Bench-Openset**](https://huggingface.co/datasets/General-Level/General-Bench-Openset), where both the inputs and the labels of all samples are publicly available, for **free open-world use** (e.g., academic experiments and comparisons); see the download sketch below this list.
- [**General-Bench-Closeset**](https://huggingface.co/datasets/General-Level/General-Bench-Closeset), where only the sample inputs are released, used for **leaderboard ranking**. Participants submit their predictions to us for internal evaluation.
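
As a convenience, the sketch below shows one way to fetch the open-set locally with the `huggingface_hub` library. The repo id comes from the link above; the only assumption is that you want a full snapshot of the dataset repository (the on-disk layout is not described here).

```python
# Minimal sketch for fetching General-Bench-Openset locally.
# Assumes the `huggingface_hub` package is installed (`pip install huggingface_hub`);
# inspect the downloaded folder for the per-modality/per-task layout.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="General-Level/General-Bench-Openset",
    repo_type="dataset",  # this repo is a dataset, not a model
)
print("General-Bench-Openset downloaded to:", local_dir)
```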


<div align="center">
<img src='https://cdn-uploads.huggingface.co/production/uploads/647773a1168cb428e00e9a8f/d4TIWw3rlWuxpBCEpHYJB.jpeg'  width=1000px>
</div>





<div align="center">
<img src='https://cdn-uploads.huggingface.co/production/uploads/647773a1168cb428e00e9a8f/qkD43ne58w31Z7jpkTKjr.jpeg'  width=900px>
</div>



---

<h1 style="font-weight: bold; text-decoration: none;"> πŸ“ŒπŸ“ŒπŸ“Œ Citation <a name="cite" /> </a> </h1>

If you find this project useful for your research, please cite our paper:

```bibtex
@misc{fei2025pathmultimodalgeneralistgenerallevel,
  title={On Path to Multimodal Generalist: General-Level and General-Bench},
  author={Hao Fei and Yuan Zhou and Juncheng Li and Xiangtai Li and Qingshan Xu and Bobo Li and Shengqiong Wu and Yaoting Wang and Junbao Zhou and Jiahao Meng and Qingyu Shi and Zhiyuan Zhou and Liangtao Shi and Minghe Gao and Daoan Zhang and Zhiqi Ge and Weiming Wu and Siliang Tang and Kaihang Pan and Yaobo Ye and Haobo Yuan and Tao Zhang and Tianjie Ju and Zixiang Meng and Shilin Xu and Liyu Jia and Wentao Hu and Meng Luo and Jiebo Luo and Tat-Seng Chua and Shuicheng Yan and Hanwang Zhang},
  year={2025},
  eprint={2505.04620},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2505.04620},
}

```