<img src='https://cdn-uploads.huggingface.co/production/uploads/647773a1168cb428e00e9a8f/-Asn68kJGjgqbGqZMrk4E.png' width=950px>
</div>

---

This project introduces **General-Level** and **General-Bench**.

---
## Keypoints

- [Overall Leaderboard](#leaderboard)
- [General-Level](#level)
- [General-Bench](#bench)

---
# Overall Leaderboard<a name="leaderboard" />

<div align="center">
<img src='https://cdn-uploads.huggingface.co/production/uploads/647773a1168cb428e00e9a8f/32goE-PYuwOwRvYg4GcfK.png' width=900px>

---
# General-Level<a name="level" />

**A five-level evaluation system that sets a new norm for assessing multimodal generalists (multimodal LLMs/agents).
Its core idea is to use <b style="color:red">synergy</b> as the evaluation criterion, grading capabilities by whether MLLMs preserve synergy across comprehension and generation, as well as across multimodal interactions.**
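To make the idea of synergy as a criterion more tangible, here is a minimal sketch. It is **not** the General-Level scoring procedure, whose level definitions and formulas are set by the authors.

The sketch rests on these hypothetical assumptions:
- Each task has a score for the generalist and a score for the strongest task-specific specialist, and higher is better.
- "Synergy" on a task means the generalist outperforms that specialist.
- Tasks are labelled either comprehension or generation.

`TaskResult`, `shows_synergy`, `synergy_rate` and `synergy_on_both_sides` are illustrative names only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TaskResult:
    """Scores on one task; higher is better (hypothetical structure)."""
    task: str
    kind: str               # assumed labels: "comprehension" or "generation"
    generalist: float       # score of the multimodal generalist under evaluation
    best_specialist: float  # score of the strongest task-specific model


def shows_synergy(result: TaskResult) -> bool:
    """Assumed criterion: synergy means the generalist beats the specialist."""
    return result.generalist > result.best_specialist


def synergy_rate(results: List[TaskResult], kind: Optional[str] = None) -> float:
    """Fraction of tasks (optionally only one kind) that show synergy."""
    selected = [r for r in results if kind is None or r.kind == kind]
    if not selected:
        return 0.0
    return sum(shows_synergy(r) for r in selected) / len(selected)


def synergy_on_both_sides(results: List[TaskResult]) -> bool:
    """Illustrative check: synergy appears on comprehension AND generation tasks."""
    return (synergy_rate(results, "comprehension") > 0
            and synergy_rate(results, "generation") > 0)


if __name__ == "__main__":
    # Made-up toy scores for illustration only; these are not General-Bench results.
    toy = [
        TaskResult("task-a", "comprehension", generalist=71.0, best_specialist=68.5),
        TaskResult("task-b", "comprehension", generalist=60.0, best_specialist=64.0),
        TaskResult("task-c", "generation", generalist=0.31, best_specialist=0.29),
    ]
    print(synergy_rate(toy, "comprehension"))  # 0.5
    print(synergy_on_both_sides(toy))          # True
```

Comparing raw scores only makes sense when each task uses a higher-is-better metric. Metrics where lower is better would need to be flipped or normalized first.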

<div align="center">

---
# General-Bench<a name="bench" />

**A massive companion multimodal benchmark dataset that encompasses a broader spectrum of skills, modalities, formats, and capabilities, comprising over 700 tasks and 325K instances.**

We provide two data domains:
- [**General-Bench-Openset**](https://huggingface.co/datasets/General-Level/General-Bench-Openset), with the inputs and labels of all samples publicly released for open use (e.g., academic experiments).
- [**General-Bench-Closeset**](https://huggingface.co/datasets/General-Level/General-Bench-Closeset), with only the sample inputs released; participants can use it to be ranked on our leaderboard.
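One possible way to fetch either domain locally is the `snapshot_download` function from the `huggingface_hub` client library. The repository IDs come from the two links above.

This is a sketch under stated assumptions, not an official loader. The helper names `repo_id_for`, `hub_url` and `download_split` are hypothetical. The sketch assumes nothing about the file layout inside the repositories. It also does not handle access terms, such as whether a Hub login token is required.

```python
from pathlib import Path
from typing import Optional

# Hub repository IDs, taken from the dataset links above.
GENERAL_BENCH_REPOS = {
    "openset": "General-Level/General-Bench-Openset",    # inputs and labels
    "closeset": "General-Level/General-Bench-Closeset",  # inputs only
}


def repo_id_for(split: str) -> str:
    """Map a domain name to its Hub repo ID (hypothetical helper)."""
    key = split.strip().lower()
    if key not in GENERAL_BENCH_REPOS:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(GENERAL_BENCH_REPOS)}"
        )
    return GENERAL_BENCH_REPOS[key]


def hub_url(split: str) -> str:
    """Web URL of the dataset repository."""
    return f"https://huggingface.co/datasets/{repo_id_for(split)}"


def download_split(split: str, local_dir: Optional[str] = None) -> Path:
    """Download a snapshot of the whole dataset repo and return its local path."""
    # Lazy import so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=repo_id_for(split),
        repo_type="dataset",
        local_dir=local_dir,
    )
    return Path(path)


if __name__ == "__main__":
    print(hub_url("openset"))  # https://huggingface.co/datasets/General-Level/General-Bench-Openset
```

`snapshot_download` fetches every file in the repository, which may be large for a 325K-instance benchmark. Its `allow_patterns` argument can restrict the download to selected files. A Closeset download contains only inputs, since its labels are not released.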

<div align="center">
<img src='https://cdn-uploads.huggingface.co/production/uploads/647773a1168cb428e00e9a8f/d4TIWw3rlWuxpBCEpHYJB.jpeg'>