Improve model card with pipeline tag and library name
This PR improves the model card by:
- Adding `pipeline_tag: image-text-to-text`, making the model discoverable when filtering for image-text-to-text models on the Hub.
- Specifying the `library_name: transformers`, which enables the "How to use this model" widget on the model page.
- Correcting the arXiv link in the README.
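With both metadata additions applied, the YAML front matter at the top of `README.md` reads:

```yaml
---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---
```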
README.md (changed):
```diff
@@ -1,21 +1,22 @@
 ---
 license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
-
 <p align="center">
     <img src="https://cdn-uploads.huggingface.co/production/uploads/623d8ca4c29adf5ef6175615/q3Anm7o-MoNYjB8JztGVT.png" width="60%" />
 </p>
 
 <font size=3><div align='center' >
-[[📖 arXiv Paper](https://arxiv.org/abs/
+[[📖 arXiv Paper](https://arxiv.org/abs/2505.02835)]
 [[💻 R1-Reward Code](https://github.com/yfzhang114/r1_reward)]
 [[📊 R1-Reward Data](https://huggingface.co/datasets/yifanzhang114/R1-Reward-RL)]
 </div></font>
 
 # Training Multimodal Reward Model Through Stable Reinforcement Learning
 
-🔥 We are proud to open-source **R1-Reward**, a comprehensive project for
+🔥 We are proud to open-source **R1-Reward**, a comprehensive project for improving reward modeling through reinforcement learning. This release includes:
 
 * **R1-Reward Model:** A state-of-the-art (SOTA) multimodal reward model demonstrating substantial gains (Voting@15):
     * **13.5%** improvement on VL Reward-Bench.
@@ -45,5 +46,4 @@ If you find it useful for your research and applications, please cite related pa
 - [MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?](https://github.com/yfzhang114/MME-RealWorld)
 - [MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs](https://arxiv.org/abs/2411.15296)
 - [Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models](https://github.com/yfzhang114/SliME)
-- [VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction](https://github.com/VITA-MLLM/VITA)
-
+- [VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction](https://github.com/VITA-MLLM/VITA)
```