zli12321 committed (verified) · Commit 4d7b18d · 1 Parent(s): 34696d5

Update README.md

Files changed (1): README.md (+12, −43)
README.md CHANGED
@@ -4,14 +4,14 @@ license: apache-2.0
 
 
 # Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation
- [[📖 Paper](https://arxiv.org/abs/2506.15068)]
+ [[📖 Paper](https://arxiv.org/abs/2506.15068)]
 
 
 ## About Open-Ended R1 Training
 As open-ended long-form generation gains traction, reliably judging the quality of multi-sentence and paragraph-length outputs has become a major hurdle—traditional overlap metrics like ROUGE-L and BERTScore often miss nuances of coherence, style, and relevance, and can be skewed by pretraining biases. This leaves a critical gap in evaluation methods for guiding and training models that produce lengthy, free-form text.
 
 
- # VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos
+ <!-- # VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos
 
 [Zongxia Li*](https://zli12321.github.io/), [Xiyang Wu*](https://wuxiyang1996.github.io/), [Yubin Qin](https://www.linkedin.com/in/yubin-qin/), [Guangyao Shi](https://guangyaoshi.github.io/), [Hongyang Du](https://www.linkedin.com/in/hongyangdu/), [Dinesh Manocha](https://www.cs.umd.edu/people/dmanocha), [Tianyi Zhou](https://tianyizhou.github.io/), [Jordan Lee Boyd-Graber](https://users.umiacs.umd.edu/~ying/)
 
@@ -24,16 +24,16 @@ With the recent success of video generation models such as [Sora](https://openai
 
 We introduce VideoHallu, a curated dataset that includes videos generated by seven video generation models and a question-answer set to test MLLMs' abilities to catch generated videos' abnormalities.
 
- We also use GRPO to train [Qwen-2.5-VL-7B](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on a subset of our dataset and show improvement on generated video understanding.
+ We also use GRPO to train [Qwen-2.5-VL-7B](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on a subset of our dataset and show improvement on generated video understanding. -->
 
 
 
- ## 🔥 News
+ <!-- ## 🔥 News
 - [2025/05/02] We release our datasets on [Hugging Face](https://huggingface.co/datasets/IntelligenceLab/VideoHallu)🤗.
 
+ -->
 
- 
- ## 🏅 <a name='rb'></a>Reward Model
+ ## 🏅 <a name='rb'></a> 🔥 Reward Model
 - RewardBert is specifically targeted for free-form GRPO training, where the answers cannot be evaluated based on simple correctness.
 - We use [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert) as the base model, finetuned on [MOCHA](https://arxiv.org/abs/2010.03636), [Prometheus-preference](https://huggingface.co/datasets/prometheus-eval/Preference-Collection), and [Pedants](https://arxiv.org/abs/2402.11161), to evaluate free-form text generations. We use RewardBert as the reward in GRPO finetuning.
 
@@ -64,17 +64,6 @@ We sincerely appreciate the contributions of the open-source community. The rela
 If you find our work helpful for your research, please consider citing our work.
 
 ```
- @misc{li2025videohalluevaluatingmitigatingmultimodal,
- title={VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos},
- author={Zongxia Li and Xiyang Wu and Yubin Qin and Guangyao Shi and Hongyang Du and Dinesh Manocha and Tianyi Zhou and Jordan Lee Boyd-Graber},
- year={2025},
- eprint={2505.01481},
- archivePrefix={arXiv},
- primaryClass={cs.CV},
- url={https://arxiv.org/abs/2505.01481},
- }
-
-
 @misc{li2025semanticallyawarerewardsopenendedr1,
 title={Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation},
 author={Zongxia Li and Yapei Chang and Yuhang Zhou and Xiyang Wu and Zichao Liang and Yoo Yeon Sung and Jordan Lee Boyd-Graber},
@@ -86,34 +75,14 @@ If you find our work helpful for your research, please consider citing our work.
 }
 
 
- ## Hallucination
- @misc{guan2024hallusionbenchadvanceddiagnosticsuite,
- title={HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models},
- author={Tianrui Guan and Fuxiao Liu and Xiyang Wu and Ruiqi Xian and Zongxia Li and Xiaoyu Liu and Xijun Wang and Lichang Chen and Furong Huang and Yaser Yacoob and Dinesh Manocha and Tianyi Zhou},
- year={2024},
- eprint={2310.14566},
- archivePrefix={arXiv},
- primaryClass={cs.CV},
- url={https://arxiv.org/abs/2310.14566},
- }
-
- @misc{wu2024autohallusionautomaticgenerationhallucination,
- title={AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models},
- author={Xiyang Wu and Tianrui Guan and Dianqi Li and Shuaiyi Huang and Xiaoyu Liu and Xijun Wang and Ruiqi Xian and Abhinav Shrivastava and Furong Huang and Jordan Lee Boyd-Graber and Tianyi Zhou and Dinesh Manocha},
- year={2024},
- eprint={2406.10900},
- archivePrefix={arXiv},
- primaryClass={cs.CV},
- url={https://arxiv.org/abs/2406.10900},
- }
-
- @misc{li2025surveystateartlarge,
- title={A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges},
- author={Zongxia Li and Xiyang Wu and Hongyang Du and Fuxiao Liu and Huy Nghiem and Guangyao Shi},
+ ## VLMs that use RewardBert as an evaluator
+ @misc{li2025videohalluevaluatingmitigatingmultimodal,
+ title={VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos},
+ author={Zongxia Li and Xiyang Wu and Yubin Qin and Guangyao Shi and Hongyang Du and Dinesh Manocha and Tianyi Zhou and Jordan Lee Boyd-Graber},
 year={2025},
- eprint={2501.02189},
+ eprint={2505.01481},
 archivePrefix={arXiv},
 primaryClass={cs.CV},
- url={https://arxiv.org/abs/2501.02189},
+ url={https://arxiv.org/abs/2505.01481},
 }
 ```
 
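The Reward Model section above describes RewardBert as a ModernBERT-based scorer used as the reward signal in GRPO finetuning. As a rough illustration only, the sketch below shows one way such a scorer could be wired in with `transformers`; the checkpoint id, the single-logit regression head, and the (reference, candidate) paired-input format are assumptions for this sketch, not taken from this commit.

```python
# Minimal sketch: scoring free-form completions with a ModernBERT-style reward model
# and turning the scores into group-relative (GRPO-style) advantages.
# The checkpoint name and head configuration below are assumptions, not the actual
# RewardBert interface.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CKPT = "IntelligenceLab/RewardBert"  # placeholder checkpoint id (assumption)

tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForSequenceClassification.from_pretrained(CKPT, num_labels=1)
model.eval()


@torch.no_grad()
def semantic_reward(reference: str, candidate: str) -> float:
    """Score how well a free-form candidate answer matches the reference."""
    inputs = tokenizer(reference, candidate, truncation=True, return_tensors="pt")
    logit = model(**inputs).logits.squeeze()
    return torch.sigmoid(logit).item()  # squash to [0, 1] for use as a reward


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: normalize rewards within one prompt's sample group."""
    r = torch.tensor(rewards)
    return ((r - r.mean()) / (r.std() + eps)).tolist()


# Example: one prompt, a group of sampled completions, one gold reference.
reference = "The boiling point of water at sea level is 100 degrees Celsius."
completions = [
    "Water boils at 100 °C at sea level.",
    "It boils at around 80 °C.",
]
rewards = [semantic_reward(reference, c) for c in completions]
advantages = grpo_advantages(rewards)
print(rewards, advantages)
```

In an actual GRPO run, a scorer like `semantic_reward` would replace a binary correctness check, and the group-normalized advantages would weight the policy-gradient update for each sampled completion.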