IntelligenceLab
/

RewardPreferenceBert


zli12321 committed on Jun 19

Commit

4391f77


1 Parent(s): f02450c

Update README.md

Files changed (1)

README.md +3 -0


@@ -25,6 +25,9 @@ We introduce VideoHallu, a curated dataset that includes videos generated by sev
 We also use GRPO to train [Qwen-2.5-VL-7B](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on a subset of our dataset and show improvement on generated video understanding.
+## About Open-Ended R1 Training
+As open-ended long-form generation gains traction, reliably judging the quality of multi-sentence and paragraph-length outputs has become a major hurdle. Traditional reference-based metrics like ROUGE-L and BERTScore often miss nuances of coherence, style, and relevance, and can be skewed by pretraining biases. This leaves a critical gap in evaluation methods for guiding and training models that produce lengthy, free-form text.
 ## 🔥 News
 - [2025/05/02] We release our datasets in [huggingface](https://huggingface.co/datasets/IntelligenceLab/VideoHallu)🤗.