Update README.md
README.md (changed)
@@ -20,26 +20,9 @@ We also use GRPO to train [Qwen-2.5-VL-7B](https://huggingface.co/Qwen/Qwen2.5-V

Before:

## 🔥 News

- [2025/05/02] We release our datasets in huggingface🤗.

## 🔍 Dataset

To facilitate GRPO training, we also randomly sample 1,000 videos from the [PhysBench](https://huggingface.co/datasets/WeiChow/PhysBench-train) training data to first improve the model's reasoning abilities on real-world videos, then train the model on part of our synthetic videos.

Our data spans the following categories:

<img src="./images/fig1.png" style="zoom:35%;" />

## Getting Started

```
# Download the dataset
pip install huggingface_hub

# Download data to your local dir
huggingface-cli download IntelligenceLab/VideoHallu --repo-type dataset --local-dir ./new_video_folders --local-dir-use-symlinks False
```

## 🏅 <a name='rb'></a>Reward Model

We use [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert) as the base model and finetune it on [MOCHA](https://arxiv.org/abs/2010.03636), [Prometheus-preference](https://huggingface.co/datasets/prometheus-eval/Preference-Collection), and [Pedants](https://arxiv.org/abs/2402.11161) to evaluate free-form text generations. We use RewardBert as the reward in GRPO finetuning.
After:

## 🔥 News

- [2025/05/02] We release our datasets in [huggingface](https://huggingface.co/datasets/IntelligenceLab/VideoHallu)🤗.

## 🏅 <a name='rb'></a>Reward Model

We use [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert) as the base model and finetune it on [MOCHA](https://arxiv.org/abs/2010.03636), [Prometheus-preference](https://huggingface.co/datasets/prometheus-eval/Preference-Collection), and [Pedants](https://arxiv.org/abs/2402.11161) to evaluate free-form text generations. We use RewardBert as the reward in GRPO finetuning.
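As a rough illustration (not the repository's actual training code), GRPO uses per-completion scores from a reward model such as RewardBert and normalizes them within each group of completions sampled for the same prompt, with no learned value network. The `reward_bert_score` placeholder below is a hypothetical stand-in for the finetuned ModernBERT scorer:

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    # GRPO's group-relative advantage: normalize each completion's reward
    # against the mean/std of its own sampling group.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def reward_bert_score(answer, reference):
    # Hypothetical placeholder for the ModernBERT-based free-form scorer;
    # exact-match here stands in for a learned similarity score in [0, 1].
    return 1.0 if answer == reference else 0.0

# Four completions sampled from one prompt, scored against a reference.
completions = ["a red cube", "a blue ball", "a red cube", "nothing"]
rewards = [reward_bert_score(c, "a red cube") for c in completions]
advantages = grpo_advantages(rewards)
```

Completions scoring above the group mean get positive advantages and are reinforced; the rest are pushed down, which is what lets GRPO train from scalar reward-model scores alone.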