Update README.md
README.md (changed)
@@ -20,26 +20,9 @@ We also use GRPO to train [Qwen-2.5-VL-7B](https://huggingface.co/Qwen/Qwen2.5-V

Before:

## 🔥 News

- [2025/05/02] We release our datasets in huggingface🤗.

## 🔍 Dataset

To facilitate GRPO training, we also randomly sample 1,000 videos from the [PhysBench](https://huggingface.co/datasets/WeiChow/PhysBench-train) training data to first improve the model's reasoning abilities on real-world videos, then train the model on part of our synthetic videos.

Our data spans the following categories:

<img src="./images/fig1.png" style="zoom:35%;" />

## Getting Started

```
# Download the dataset
pip install huggingface_hub

# Download data to your local dir
huggingface-cli download IntelligenceLab/VideoHallu --repo-type dataset --local-dir ./new_video_folders --local-dir-use-symlinks False
```

## 🏅 <a name='rb'></a>Reward Model

We use [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert) as the base model and finetune it on [MOCHA](https://arxiv.org/abs/2010.03636), [Prometheus-preference](https://huggingface.co/datasets/prometheus-eval/Preference-Collection), and [Pedants](https://arxiv.org/abs/2402.11161) to evaluate free-form text generations. We use RewardBert as the reward in GRPO finetuning.
After:

## 🔥 News

- [2025/05/02] We release our datasets in [huggingface](https://huggingface.co/datasets/IntelligenceLab/VideoHallu)🤗.

## 🏅 <a name='rb'></a>Reward Model

We use [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert) as the base model and finetune it on [MOCHA](https://arxiv.org/abs/2010.03636), [Prometheus-preference](https://huggingface.co/datasets/prometheus-eval/Preference-Collection), and [Pedants](https://arxiv.org/abs/2402.11161) to evaluate free-form text generations. We use RewardBert as the reward in GRPO finetuning.
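As a rough illustration (not the repository's actual training code), GRPO uses per-completion scores from a reward model such as RewardBert and normalizes them within each group of completions sampled for the same prompt, with no learned value network. The `reward_bert_score` placeholder below is a hypothetical stand-in for the finetuned ModernBERT scorer:

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    # GRPO's group-relative advantage: normalize each completion's reward
    # against the mean/std of its own sampling group.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def reward_bert_score(answer, reference):
    # Hypothetical placeholder for the ModernBERT-based free-form scorer;
    # exact-match here stands in for a learned similarity score in [0, 1].
    return 1.0 if answer == reference else 0.0

# Four completions sampled from one prompt, scored against a reference.
completions = ["a red cube", "a blue ball", "a red cube", "nothing"]
rewards = [reward_bert_score(c, "a red cube") for c in completions]
advantages = grpo_advantages(rewards)
```

Completions scoring above the group mean get positive advantages and are reinforced; the rest are pushed down, which is what lets GRPO train from scalar reward-model scores alone.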