Commit 8c319e0 · zli12321 committed (verified) · 1 parent: 0948786

Update README.md

Files changed (1):
  1. README.md +1 -18
README.md CHANGED
@@ -20,26 +20,9 @@ We also use GRPO to train [Qwen-2.5-VL-7B](https://huggingface.co/Qwen/Qwen2.5-V
 
 
 ## 🔥 News
-- [2025/05/02] We release our datasets in huggingface🤗.
+- [2025/05/02] We release our datasets in [huggingface](https://huggingface.co/datasets/IntelligenceLab/VideoHallu)🤗.
 
-## 🔍 Dataset
 
-To facilitate GRPO training, we also randomly sample 1,000 videos from [PhysBench](https://huggingface.co/datasets/WeiChow/PhysBench-train) training data to first improve the model's reasoning abilities in real-world videos, then train the model on part of our synthetic videos.
-
-Our data spans the following categories:
-
-<img src="./images/fig1.png" style="zoom:35%;" />
-
-
-## Getting Started
-
-```
-# Download the dataset
-pip install huggingface_hub
-
-# Download data to your local dir
-huggingface-cli download IntelligenceLab/VideoHallu --repo-type dataset --local-dir ./new_video_folders --local-dir-use-symlinks False
-```
 
 ## 🏅 <a name='rb'></a>Reward Model
 We use [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert) as the base model to finetune on [MOCHA](https://arxiv.org/abs/2010.03636), [Prometheus-preference](https://huggingface.co/datasets/prometheus-eval/Preference-Collection), [Pedants](https://arxiv.org/abs/2402.11161) to evaluate free-form text generations. We use RewardBert as the reward in GRPO finetuning.
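
To make the reward concrete, below is a minimal sketch of how a ModernBERT-based grader like RewardBert could be called when scoring free-form answers for GRPO. This is not the released interface: the checkpoint id `IntelligenceLab/RewardBert`, the (reference, candidate) sequence-pair encoding, and the single-logit regression head are all assumptions for illustration.

```python
# Sketch only: scoring a free-form candidate answer against a reference
# with a ModernBERT-based reward model via Hugging Face transformers.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "IntelligenceLab/RewardBert"  # hypothetical checkpoint id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

def reward(reference: str, candidate: str) -> float:
    """Return a scalar reward for `candidate` graded against `reference`."""
    # Encode the two texts as one sequence pair, as in standard
    # cross-encoder scoring.
    inputs = tokenizer(reference, candidate, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Assumed regression head: a single logit used directly as the reward.
    return logits.squeeze().item()

print(reward("The ball falls because of gravity.",
             "Gravity pulls the ball toward the ground."))
```

In a GRPO loop, each sampled completion would be scored this way against the reference answer, with the resulting scalar fed back as that completion's reward.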