Update README.md
README.md CHANGED
@@ -37,6 +37,12 @@ We also use GRPO to train [Qwen-2.5-VL-7B](https://huggingface.co/Qwen/Qwen2.5-V
- RewardBert is specifically targeted for free-form GRPO training, where the answers cannot be evaluated based on simple correctness.
- We use [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert) as the base model to finetune on [MOCHA](https://arxiv.org/abs/2010.03636), [Prometheus-preference](https://huggingface.co/datasets/prometheus-eval/Preference-Collection), and [Pedants](https://arxiv.org/abs/2402.11161) to evaluate free-form text generations. We use RewardBert as the reward in GRPO finetuning.

+### Installation
+```
+## For more evaluation metrics, refer to https://github.com/zli12321/qa_metrics
+pip install qa-metrics
+```
+
#### Method: `compute_score`
**Parameters**
- `reference_answer` (list of str): A list of gold (correct) answers to the question
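For reference (this is not part of the diff above), here is a minimal usage sketch of `compute_score`. Only `reference_answer` is documented in this section; the import path, constructor, and the `candidate_answer` parameter are assumptions made for illustration, so check the qa_metrics repository for the actual signature.

```python
# Hypothetical usage sketch: only `reference_answer` is documented above;
# the import path, constructor, and `candidate_answer` parameter are
# assumptions made for illustration.
from qa_metrics.RewardBert import RewardBert  # assumed module/class location

scorer = RewardBert()  # assumed default constructor

# Score a model generation against a list of gold (correct) answers.
score = scorer.compute_score(
    reference_answer=["Paris", "The capital of France is Paris."],
    candidate_answer="Paris is the capital of France.",  # assumed parameter name
)
print(score)  # assumed to return a scalar reward usable in GRPO finetuning
```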