zli12321 committed (verified)
Commit 13577c4 · 1 Parent(s): 1da6c31

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -37,6 +37,12 @@ We also use GRPO to train [Qwen-2.5-VL-7B](https://huggingface.co/Qwen/Qwen2.5-V
  - RewardBert is specifically targeted for free-form GRPO training, where the answers cannot be evaluated based on simple correctness.
  - We use [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert) as the base model to finetune on [MOCHA](https://arxiv.org/abs/2010.03636), [Prometheus-preference](https://huggingface.co/datasets/prometheus-eval/Preference-Collection), [Pedants](https://arxiv.org/abs/2402.11161) to evaluate free-form text generations. We use RewardBert as the reward in GRPO finetuning.

+ ### Installation
+ ```
+ ## For more evaluation metrics, refer to https://github.com/zli12321/qa_metrics
+ pip install qa-metrics
+ ```
+
  #### Method: `compute_score`
  **Parameters**
  - `reference_answer` (list of str): A list of gold (correct) answers to the question
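
For context, the hunk above only confirms the `pip install qa-metrics` step and that `compute_score` takes a `reference_answer` list. A minimal usage sketch follows; the import path `qa_metrics.RewardBert`, the `device` constructor argument, the `candidate_answer` parameter, and the returned value are assumptions not shown in this commit.

```python
# Hedged sketch only: everything marked "assumed" below is not confirmed by the diff.
from qa_metrics.RewardBert import RewardBert  # assumed import path / class name

rb = RewardBert(device="cuda")  # assumed constructor signature

reference_answer = ["Marie Curie won two Nobel Prizes."]      # documented: list of gold answers
candidate_answer = "She was awarded the Nobel Prize twice."   # assumed second argument

# Only `reference_answer` (list of str) is documented in this hunk; the
# candidate argument and the scalar return value are assumptions.
score = rb.compute_score(reference_answer, candidate_answer)
print(score)  # intended to serve as the reward signal during GRPO finetuning
```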