mkhalifa committed on
Commit 593f8de · verified · 1 Parent(s): b1687f2

Update README.md

Files changed (1)
  1. README.md +3 -0
README.md CHANGED
@@ -15,6 +15,9 @@ tags:
 
  ThinkPRM-14B is a generative Process Reward Model (PRM) based on the R1-Distill-Qwen-14B architecture. It is fine-tuned to perform step-by-step verification of reasoning processes (like mathematical solutions) by generating an explicit verification chain-of-thought (CoT) that involves labeling every step. It is designed to be highly data-efficient, requiring significantly less supervision data than traditional discriminative PRMs while achieving strong performance.
 
+ Here's an example of the model output:
+
+
  ## Model Details
 
  ### Model Description
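
The README paragraph in this diff says the model labels every step inside its verification chain-of-thought. A minimal sketch of how such per-step labels might be read back is given below. It assumes each verdict is written as `\boxed{correct}` or `\boxed{incorrect}`; that format, the helper names `parse_step_labels` and `first_incorrect_step`, and the sample text are all hypothetical and not taken from this commit.

```python
import re
from typing import List, Optional

# Assumed label format: each step's verdict appears as \boxed{correct} or
# \boxed{incorrect}. This is an illustrative assumption, not confirmed by the README.
LABEL_PATTERN = re.compile(r"\\boxed\{(correct|incorrect)\}")


def parse_step_labels(verification_cot: str) -> List[bool]:
    """Return one boolean per labeled step (True means the step was judged correct)."""
    return [m.group(1) == "correct" for m in LABEL_PATTERN.finditer(verification_cot)]


def first_incorrect_step(labels: List[bool]) -> Optional[int]:
    """Return the 1-based index of the first step judged incorrect, or None."""
    for index, is_correct in enumerate(labels, start=1):
        if not is_correct:
            return index
    return None


# Made-up verification text for illustration only; it is not real model output.
example_cot = (
    "Step 1: The equation is set up properly. \\boxed{correct}\n"
    "Step 2: The sign is flipped when moving the term. \\boxed{incorrect}\n"
)

labels = parse_step_labels(example_cot)
print(labels)
print(first_incorrect_step(labels))
```

Locating the first step judged incorrect is one common way to turn step-level labels into a verdict on the whole solution; other aggregation rules would work equally well with the same parsed list.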