Weiyun1025 committed on
Commit 852965e · verified · 1 Parent(s): 4f7e3ac

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -105,7 +105,7 @@ In this work, we use the Best-of-N evaluation strategy and employ [VisualPRM-8B]
 
  ### Multimodal Reasoning and Mathematics
 
- ![image/png](https://huggingface.co/datasets/Weiyun1025/InternVL-Performance/resolve/main/internvl3/reasoning.png)
+ ![image/png](https://huggingface.co/datasets/OpenGVLab/VisualPRM400K-v1.1/resolve/main/visualprm-performance.png)
 
  ### OCR, Chart, and Document Understanding
 
@@ -161,7 +161,7 @@ The evaluation results in the Figure below shows that the model with native mult
 
  As shown in the table below, models fine-tuned with MPO demonstrate superior reasoning performance across seven multimodal reasoning benchmarks compared to their counterparts without MPO. Specifically, InternVL3-78B and InternVL3-38B outperform their counterparts by 4.1 and 4.5 points, respectively. Notably, the training data used for MPO is a subset of that used for SFT, indicating that the performance improvements primarily stem from the training algorithm rather than the training data.
 
- ![image/png](https://huggingface.co/datasets/Weiyun1025/InternVL-Performance/resolve/main/internvl3/ablation-mpo.png)
+ ![image/png](https://huggingface.co/datasets/OpenGVLab/MMPR-v1.2/resolve/main/ablation-mpo.png)
 
  ### Variable Visual Position Encoding
 