OpsEval / data_v2 /pufa_zh_mc_gen.csv
update leaderboard 2025-02-27
name,zero_naive,zero_self_con,zero_cot,zero_cot_self_con,few_naive,few_self_con,few_cot,few_cot_self_con
Baichuan2-13B-Chat,65.33,66.67,66.67,66.67,62.67,61.33,62.67,62.67
ChatGLM3-6B,60.0,60.0,61.33333333,61.33333333,56.0,56.0,58.66666667,58.66666667
DevOps-Model-14B-Chat,29.33,29.33,62.67,61.33,82.67,81.33,53.33,70.67
ERNIE-Bot-4.0,86.67,86.67,86.67,86.67,82.67,82.67,86.67,86.67
GPT-3.5-turbo,77.33,77.33,84.0,81.33,76.0,78.67,84.0,82.67
GPT-4,88.0,88.0,86.67,86.67,84.0,84.0,90.67,90.67
InternLM2-Chat-20B,76.0,76.0,80.0,80.0,80.0,80.0,,
InternLM2-Chat-7B,78.66666667,78.66666667,72.0,72.0,72.0,72.0,53.33333333,53.33333333
LLaMA-2-13B,44.0,44.0,68.0,68.0,61.33,61.33,53.33,53.33
LLaMA-2-70B-Chat,6.67,6.67,65.33,65.33,49.33,49.33,66.67,66.67
LLaMA-2-7B,25.33,25.33,40.0,40.0,48.0,48.0,52.0,52.0
Mistral-7B,4.0,4.0,58.67,58.67,22.67,22.67,54.67,54.67
Qwen-14B-Chat,73.33,73.33,69.33,72.0,73.33,73.33,72.0,80.0
Qwen-72B-Chat,90.67,90.67,85.33,85.33,88.0,88.0,82.67,82.67
Yi-34B-Chat,84.0,84.0,88.0,88.0,90.67,92.0,78.67,89.33
Claude-3-Opus,93.24324324324324,93.24324324324324,,,,,,
Deepseek-R1-Distill-Llama-8B,24.324324324324326,24.324324324324326,35.13513513513514,35.13513513513514,79.05405405405405,79.05405405405405,26.351351351351347,26.351351351351347
Deepseek-R1-Distill-Qwen-1.5B,25.0,25.0,22.972972972972975,22.972972972972975,13.513513513513514,13.513513513513514,25.0,25.0
Deepseek-R1-Distill-Qwen-14B,92.56756756756756,92.56756756756756,,,87.16216216216216,87.16216216216216,,
Deepseek-R1-Distill-Qwen-32B,95.94594594594595,95.94594594594595,,,95.27027027027026,95.27027027027026,,
Deepseek-R1-Distill-Qwen-7B,77.70270270270271,77.70270270270271,81.75675675675676,81.75675675675676,26.351351351351347,26.351351351351347,34.45945945945946,34.45945945945946
Gemma-2B,36.0,36.0,41.33333,41.33333,36.0,36.0,30.66667,30.66667
Gemma-7B,34.66667,34.66667,56.0,56.0,46.66667,46.66667,56.0,56.0
Hisense,92.56756756756756,93.36486486468645,93.91891891891892,94.2410479339064,93.91891891891892,94.5147428741915,93.91891891891892,94.08436728447799
Meta-Llama-3-8B-Instruct,85.8108108108108,85.8108108108108,31.756756756756754,31.756756756756754,83.1081081081081,83.1081081081081,27.7027027027027,27.7027027027027
Qwen1.5-14B-Base,78.66667,78.66667,72.0,72.0,92.0,92.0,42.66667,42.66667
Qwen1.5-14B-Chat,86.66667,89.33333,85.33333,85.33333,78.66667,80.0,86.66667,85.33333