OpsEval / data_v2 /oracle_zh_mc_gen.csv
update leaderboard 2024-09-06
name,zero_self_con,zero_cot_self_con,few_self_con,few_cot_self_con
Aquilachat2-34B,34.66,47.74,44.48,
Baichuan-13B-Chat,12.07,27.57,19.52,30.58
Baichuan2-13B-Chat,25.5,21.3,26.7,24.7
Chatglm2-6B,23.34,24.14,22.94,26.16
Chatglm3-6B,21.32796781,28.97384306,21.73038229,29.57746479
Chinese-Alpaca-2-13B,22.94,25.75,25.15,22.33
Chinese-Llama-2-13B,14.69,19.92,19.72,20.93
Devops-Model-14B-Chat,22.74,27.77,37.02,26.36
Ernie-Bot-4.0,48.56,50.64,48.0,54.0
Gpt-3.5-Turbo,35.81,43.26,39.44,27.77
Gpt-4,,65.17,,48.09
Internlm-7B,25.96,25.96,29.18,28.37
Internlm2-Chat-7B,28.57142857,31.79074447,30.78470825,31.18712274
Llama-2-13B,24.35,31.99,26.76,20.72
Llama-2-70B-Chat,15.29,34.81,26.76,33.8
Llama-2-7B,20.72,27.97,18.51,17.91
Mistral-7B,1.9,45.61,15.0,35.97
Qwen-14B-Chat,27.57,36.02,35.41,33.4
Qwen-72B-Chat,48.49,49.7,49.7,44.87
Qwen-7B-Chat,17.71,28.37,29.58,31.79
Yi-34B-Chat,49.3,53.72,56.34,54.33
Claude-3-Opus,50.00570664579664,,,
gemma_2b,18.51107,24.9497,21.52918,27.7666
gemma_7b,19.3159,53.94737,18.51107,5.204461
Meta-Llama-3-8B-Instruct,33.91785690993282,27.773429857170807,41.359323028761494,32.62733972477663
Qwen1.5-14B-Base,20.92555,35.61368,41.44869,30.78471
Qwen1.5-14B-Chat,23.34004,41.04628,38.02817,40.04024