OpsEval / data_v2 / network_zh_mc_gen.csv
Commit fe35dbb by Junetheriver: update leaderboard 2024-09-06
name,zero_self_con,zero_cot_self_con,few_self_con,few_cot_self_con
Aquilachat2-34B,34.66,47.74,44.48,
Baichuan-13B-Chat,16.0,49.7,36.1,55.6
Baichuan2-13B-Chat,35.9,30.5,35.6,32.0
Chatglm2-6B,33.7,42.2,36.0,39.5
Chatglm3-6B,41.39414802,49.22547332,38.81239243,42.85714286
Chinese-Alpaca-2-13B,33.1,44.2,44.0,42.7
Chinese-Llama-2-13B,22.5,38.8,41.8,32.2
Devops-Model-14B-Chat,46.57,56.01,60.08,55.79
Ernie-Bot-4.0,67.54,71.96,72.0,78.0
Glm3-Turbo,59.63855422,,,
Glm4,67.383821,,,
Gpt-3.5-Turbo,58.6,67.6,59.7,67.4
Gpt-4,,,,86.0
Hunyuan-13B,60.0,70.0,,
Internlm-7B,41.7,38.4,42.6,41.3
Internlm2-Chat-20B,57.48709122,57.14285714,59.1222031,50.77452668
Internlm2-Chat-7B,54.30292599,59.81067126,58.51979346,51.63511188
Llama-2-13B,31.6,57.0,38.9,50.6
Llama-2-70B-Chat,38.55,57.49,49.09,48.57
Llama-2-7B,30.2,55.6,40.8,50.4
Mistral-7B,1.9,45.61,15.0,35.97
Qwen-14B-Chat,48.81,57.4,56.12,54.99
Qwen-72B-Chat,65.86,68.3,69.4,70.08
Qwen-7B-Chat,29.9,53.5,46.9,47.7
Yi-34B-Chat,62.56,69.75,65.37,71.21
Claude-3-Opus,62.329525111479995,,,
gemma_2b,29.69019,39.15663,29.77625,38.64028
gemma_7b,31.58348,47.59036,34.68158,48.88124
Meta-Llama-3-8B-Instruct,35.904696806952444,38.94801939914722,41.717931191615406,31.059792337987826
Qwen1.5-14B-Base,45.18072,59.1222,61.10155,52.4957
Qwen1.5-14B-Chat,53.87263,63.85542,58.0895,65.57659