safearena-leaderboard / results.csv
Model,Safe Completion Rate,Harmful Completion Rate,Refusal Rate,Normalized Safety Score,License,Bias Completion Rate,Cybercrime Completion Rate,Harassment Completion Rate,Misinformation Completion Rate,Illegal Activity Completion Rate
GPT-4o,34.4,22.8,30.2,31.7,Proprietary,14.0,16.0,16.0,28.0,40.0
GPT-4o-Mini,18.4,14.0,36.5,35.7,Proprietary,6.0,8.0,14.0,24.0,18.0
Claude-3.5-Sonnet,21.2,7.6,57.7,55.0,Proprietary,4.0,6.0,5.0,12.0,12.0
Llama-3.2-90B,8.4,11.2,14.0,34.0,Llama License,22.0,8.0,10.0,14.0,2.0
Qwen-2-VL-72B,24.4,26.0,0.8,21.5,Qwen License,34.0,18.0,18.0,30.0,30.0