Model,Score,95% CI
GPT-4 Omni,3.18,+0.06/-0.06
GPT-4 Turbo,3.1,+0.06/-0.06
Gemini 1.5 Pro,3.06,+0.07/-0.07
Gemini 1.5 Flash,2.98,+0.07/-0.07
Llama 3 70B,2.9,+0.07/-0.07
Claude 3 Opus,2.86,+0.08/-0.08
Claude 3 Sonnet,2.79,+0.08/-0.08
Claude 3 Haiku,2.73,+0.08/-0.08
Gemini 1.0 Pro,2.56,+0.07/-0.07
Llama 3 8B,2.56,+0.07/-0.07
GPT-3.5 Turbo,2.52,+0.08/-0.08
Gemma 7B,2.14,+0.07/-0.07
Gemma 2B,1.83,+0.16/-0.16