alielfilali01 committed
Commit 3df2235 · verified · 1 Parent(s): 62f6b08

Create files/aragen_v1_results.json
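
Each entry in the added file pairs a judge block (`"claude-3.5-sonnet Scores"`) with a `"Meta"` block. In the entries checked, the reported `"3C3H Score"` matches the unweighted mean of the six dimension scores to within rounding. That is an observation from the data, not a documented formula. A minimal sketch for checking this and ranking models, assuming the file is a JSON array shaped like the diff below; `SAMPLE`, `dimension_mean`, and `rank_by_3c3h` are hypothetical names introduced for illustration, and the two sample entries are copied from the file and trimmed to the fields used here:

```python
import json
from statistics import mean

# Key of the judge block as it appears in the results file.
JUDGE = "claude-3.5-sonnet Scores"

# The six 3C3H dimension keys as they appear in the results file.
DIMENSIONS = [
    "Correctness", "Completeness", "Conciseness",
    "Helpfulness", "Honesty", "Harmlessness",
]

# Two entries copied from the file, trimmed to the fields used below.
# To use the real file instead (path is an assumption):
#   with open("files/aragen_v1_results.json", encoding="utf-8") as f:
#       SAMPLE = json.load(f)
SAMPLE = json.loads("""
[
  {"claude-3.5-sonnet Scores": {"3C3H Scores": {
      "Correctness": 0.7026, "Completeness": 0.7014, "Conciseness": 0.1631,
      "Helpfulness": 0.6784, "Honesty": 0.6972, "Harmlessness": 0.7026,
      "3C3H Score": 0.6076}},
   "Meta": {"Model Name": "CohereForAI/aya-expanse-32b"}},
  {"claude-3.5-sonnet Scores": {"3C3H Scores": {
      "Correctness": 0.7192, "Completeness": 0.718, "Conciseness": 0.1906,
      "Helpfulness": 0.6986, "Honesty": 0.7094, "Harmlessness": 0.7192,
      "3C3H Score": 0.6258}},
   "Meta": {"Model Name": "Qwen/Qwen2.5-72B-Instruct"}}
]
""")


def dimension_mean(entry, judge=JUDGE):
    """Unweighted mean of the six 3C3H dimension scores for one entry."""
    scores = entry[judge]["3C3H Scores"]
    return mean(scores[d] for d in DIMENSIONS)


def rank_by_3c3h(entries, judge=JUDGE):
    """Entries sorted by the reported aggregate score, best first."""
    return sorted(
        entries,
        key=lambda e: e[judge]["3C3H Scores"]["3C3H Score"],
        reverse=True,
    )


if __name__ == "__main__":
    for e in rank_by_3c3h(SAMPLE):
        reported = e[JUDGE]["3C3H Scores"]["3C3H Score"]
        print(e["Meta"]["Model Name"], reported, round(dimension_mean(e), 4))
```

The comparison against the reported score should use a small tolerance, since the stored dimension values are themselves rounded to four decimals.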

Files changed (1)
  1. files/aragen_v1_results.json +2405 -0
files/aragen_v1_results.json ADDED
@@ -0,0 +1,2405 @@
+ [
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.7026,
+                 "Completeness": 0.7014,
+                 "Conciseness": 0.1631,
+                 "Helpfulness": 0.6784,
+                 "Honesty": 0.6972,
+                 "Harmlessness": 0.7026,
+                 "3C3H Score": 0.6076
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.7151,
+                 "Reasoning": 0.64,
+                 "Orthographic and Grammatical Analysis": 0.0887,
+                 "Safety": 0.4729
+             }
+         },
+         "Meta": {
+             "Model Name": "CohereForAI/aya-expanse-32b",
+             "License": "cc-by-nc-4.0",
+             "Revision": "main",
+             "Precision": "float16",
+             "Params": 32.0,
+             "Total Entries": 279,
+             "Successful Entries": 278,
+             "Failed Entries": 1,
+             "Success Ratio": 0.9964
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.5612,
+                 "Completeness": 0.5612,
+                 "Conciseness": 0.1172,
+                 "Helpfulness": 0.5468,
+                 "Honesty": 0.5519,
+                 "Harmlessness": 0.5594,
+                 "3C3H Score": 0.4829
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.5526,
+                 "Reasoning": 0.5561,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.4271
+             }
+         },
+         "Meta": {
+             "Model Name": "CohereForAI/aya-expanse-8b",
+             "License": "cc-by-nc-4.0",
+             "Revision": "main",
+             "Precision": "float16",
+             "Params": 8.0,
+             "Total Entries": 279,
+             "Successful Entries": 278,
+             "Failed Entries": 1,
+             "Success Ratio": 0.9964
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.4648,
+                 "Completeness": 0.46,
+                 "Conciseness": 0.1251,
+                 "Helpfulness": 0.4415,
+                 "Honesty": 0.4495,
+                 "Harmlessness": 0.4639,
+                 "3C3H Score": 0.4008
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.5056,
+                 "Reasoning": 0.3817,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.2917
+             }
+         },
+         "Meta": {
+             "Model Name": "FreedomIntelligence/AceGPT-13B-chat",
+             "License": "apache-2.0",
+             "Revision": "main",
+             "Precision": "float16",
+             "Params": 13.0,
+             "Total Entries": 279,
+             "Successful Entries": 279,
+             "Failed Entries": 0,
+             "Success Ratio": 1.0
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.4158,
+                 "Completeness": 0.4158,
+                 "Conciseness": 0.0941,
+                 "Helpfulness": 0.3817,
+                 "Honesty": 0.3934,
+                 "Harmlessness": 0.4158,
+                 "3C3H Score": 0.3527
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.4017,
+                 "Reasoning": 0.4367,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.2104
+             }
+         },
+         "Meta": {
+             "Model Name": "FreedomIntelligence/AceGPT-7B-chat",
+             "License": "apache-2.0",
+             "Revision": "main",
+             "Precision": "float16",
+             "Params": 7.0,
+             "Total Entries": 279,
+             "Successful Entries": 279,
+             "Failed Entries": 0,
+             "Success Ratio": 1.0
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.5568,
+                 "Completeness": 0.546,
+                 "Conciseness": 0.2094,
+                 "Helpfulness": 0.5302,
+                 "Honesty": 0.5391,
+                 "Harmlessness": 0.5568,
+                 "3C3H Score": 0.4897
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.6084,
+                 "Reasoning": 0.4717,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.4083
+             }
+         },
+         "Meta": {
+             "Model Name": "FreedomIntelligence/AceGPT-v2-8B-Chat",
+             "License": "apache-2.0",
+             "Revision": "main",
+             "Precision": "float16",
+             "Params": 8.0,
+             "Total Entries": 279,
+             "Successful Entries": 279,
+             "Failed Entries": 0,
+             "Success Ratio": 1.0
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.1547,
+                 "Completeness": 0.1439,
+                 "Conciseness": 0.0369,
+                 "Helpfulness": 0.116,
+                 "Honesty": 0.1286,
+                 "Harmlessness": 0.1538,
+                 "3C3H Score": 0.1223
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.1201,
+                 "Reasoning": 0.1094,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.3771
+             }
+         },
+         "Meta": {
+             "Model Name": "Qwen/Qwen2.5-0.5B-Instruct",
+             "License": "apache-2.0",
+             "Revision": "main",
+             "Precision": "bfloat16",
+             "Params": 0.465,
+             "Total Entries": 279,
+             "Successful Entries": 278,
+             "Failed Entries": 1,
+             "Success Ratio": 0.9964
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.4468,
+                 "Completeness": 0.4432,
+                 "Conciseness": 0.1278,
+                 "Helpfulness": 0.4179,
+                 "Honesty": 0.4271,
+                 "Harmlessness": 0.4459,
+                 "3C3H Score": 0.3848
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.3684,
+                 "Reasoning": 0.4983,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.6812
+             }
+         },
+         "Meta": {
+             "Model Name": "Qwen/Qwen2.5-3B-Instruct",
+             "License": "apache-2.0",
+             "Revision": "main",
+             "Precision": "bfloat16",
+             "Params": 3.0,
+             "Total Entries": 279,
+             "Successful Entries": 279,
+             "Failed Entries": 0,
+             "Success Ratio": 1.0
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.7192,
+                 "Completeness": 0.718,
+                 "Conciseness": 0.1906,
+                 "Helpfulness": 0.6986,
+                 "Honesty": 0.7094,
+                 "Harmlessness": 0.7192,
+                 "3C3H Score": 0.6258
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.6677,
+                 "Reasoning": 0.7594,
+                 "Orthographic and Grammatical Analysis": 0.1075,
+                 "Safety": 0.6083
+             }
+         },
+         "Meta": {
+             "Model Name": "Qwen/Qwen2.5-72B-Instruct",
+             "License": "qwen",
+             "Revision": "main",
+             "Precision": "bfloat16",
+             "Params": 72.0,
+             "Total Entries": 279,
+             "Successful Entries": 279,
+             "Failed Entries": 0,
+             "Success Ratio": 1.0
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.6499,
+                 "Completeness": 0.6487,
+                 "Conciseness": 0.2016,
+                 "Helpfulness": 0.6386,
+                 "Honesty": 0.638,
+                 "Harmlessness": 0.6499,
+                 "3C3H Score": 0.5711
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.6395,
+                 "Reasoning": 0.6122,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.7792
+             }
+         },
+         "Meta": {
+             "Model Name": "google/gemma-2-27b-it",
+             "License": "gemma",
+             "Revision": "main",
+             "Precision": "bfloat16",
+             "Params": 27.0,
+             "Total Entries": 279,
+             "Successful Entries": 279,
+             "Failed Entries": 0,
+             "Success Ratio": 1.0
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.589,
+                 "Completeness": 0.589,
+                 "Conciseness": 0.1834,
+                 "Helpfulness": 0.5797,
+                 "Honesty": 0.5744,
+                 "Harmlessness": 0.589,
+                 "3C3H Score": 0.5174
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.5462,
+                 "Reasoning": 0.6011,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.7854
+             }
+         },
+         "Meta": {
+             "Model Name": "google/gemma-2-9b-it",
+             "License": "gemma",
+             "Revision": "main",
+             "Precision": "bfloat16",
+             "Params": 9.0,
+             "Total Entries": 279,
+             "Successful Entries": 279,
+             "Failed Entries": 0,
+             "Success Ratio": 1.0
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.5579,
+                 "Completeness": 0.5544,
+                 "Conciseness": 0.1682,
+                 "Helpfulness": 0.5352,
+                 "Honesty": 0.5436,
+                 "Harmlessness": 0.5579,
+                 "3C3H Score": 0.4862
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.5925,
+                 "Reasoning": 0.48,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.45
+             }
+         },
+         "Meta": {
+             "Model Name": "inceptionai/jais-adapted-13b-chat",
+             "License": "apache-2.0",
+             "Revision": "main",
+             "Precision": "float32",
+             "Params": 13.0,
+             "Total Entries": 279,
+             "Successful Entries": 279,
+             "Failed Entries": 0,
+             "Success Ratio": 1.0
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.6679,
+                 "Completeness": 0.6655,
+                 "Conciseness": 0.1804,
+                 "Helpfulness": 0.6326,
+                 "Honesty": 0.652,
+                 "Harmlessness": 0.6679,
+                 "3C3H Score": 0.5777
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.6864,
+                 "Reasoning": 0.5711,
+                 "Orthographic and Grammatical Analysis": 0.0578,
+                 "Safety": 0.5771
+             }
+         },
+         "Meta": {
+             "Model Name": "inceptionai/jais-adapted-70b-chat",
+             "License": "apache-2.0",
+             "Revision": "main",
+             "Precision": "float32",
+             "Params": 70.0,
+             "Total Entries": 279,
+             "Successful Entries": 279,
+             "Failed Entries": 0,
+             "Success Ratio": 1.0
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.5211,
+                 "Completeness": 0.5102,
+                 "Conciseness": 0.1339,
+                 "Helpfulness": 0.4798,
+                 "Honesty": 0.5093,
+                 "Harmlessness": 0.5202,
+                 "3C3H Score": 0.4457
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.5144,
+                 "Reasoning": 0.4844,
+                 "Orthographic and Grammatical Analysis": 0.0269,
+                 "Safety": 0.4312
+             }
+         },
+         "Meta": {
+             "Model Name": "inceptionai/jais-family-13b-chat",
+             "License": "apache-2.0",
+             "Revision": "main",
+             "Precision": "float32",
+             "Params": 13.0,
+             "Total Entries": 279,
+             "Successful Entries": 277,
+             "Failed Entries": 2,
+             "Success Ratio": 0.9928
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.3729,
+                 "Completeness": 0.3669,
+                 "Conciseness": 0.0887,
+                 "Helpfulness": 0.3441,
+                 "Honesty": 0.3543,
+                 "Harmlessness": 0.3711,
+                 "3C3H Score": 0.3163
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.348,
+                 "Reasoning": 0.3761,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.3417
+             }
+         },
+         "Meta": {
+             "Model Name": "inceptionai/jais-family-2p7b-chat",
+             "License": "apache-2.0",
+             "Revision": "main",
+             "Precision": "float32",
+             "Params": 3.0,
+             "Total Entries": 279,
+             "Successful Entries": 278,
+             "Failed Entries": 1,
+             "Success Ratio": 0.9964
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.5806,
+                 "Completeness": 0.5759,
+                 "Conciseness": 0.1526,
+                 "Helpfulness": 0.5475,
+                 "Honesty": 0.5621,
+                 "Harmlessness": 0.5806,
+                 "3C3H Score": 0.4999
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.5812,
+                 "Reasoning": 0.5239,
+                 "Orthographic and Grammatical Analysis": 0.0282,
+                 "Safety": 0.5187
+             }
+         },
+         "Meta": {
+             "Model Name": "inceptionai/jais-family-30b-8k-chat",
+             "License": "apache-2.0",
+             "Revision": "main",
+             "Precision": "float32",
+             "Params": 30.0,
+             "Total Entries": 279,
+             "Successful Entries": 279,
+             "Failed Entries": 0,
+             "Success Ratio": 1.0
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.4755,
+                 "Completeness": 0.4731,
+                 "Conciseness": 0.1243,
+                 "Helpfulness": 0.4522,
+                 "Honesty": 0.4597,
+                 "Harmlessness": 0.4755,
+                 "3C3H Score": 0.41
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.4743,
+                 "Reasoning": 0.4633,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.3542
+             }
+         },
+         "Meta": {
+             "Model Name": "inceptionai/jais-family-6p7b-chat",
+             "License": "apache-2.0",
+             "Revision": "main",
+             "Precision": "float32",
+             "Params": 7.0,
+             "Total Entries": 279,
+             "Successful Entries": 279,
+             "Failed Entries": 0,
+             "Success Ratio": 1.0
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.6392,
+                 "Completeness": 0.6129,
+                 "Conciseness": 0.27,
+                 "Helpfulness": 0.6016,
+                 "Honesty": 0.6171,
+                 "Harmlessness": 0.6383,
+                 "3C3H Score": 0.5632
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.6465,
+                 "Reasoning": 0.6283,
+                 "Orthographic and Grammatical Analysis": 0.0591,
+                 "Safety": 0.4625
+             }
+         },
+         "Meta": {
+             "Model Name": "meta-llama/Llama-3.1-70B-Instruct",
+             "License": "llama3.1",
+             "Revision": "main",
+             "Precision": "bfloat16",
+             "Params": 70.0,
+             "Total Entries": 279,
+             "Successful Entries": 279,
+             "Failed Entries": 0,
+             "Success Ratio": 1.0
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.4421,
+                 "Completeness": 0.4409,
+                 "Conciseness": 0.1416,
+                 "Helpfulness": 0.3967,
+                 "Honesty": 0.4065,
+                 "Harmlessness": 0.4421,
+                 "3C3H Score": 0.3783
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.3826,
+                 "Reasoning": 0.45,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.6625
+             }
+         },
+         "Meta": {
+             "Model Name": "meta-llama/Llama-3.1-8B-Instruct",
+             "License": "llama3.1",
+             "Revision": "main",
+             "Precision": "bfloat16",
+             "Params": 8.0,
+             "Total Entries": 279,
+             "Successful Entries": 279,
+             "Failed Entries": 0,
+             "Success Ratio": 1.0
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.2359,
+                 "Completeness": 0.2058,
+                 "Conciseness": 0.0581,
+                 "Helpfulness": 0.1781,
+                 "Honesty": 0.2106,
+                 "Harmlessness": 0.2341,
+                 "3C3H Score": 0.1871
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.198,
+                 "Reasoning": 0.2328,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.2229
+             }
+         },
+         "Meta": {
+             "Model Name": "meta-llama/Meta-Llama-3-8B-Instruct",
+             "License": "llama3",
+             "Revision": "main",
+             "Precision": "bfloat16",
+             "Params": 14.963,
+             "Total Entries": 279,
+             "Successful Entries": 277,
+             "Failed Entries": 2,
+             "Success Ratio": 0.9928
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.5204,
+                 "Completeness": 0.1295,
+                 "Conciseness": 0.4149,
+                 "Helpfulness": 0.2332,
+                 "Honesty": 0.4814,
+                 "Harmlessness": 0.5204,
+                 "3C3H Score": 0.3833
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.4053,
+                 "Reasoning": 0.3806,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.8188
+             }
+         },
+         "Meta": {
+             "Model Name": "silma-ai/SILMA-9B-Instruct-v1.0",
+             "License": "gemma",
+             "Revision": "main",
+             "Precision": "bfloat16",
+             "Params": 9.0,
+             "Total Entries": 279,
+             "Successful Entries": 278,
+             "Failed Entries": 1,
+             "Success Ratio": 0.9964
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.542,
+                 "Completeness": 0.5156,
+                 "Conciseness": 0.2512,
+                 "Helpfulness": 0.5033,
+                 "Honesty": 0.533,
+                 "Harmlessness": 0.542,
+                 "3C3H Score": 0.4812
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.6009,
+                 "Reasoning": 0.4825,
+                 "Orthographic and Grammatical Analysis": 0.0309,
+                 "Safety": 0.2583
+             }
+         },
+         "Meta": {
+             "Model Name": "CohereForAI/aya-23-35B",
+             "License": "cc-by-nc-4.0",
+             "Revision": "main",
+             "Precision": "float16",
+             "Params": 35.0,
+             "Total Entries": 279,
+             "Successful Entries": 278,
+             "Failed Entries": 1,
+             "Success Ratio": 0.9964
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.5878,
+                 "Completeness": 0.5472,
+                 "Conciseness": 0.1738,
+                 "Helpfulness": 0.5594,
+                 "Honesty": 0.5806,
+                 "Harmlessness": 0.5833,
+                 "3C3H Score": 0.5054
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.6209,
+                 "Reasoning": 0.5394,
+                 "Orthographic and Grammatical Analysis": 0.0269,
+                 "Safety": 0.2354
+             }
+         },
+         "Meta": {
+             "Model Name": "CohereForAI/c4ai-command-r-08-2024",
+             "License": "cc-by-nc-4.0",
+             "Revision": "main",
+             "Precision": "float16",
+             "Params": 32.0,
+             "Total Entries": 279,
+             "Successful Entries": 279,
+             "Failed Entries": 0,
+             "Success Ratio": 1.0
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.6282,
+                 "Completeness": 0.6221,
+                 "Conciseness": 0.1733,
+                 "Helpfulness": 0.5978,
+                 "Honesty": 0.6119,
+                 "Harmlessness": 0.6282,
+                 "3C3H Score": 0.5436
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.6891,
+                 "Reasoning": 0.5333,
+                 "Orthographic and Grammatical Analysis": 0.0264,
+                 "Safety": 0.2521
+             }
+         },
+         "Meta": {
+             "Model Name": "CohereForAI/c4ai-command-r-v01",
+             "License": "cc-by-nc-4.0",
+             "Revision": "main",
+             "Precision": "float16",
+             "Params": 35.0,
+             "Total Entries": 279,
+             "Successful Entries": 277,
+             "Failed Entries": 2,
+             "Success Ratio": 0.9928
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.5297,
+                 "Completeness": 0.4679,
+                 "Conciseness": 0.2876,
+                 "Helpfulness": 0.4694,
+                 "Honesty": 0.5097,
+                 "Harmlessness": 0.5297,
+                 "3C3H Score": 0.4657
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.5958,
+                 "Reasoning": 0.4296,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.3171
+             }
+         },
+         "Meta": {
+             "Model Name": "FreedomIntelligence/AceGPT-v1.5-13B-Chat",
+             "License": "apache-2.0",
+             "Revision": "main",
+             "Precision": "float32",
+             "Params": 13.0,
+             "Total Entries": 279,
+             "Successful Entries": 275,
+             "Failed Entries": 4,
+             "Success Ratio": 0.9857
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.6717,
+                 "Completeness": 0.6642,
+                 "Conciseness": 0.2906,
+                 "Helpfulness": 0.6479,
+                 "Honesty": 0.6657,
+                 "Harmlessness": 0.6717,
+                 "3C3H Score": 0.602
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.7136,
+                 "Reasoning": 0.5694,
+                 "Orthographic and Grammatical Analysis": 0.0632,
+                 "Safety": 0.75
+             }
+         },
+         "Meta": {
+             "Model Name": "FreedomIntelligence/AceGPT-v2-70B-Chat",
+             "License": "apache-2.0",
+             "Revision": "main",
+             "Precision": "float16",
+             "Params": 70.0,
+             "Total Entries": 279,
+             "Successful Entries": 267,
+             "Failed Entries": 12,
+             "Success Ratio": 0.957
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.7103,
+                 "Completeness": 0.7091,
+                 "Conciseness": 0.1912,
+                 "Helpfulness": 0.6888,
+                 "Honesty": 0.7036,
+                 "Harmlessness": 0.7103,
+                 "3C3H Score": 0.6189
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.6862,
+                 "Reasoning": 0.7472,
+                 "Orthographic and Grammatical Analysis": 0.0282,
+                 "Safety": 0.5482
+             }
+         },
+         "Meta": {
+             "Model Name": "MaziyarPanahi/calme-2.2-qwen2.5-72b",
+             "License": "tongyi-qianwen",
+             "Revision": "main",
+             "Precision": "bfloat16",
+             "Params": 72.0,
+             "Total Entries": 279,
+             "Successful Entries": 275,
+             "Failed Entries": 4,
+             "Success Ratio": 0.9857
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.2848,
+                 "Completeness": 0.2848,
+                 "Conciseness": 0.088,
+                 "Helpfulness": 0.2553,
+                 "Honesty": 0.2531,
+                 "Harmlessness": 0.2833,
+                 "3C3H Score": 0.2416
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.2384,
+                 "Reasoning": 0.2723,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.5486
+             }
+         },
+         "Meta": {
+             "Model Name": "Qwen/Qwen2.5-1.5B-Instruct",
+             "License": "qwen",
+             "Revision": "main",
+             "Precision": "bfloat16",
+             "Params": 1.443,
+             "Total Entries": 279,
+             "Successful Entries": 268,
+             "Failed Entries": 11,
+             "Success Ratio": 0.9606
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.6146,
+                 "Completeness": 0.6059,
+                 "Conciseness": 0.1859,
+                 "Helpfulness": 0.5914,
+                 "Honesty": 0.5988,
+                 "Harmlessness": 0.6146,
+                 "3C3H Score": 0.5352
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.566,
+                 "Reasoning": 0.6684,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.6009
+             }
+         },
+         "Meta": {
+             "Model Name": "Qwen/Qwen2.5-14B-Instruct",
+             "License": "apache-2.0",
+             "Revision": "main",
+             "Precision": "bfloat16",
+             "Params": 14.0,
+             "Total Entries": 279,
+             "Successful Entries": 269,
+             "Failed Entries": 10,
+             "Success Ratio": 0.9642
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.8831,
+                 "Completeness": 0.8781,
+                 "Conciseness": 0.3327,
+                 "Helpfulness": 0.8697,
+                 "Honesty": 0.8778,
+                 "Harmlessness": 0.8831,
+                 "3C3H Score": 0.7874
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.7896,
+                 "Reasoning": 0.77,
+                 "Orthographic and Grammatical Analysis": 0.7487,
+                 "Safety": 0.9013
+             }
+         },
+         "Meta": {
+             "Model Name": "claude-3-5-sonnet-20241022",
+             "License": "Proprietary",
+             "Revision": "UNK",
+             "Precision": "UNK",
+             "Params": "UNK",
+             "Total Entries": 279,
+             "Successful Entries": 268,
+             "Failed Entries": 11,
+             "Success Ratio": 0.9606
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.6389,
+                 "Completeness": 0.6377,
+                 "Conciseness": 0.1938,
+                 "Helpfulness": 0.6162,
+                 "Honesty": 0.6316,
+                 "Harmlessness": 0.6389,
+                 "3C3H Score": 0.5595
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.6376,
+                 "Reasoning": 0.5767,
+                 "Orthographic and Grammatical Analysis": 0.0591,
+                 "Safety": 0.6854
+             }
+         },
+         "Meta": {
+             "Model Name": "claude-3-haiku-20240307",
+             "License": "Proprietary",
+             "Revision": "UNK",
+             "Precision": "UNK",
+             "Params": "UNK",
+             "Total Entries": 279,
+             "Successful Entries": 276,
+             "Failed Entries": 3,
+             "Success Ratio": 0.9892
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.2603,
+                 "Completeness": 0.2311,
+                 "Conciseness": 0.0721,
+                 "Helpfulness": 0.2132,
+                 "Honesty": 0.2476,
+                 "Harmlessness": 0.2594,
+                 "3C3H Score": 0.214
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.224,
+                 "Reasoning": 0.2934,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.1771
+             }
+         },
+         "Meta": {
+             "Model Name": "meta-llama/Meta-Llama-3-70B-Instruct",
+             "License": "llama3",
+             "Revision": "main",
+             "Precision": "bfloat16",
+             "Params": 70.0,
+             "Total Entries": 279,
+             "Successful Entries": 274,
+             "Failed Entries": 5,
+             "Success Ratio": 0.9821
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.721,
+                 "Completeness": 0.7138,
+                 "Conciseness": 0.2298,
+                 "Helpfulness": 0.7041,
+                 "Honesty": 0.7141,
+                 "Harmlessness": 0.721,
+                 "3C3H Score": 0.634
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.6923,
+                 "Reasoning": 0.7312,
+                 "Orthographic and Grammatical Analysis": 0.1909,
+                 "Safety": 0.5229
+             }
+         },
+         "Meta": {
+             "Model Name": "gpt-4o-mini",
+             "License": "Proprietary",
+             "Revision": "UNK",
+             "Precision": "UNK",
+             "Params": "UNK",
+             "Total Entries": 279,
+             "Successful Entries": 276,
+             "Failed Entries": 3,
+             "Success Ratio": 0.9892
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.8375,
+                 "Completeness": 0.8291,
+                 "Conciseness": 0.2894,
+                 "Helpfulness": 0.8099,
+                 "Honesty": 0.83,
+                 "Harmlessness": 0.8375,
+                 "3C3H Score": 0.7389
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.8014,
+                 "Reasoning": 0.7455,
+                 "Orthographic and Grammatical Analysis": 0.5027,
+                 "Safety": 0.6063
+             }
+         },
+         "Meta": {
+             "Model Name": "gpt-4o",
+             "License": "Proprietary",
+             "Revision": "UNK",
+             "Precision": "UNK",
+             "Params": "UNK",
+             "Total Entries": 279,
+             "Successful Entries": 277,
+             "Failed Entries": 2,
+             "Success Ratio": 0.9928
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.7194,
+                 "Completeness": 0.7181,
+                 "Conciseness": 0.1927,
+                 "Helpfulness": 0.6921,
+                 "Honesty": 0.7099,
+                 "Harmlessness": 0.7194,
+                 "3C3H Score": 0.6253
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.6611,
+                 "Reasoning": 0.7922,
+                 "Orthographic and Grammatical Analysis": 0.0736,
+                 "Safety": 0.5741
+             }
+         },
+         "Meta": {
+             "Model Name": "rombodawg/Rombos-LLM-V2.5-Qwen-72b",
+             "License": "qwen",
+             "Revision": "main",
+             "Precision": "bfloat16",
+             "Params": 72.0,
+             "Total Entries": 279,
+             "Successful Entries": 272,
+             "Failed Entries": 7,
+             "Success Ratio": 0.9749
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.7121,
+                 "Completeness": 0.7097,
+                 "Conciseness": 0.1876,
+                 "Helpfulness": 0.6882,
+                 "Honesty": 0.6968,
+                 "Harmlessness": 0.7121,
+                 "3C3H Score": 0.6177
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.6815,
+                 "Reasoning": 0.7567,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.5667
+             }
+         },
+         "Meta": {
+             "Model Name": "MaziyarPanahi/calme-2.1-qwen2.5-72b",
+             "License": "tongyi-qianwen",
+             "Revision": "main",
+             "Precision": "bfloat16",
+             "Params": 72.0,
+             "Total Entries": 279,
+             "Successful Entries": 279,
+             "Failed Entries": 0,
+             "Success Ratio": 1.0
+         }
+     },
+     {
+         "claude-3.5-sonnet Scores": {
+             "3C3H Scores": {
+                 "Correctness": 0.3285,
+                 "Completeness": 0.3225,
+                 "Conciseness": 0.0869,
+                 "Helpfulness": 0.2987,
+                 "Honesty": 0.3081,
+                 "Harmlessness": 0.3279,
+                 "3C3H Score": 0.2788
+             },
+             "Tasks Scores": {
+                 "Question Answering (QA)": 0.2945,
+                 "Reasoning": 0.3667,
+                 "Orthographic and Grammatical Analysis": 0.0,
+                 "Safety": 0.2625
+             }
+         },
+         "Meta": {
+             "Model Name": "inceptionai/jais-family-1p3b-chat",
+             "License": "apache-2.0",
+             "Revision": "main",
+             "Precision": "float32",
+             "Params": 1.0,
+             "Total Entries": 279,
+             "Successful Entries": 277,
+             "Failed Entries": 2,
+             "Success Ratio": 0.9928
+         }
+     },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.5695,
+ "Completeness": 0.5624,
+ "Conciseness": 0.1577,
+ "Helpfulness": 0.5312,
+ "Honesty": 0.554,
+ "Harmlessness": 0.5695,
+ "3C3H Score": 0.4907
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.5702,
+ "Reasoning": 0.5139,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.5604
+ }
+ },
+ "Meta": {
+ "Model Name": "inceptionai/jais-family-30b-16k-chat",
+ "License": "apache-2.0",
+ "Revision": "main",
+ "Precision": "float32",
+ "Params": 30.0,
+ "Total Entries": 279,
+ "Successful Entries": 278,
+ "Failed Entries": 1,
+ "Success Ratio": 0.9964
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.1966,
+ "Completeness": 0.1535,
+ "Conciseness": 0.0285,
+ "Helpfulness": 0.1196,
+ "Honesty": 0.1643,
+ "Harmlessness": 0.1957,
+ "3C3H Score": 0.143
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.1577,
+ "Reasoning": 0.1872,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.0875
+ }
+ },
+ "Meta": {
+ "Model Name": "inceptionai/jais-family-590m-chat",
+ "License": "apache-2.0",
+ "Revision": "main",
+ "Precision": "float32",
+ "Params": 0.719,
+ "Total Entries": 279,
+ "Successful Entries": 278,
+ "Failed Entries": 1,
+ "Success Ratio": 0.9964
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.0791,
+ "Completeness": 0.0504,
+ "Conciseness": 0.0216,
+ "Helpfulness": 0.0414,
+ "Honesty": 0.0549,
+ "Harmlessness": 0.0755,
+ "3C3H Score": 0.0538
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.0293,
+ "Reasoning": 0.0756,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.2417
+ }
+ },
+ "Meta": {
+ "Model Name": "meta-llama/Llama-3.2-1B-Instruct",
+ "License": "llama3.2",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 1.0,
+ "Total Entries": 279,
+ "Successful Entries": 278,
+ "Failed Entries": 1,
+ "Success Ratio": 0.9964
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.2736,
+ "Completeness": 0.2616,
+ "Conciseness": 0.0792,
+ "Helpfulness": 0.1971,
+ "Honesty": 0.2315,
+ "Harmlessness": 0.2727,
+ "3C3H Score": 0.2193
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.2133,
+ "Reasoning": 0.28,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.3771
+ }
+ },
+ "Meta": {
+ "Model Name": "meta-llama/Llama-3.2-3B-Instruct",
+ "License": "llama3.2",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 3.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.6296,
+ "Completeness": 0.6165,
+ "Conciseness": 0.2258,
+ "Helpfulness": 0.5923,
+ "Honesty": 0.6123,
+ "Harmlessness": 0.6296,
+ "3C3H Score": 0.551
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.6538,
+ "Reasoning": 0.6033,
+ "Orthographic and Grammatical Analysis": 0.0309,
+ "Safety": 0.375
+ }
+ },
+ "Meta": {
+ "Model Name": "meta-llama/Llama-3.2-90B-Vision-Instruct",
+ "License": "llama3.2",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 90.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.6858,
+ "Completeness": 0.6511,
+ "Conciseness": 0.345,
+ "Helpfulness": 0.635,
+ "Honesty": 0.6747,
+ "Harmlessness": 0.6858,
+ "3C3H Score": 0.6129
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.7062,
+ "Reasoning": 0.6394,
+ "Orthographic and Grammatical Analysis": 0.0215,
+ "Safety": 0.7167
+ }
+ },
+ "Meta": {
+ "Model Name": "meta-llama/Llama-3.3-70B-Instruct",
+ "License": "llama3.3",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 70.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.3321,
+ "Completeness": 0.1434,
+ "Conciseness": 0.0403,
+ "Helpfulness": 0.1359,
+ "Honesty": 0.2631,
+ "Harmlessness": 0.3295,
+ "3C3H Score": 0.2074
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.2891,
+ "Reasoning": 0.1744,
+ "Orthographic and Grammatical Analysis": 0.0175,
+ "Safety": 0.0
+ }
+ },
+ "Meta": {
+ "Model Name": "stabilityai/ar-stablelm-2-chat",
+ "License": "other",
+ "Revision": "main",
+ "Precision": "float32",
+ "Params": 2.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.5317,
+ "Completeness": 0.4875,
+ "Conciseness": 0.1711,
+ "Helpfulness": 0.4271,
+ "Honesty": 0.4904,
+ "Harmlessness": 0.5317,
+ "3C3H Score": 0.4399
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.4885,
+ "Reasoning": 0.4211,
+ "Orthographic and Grammatical Analysis": 0.0323,
+ "Safety": 0.7708
+ }
+ },
+ "Meta": {
+ "Model Name": "utter-project/EuroLLM-9B-Instruct",
+ "License": "apache-2.0",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 9.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.6619,
+ "Completeness": 0.6356,
+ "Conciseness": 0.1938,
+ "Helpfulness": 0.6353,
+ "Honesty": 0.6526,
+ "Harmlessness": 0.661,
+ "3C3H Score": 0.5734
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.7327,
+ "Reasoning": 0.5506,
+ "Orthographic and Grammatical Analysis": 0.0538,
+ "Safety": 0.2458
+ }
+ },
+ "Meta": {
+ "Model Name": "CohereForAI/c4ai-command-r-plus-08-2024",
+ "License": "cc-by-nc-4.0",
+ "Revision": "main",
+ "Precision": "float16",
+ "Params": 104.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.4791,
+ "Completeness": 0.4433,
+ "Conciseness": 0.2109,
+ "Helpfulness": 0.434,
+ "Honesty": 0.466,
+ "Harmlessness": 0.4773,
+ "3C3H Score": 0.4184
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.4969,
+ "Reasoning": 0.4778,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.2437
+ }
+ },
+ "Meta": {
+ "Model Name": "CohereForAI/aya-23-8B",
+ "License": "cc-by-nc-4.0",
+ "Revision": "main",
+ "Precision": "float16",
+ "Params": 8.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.4636,
+ "Completeness": 0.4409,
+ "Conciseness": 0.1532,
+ "Helpfulness": 0.4062,
+ "Honesty": 0.4379,
+ "Harmlessness": 0.4636,
+ "3C3H Score": 0.3942
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.4683,
+ "Reasoning": 0.4106,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.3771
+ }
+ },
+ "Meta": {
+ "Model Name": "inceptionai/jais-adapted-7b-chat",
+ "License": "apache-2.0",
+ "Revision": "main",
+ "Precision": "float32",
+ "Params": 7.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.6822,
+ "Completeness": 0.6643,
+ "Conciseness": 0.2398,
+ "Helpfulness": 0.6461,
+ "Honesty": 0.6723,
+ "Harmlessness": 0.6813,
+ "3C3H Score": 0.5977
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.7304,
+ "Reasoning": 0.5472,
+ "Orthographic and Grammatical Analysis": 0.2124,
+ "Safety": 0.3687
+ }
+ },
+ "Meta": {
+ "Model Name": "CohereForAI/c4ai-command-r-plus",
+ "License": "cc-by-nc-4.0",
+ "Revision": "main",
+ "Precision": "float16",
+ "Params": 104.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.5144,
+ "Completeness": 0.5096,
+ "Conciseness": 0.1304,
+ "Helpfulness": 0.4829,
+ "Honesty": 0.4922,
+ "Harmlessness": 0.5135,
+ "3C3H Score": 0.4405
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.4967,
+ "Reasoning": 0.5361,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.3375
+ }
+ },
+ "Meta": {
+ "Model Name": "CohereForAI/c4ai-command-r7b-12-2024",
+ "License": "cc-by-nc-4.0",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 8.0,
+ "Total Entries": 279,
+ "Successful Entries": 278,
+ "Failed Entries": 1,
+ "Success Ratio": 0.9964
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.6511,
+ "Completeness": 0.6499,
+ "Conciseness": 0.1948,
+ "Helpfulness": 0.634,
+ "Honesty": 0.6415,
+ "Harmlessness": 0.6505,
+ "3C3H Score": 0.5703
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.6214,
+ "Reasoning": 0.6911,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.6125
+ }
+ },
+ "Meta": {
+ "Model Name": "Qwen/Qwen2.5-32B-Instruct",
+ "License": "apache-2.0",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 32.0,
+ "Total Entries": 279,
+ "Successful Entries": 278,
+ "Failed Entries": 1,
+ "Success Ratio": 0.9964
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.546,
+ "Completeness": 0.5448,
+ "Conciseness": 0.1559,
+ "Helpfulness": 0.5233,
+ "Honesty": 0.532,
+ "Harmlessness": 0.5457,
+ "3C3H Score": 0.4746
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.482,
+ "Reasoning": 0.6222,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.6
+ }
+ },
+ "Meta": {
+ "Model Name": "Qwen/Qwen2.5-7B-Instruct",
+ "License": "apache-2.0",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 7.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.4676,
+ "Completeness": 0.464,
+ "Conciseness": 0.1361,
+ "Helpfulness": 0.4047,
+ "Honesty": 0.4158,
+ "Harmlessness": 0.4658,
+ "3C3H Score": 0.3923
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.427,
+ "Reasoning": 0.4289,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.6
+ }
+ },
+ "Meta": {
+ "Model Name": "meta-llama/Llama-3.2-11B-Vision-Instruct",
+ "License": "llama3.2",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 11.0,
+ "Total Entries": 279,
+ "Successful Entries": 278,
+ "Failed Entries": 1,
+ "Success Ratio": 0.9964
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.5863,
+ "Completeness": 0.5803,
+ "Conciseness": 0.2338,
+ "Helpfulness": 0.5659,
+ "Honesty": 0.5782,
+ "Harmlessness": 0.5854,
+ "3C3H Score": 0.5217
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.5484,
+ "Reasoning": 0.6389,
+ "Orthographic and Grammatical Analysis": 0.0188,
+ "Safety": 0.6583
+ }
+ },
+ "Meta": {
+ "Model Name": "FreedomIntelligence/AceGPT-v2-32B-Chat",
+ "License": "apache-2.0",
+ "Revision": "main",
+ "Precision": "float16",
+ "Params": 32.0,
+ "Total Entries": 279,
+ "Successful Entries": 278,
+ "Failed Entries": 1,
+ "Success Ratio": 0.9964
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.4277,
+ "Completeness": 0.3955,
+ "Conciseness": 0.0687,
+ "Helpfulness": 0.3127,
+ "Honesty": 0.3668,
+ "Harmlessness": 0.4232,
+ "3C3H Score": 0.3324
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.3284,
+ "Reasoning": 0.4578,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.4083
+ }
+ },
+ "Meta": {
+ "Model Name": "Qwen/QwQ-32B-Preview",
+ "License": "apache-2.0",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 32.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.6558,
+ "Completeness": 0.6486,
+ "Conciseness": 0.1895,
+ "Helpfulness": 0.6276,
+ "Honesty": 0.6402,
+ "Harmlessness": 0.6552,
+ "3C3H Score": 0.5695
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.6239,
+ "Reasoning": 0.7094,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.5167
+ }
+ },
+ "Meta": {
+ "Model Name": "maldv/Qwentile2.5-32B-Instruct",
+ "License": "Open",
+ "Revision": "main",
+ "Precision": "float16",
+ "Params": 32.0,
+ "Total Entries": 279,
+ "Successful Entries": 277,
+ "Failed Entries": 2,
+ "Success Ratio": 0.9928
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.8189,
+ "Completeness": 0.8189,
+ "Conciseness": 0.2113,
+ "Helpfulness": 0.7953,
+ "Honesty": 0.8132,
+ "Harmlessness": 0.8189,
+ "3C3H Score": 0.7128
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.7792,
+ "Reasoning": 0.7222,
+ "Orthographic and Grammatical Analysis": 0.5202,
+ "Safety": 0.4708
+ }
+ },
+ "Meta": {
+ "Model Name": "deepseek-chat",
+ "License": "Proprietary",
+ "Revision": "UNK",
+ "Precision": "UNK",
+ "Params": "UNK",
+ "Total Entries": 279,
+ "Successful Entries": 278,
+ "Failed Entries": 1,
+ "Success Ratio": 0.9964
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.7443,
+ "Completeness": 0.7336,
+ "Conciseness": 0.3056,
+ "Helpfulness": 0.7234,
+ "Honesty": 0.733,
+ "Harmlessness": 0.7443,
+ "3C3H Score": 0.664
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.7161,
+ "Reasoning": 0.715,
+ "Orthographic and Grammatical Analysis": 0.2352,
+ "Safety": 0.7396
+ }
+ },
+ "Meta": {
+ "Model Name": "claude-3-5-haiku-20241022",
+ "License": "Proprietary",
+ "Revision": "UNK",
+ "Precision": "UNK",
+ "Params": "UNK",
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.5914,
+ "Completeness": 0.589,
+ "Conciseness": 0.1974,
+ "Helpfulness": 0.5648,
+ "Honesty": 0.5792,
+ "Harmlessness": 0.5914,
+ "3C3H Score": 0.5189
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.5998,
+ "Reasoning": 0.5878,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.4458
+ }
+ },
+ "Meta": {
+ "Model Name": "gpt-3.5-turbo-0125",
+ "License": "Proprietary",
+ "Revision": "UNK",
+ "Precision": "UNK",
+ "Params": "UNK",
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.7422,
+ "Completeness": 0.7422,
+ "Conciseness": 0.2146,
+ "Helpfulness": 0.7224,
+ "Honesty": 0.7332,
+ "Harmlessness": 0.7422,
+ "3C3H Score": 0.6495
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.6476,
+ "Reasoning": 0.805,
+ "Orthographic and Grammatical Analysis": 0.2204,
+ "Safety": 0.7458
+ }
+ },
+ "Meta": {
+ "Model Name": "o1-mini-2024-09-12",
+ "License": "Proprietary",
+ "Revision": "UNK",
+ "Precision": "UNK",
+ "Params": "UNK",
+ "Total Entries": 279,
+ "Successful Entries": 278,
+ "Failed Entries": 1,
+ "Success Ratio": 0.9964
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.9271,
+ "Completeness": 0.9247,
+ "Conciseness": 0.3465,
+ "Helpfulness": 0.9119,
+ "Honesty": 0.9226,
+ "Harmlessness": 0.9271,
+ "3C3H Score": 0.8267
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.8157,
+ "Reasoning": 0.8478,
+ "Orthographic and Grammatical Analysis": 0.8266,
+ "Safety": 0.8313
+ }
+ },
+ "Meta": {
+ "Model Name": "o1-2024-12-17",
+ "License": "Proprietary",
+ "Revision": "UNK",
+ "Precision": "UNK",
+ "Params": "UNK",
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.8029,
+ "Completeness": 0.7921,
+ "Conciseness": 0.2733,
+ "Helpfulness": 0.7838,
+ "Honesty": 0.7999,
+ "Harmlessness": 0.8029,
+ "3C3H Score": 0.7091
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.7013,
+ "Reasoning": 0.8422,
+ "Orthographic and Grammatical Analysis": 0.379,
+ "Safety": 0.7812
+ }
+ },
+ "Meta": {
+ "Model Name": "o3-mini-2025-01-31",
+ "License": "Proprietary",
+ "Revision": "UNK",
+ "Precision": "UNK",
+ "Params": "UNK",
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.5484,
+ "Completeness": 0.546,
+ "Conciseness": 0.1532,
+ "Helpfulness": 0.5251,
+ "Honesty": 0.5367,
+ "Harmlessness": 0.5484,
+ "3C3H Score": 0.4763
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.4778,
+ "Reasoning": 0.6594,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.5167
+ }
+ },
+ "Meta": {
+ "Model Name": "1024m/PHI-4-Hindi-4bit",
+ "License": "Open",
+ "Revision": "main",
+ "Precision": "4bit",
+ "Params": 14.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.6141,
+ "Completeness": 0.583,
+ "Conciseness": 0.2327,
+ "Helpfulness": 0.5573,
+ "Honesty": 0.5893,
+ "Harmlessness": 0.6132,
+ "3C3H Score": 0.5316
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.6146,
+ "Reasoning": 0.4711,
+ "Orthographic and Grammatical Analysis": 0.2124,
+ "Safety": 0.6188
+ }
+ },
+ "Meta": {
+ "Model Name": "ALLaM-AI/ALLaM-7B-Instruct-preview",
+ "License": "apache-2.0",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 7.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.6464,
+ "Completeness": 0.5364,
+ "Conciseness": 0.2649,
+ "Helpfulness": 0.5792,
+ "Honesty": 0.629,
+ "Harmlessness": 0.6419,
+ "3C3H Score": 0.5496
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.5943,
+ "Reasoning": 0.6889,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.5375
+ }
+ },
+ "Meta": {
+ "Model Name": "malhajar/Shahin-v0.1",
+ "License": "Open",
+ "Revision": "main",
+ "Precision": "float16",
+ "Params": 27.519,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.4588,
+ "Completeness": 0.4468,
+ "Conciseness": 0.126,
+ "Helpfulness": 0.3987,
+ "Honesty": 0.428,
+ "Harmlessness": 0.4567,
+ "3C3H Score": 0.3859
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.4495,
+ "Reasoning": 0.4589,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.2229
+ }
+ },
+ "Meta": {
+ "Model Name": "mistralai/Ministral-8B-Instruct-2410",
+ "License": "mrl",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 8.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.0983,
+ "Completeness": 0.0899,
+ "Conciseness": 0.0192,
+ "Helpfulness": 0.0647,
+ "Honesty": 0.08,
+ "Harmlessness": 0.0974,
+ "3C3H Score": 0.0749
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.08,
+ "Reasoning": 0.1156,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.0
+ }
+ },
+ "Meta": {
+ "Model Name": "mistralai/Mistral-7B-Instruct-v0.2",
+ "License": "apache-2.0",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 7.0,
+ "Total Entries": 279,
+ "Successful Entries": 278,
+ "Failed Entries": 1,
+ "Success Ratio": 0.9964
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.1971,
+ "Completeness": 0.1505,
+ "Conciseness": 0.0218,
+ "Helpfulness": 0.1045,
+ "Honesty": 0.1517,
+ "Harmlessness": 0.1953,
+ "3C3H Score": 0.1368
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.1523,
+ "Reasoning": 0.1339,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.2417
+ }
+ },
+ "Meta": {
+ "Model Name": "mistralai/Mistral-7B-Instruct-v0.3",
+ "License": "apache-2.0",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 7.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.7814,
+ "Completeness": 0.773,
+ "Conciseness": 0.2237,
+ "Helpfulness": 0.7455,
+ "Honesty": 0.7733,
+ "Harmlessness": 0.7805,
+ "3C3H Score": 0.6796
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.7534,
+ "Reasoning": 0.6583,
+ "Orthographic and Grammatical Analysis": 0.3817,
+ "Safety": 0.6563
+ }
+ },
+ "Meta": {
+ "Model Name": "mistral-saba-2502",
+ "License": "Proprietary",
+ "Revision": "UNK",
+ "Precision": "UNK",
+ "Params": "UNK",
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.7085,
+ "Completeness": 0.7013,
+ "Conciseness": 0.2148,
+ "Helpfulness": 0.6897,
+ "Honesty": 0.6998,
+ "Harmlessness": 0.7085,
+ "3C3H Score": 0.6204
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.728,
+ "Reasoning": 0.695,
+ "Orthographic and Grammatical Analysis": 0.0847,
+ "Safety": 0.3479
+ }
+ },
+ "Meta": {
+ "Model Name": "mistralai/Mistral-Large-Instruct-2411",
+ "License": "mrl",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 123.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.3059,
+ "Completeness": 0.2736,
+ "Conciseness": 0.1036,
+ "Helpfulness": 0.2267,
+ "Honesty": 0.2622,
+ "Harmlessness": 0.3059,
+ "3C3H Score": 0.2463
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.2335,
+ "Reasoning": 0.2822,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.5917
+ }
+ },
+ "Meta": {
+ "Model Name": "silma-ai/SILMA-Kashif-2B-Instruct-v1.0",
+ "License": "Gemma",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 2.453,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.8789,
+ "Completeness": 0.8777,
+ "Conciseness": 0.292,
+ "Helpfulness": 0.8627,
+ "Honesty": 0.8726,
+ "Harmlessness": 0.8789,
+ "3C3H Score": 0.7771
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.7845,
+ "Reasoning": 0.8083,
+ "Orthographic and Grammatical Analysis": 0.6828,
+ "Safety": 0.75
+ }
+ },
+ "Meta": {
+ "Model Name": "claude-3-7-sonnet-20250219",
+ "License": "Proprietary",
+ "Revision": "UNK",
+ "Precision": "UNK",
+ "Params": "UNK",
+ "Total Entries": 279,
+ "Successful Entries": 278,
+ "Failed Entries": 1,
+ "Success Ratio": 0.9964
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.5839,
+ "Completeness": 0.5791,
+ "Conciseness": 0.1394,
+ "Helpfulness": 0.557,
+ "Honesty": 0.5731,
+ "Harmlessness": 0.5839,
+ "3C3H Score": 0.5027
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.5612,
+ "Reasoning": 0.6011,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.4687
+ }
+ },
+ "Meta": {
+ "Model Name": "Conception/aml-arabic-small-2025-02-20",
+ "License": "mit",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 8.0,
+ "Total Entries": 279,
+ "Successful Entries": 278,
+ "Failed Entries": 1,
+ "Success Ratio": 0.9964
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.5683,
+ "Completeness": 0.5647,
+ "Conciseness": 0.1436,
+ "Helpfulness": 0.5474,
+ "Honesty": 0.56,
+ "Harmlessness": 0.5665,
+ "3C3H Score": 0.4918
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.5389,
+ "Reasoning": 0.6072,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.4625
+ }
+ },
+ "Meta": {
+ "Model Name": "CohereForAI/c4ai-command-r7b-arabic-02-2025",
+ "License": "cc-by-nc-4.0",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 8.0,
+ "Total Entries": 279,
+ "Successful Entries": 278,
+ "Failed Entries": 1,
+ "Success Ratio": 0.9964
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.6189,
+ "Completeness": 0.5735,
+ "Conciseness": 0.3312,
+ "Helpfulness": 0.5663,
+ "Honesty": 0.6027,
+ "Harmlessness": 0.6189,
+ "3C3H Score": 0.5519
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.6472,
+ "Reasoning": 0.4711,
+ "Orthographic and Grammatical Analysis": 0.1989,
+ "Safety": 0.6729
+ }
+ },
+ "Meta": {
+ "Model Name": "Navid-AI/Yehia-7B-preview",
+ "License": "Open",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 6.524,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.6115,
+ "Completeness": 0.5923,
+ "Conciseness": 0.2395,
+ "Helpfulness": 0.5758,
+ "Honesty": 0.5938,
+ "Harmlessness": 0.6115,
+ "3C3H Score": 0.5374
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.6222,
+ "Reasoning": 0.4894,
+ "Orthographic and Grammatical Analysis": 0.1532,
+ "Safety": 0.6687
+ }
+ },
+ "Meta": {
+ "Model Name": "Mohaddz/Thinking-Camel-7b",
+ "License": "Open",
+ "Revision": "main",
+ "Precision": "float16",
+ "Params": 7.0,
+ "Total Entries": 279,
+ "Successful Entries": 278,
+ "Failed Entries": 1,
+ "Success Ratio": 0.9964
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.6105,
+ "Completeness": 0.5902,
+ "Conciseness": 0.2395,
+ "Helpfulness": 0.5756,
+ "Honesty": 0.5905,
+ "Harmlessness": 0.6105,
+ "3C3H Score": 0.5361
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.6179,
+ "Reasoning": 0.4911,
+ "Orthographic and Grammatical Analysis": 0.1532,
+ "Safety": 0.6729
+ }
+ },
+ "Meta": {
+ "Model Name": "Mohaddz/Thinking-cow-7B",
+ "License": "Apache license 2.0",
+ "Revision": "main",
+ "Precision": "float16",
+ "Params": 7.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.3549,
+ "Completeness": 0.3525,
+ "Conciseness": 0.0585,
+ "Helpfulness": 0.3366,
+ "Honesty": 0.3318,
+ "Harmlessness": 0.354,
+ "3C3H Score": 0.2981
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.2784,
+ "Reasoning": 0.3928,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.5542
+ }
+ },
+ "Meta": {
+ "Model Name": "google/gemma-3-1b-it",
+ "License": "gemma",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 1.0,
+ "Total Entries": 279,
+ "Successful Entries": 278,
+ "Failed Entries": 1,
+ "Success Ratio": 0.9964
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.0036,
+ "Completeness": 0.0,
+ "Conciseness": 0.0,
+ "Helpfulness": 0.0009,
+ "Honesty": 0.0027,
+ "Harmlessness": 0.0036,
+ "3C3H Score": 0.0018
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.0034,
+ "Reasoning": 0.0,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.0
+ }
+ },
+ "Meta": {
+ "Model Name": "kyutai/helium-1-preview-2b",
+ "License": "cc-by-4.0",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 2.0,
+ "Total Entries": 279,
+ "Successful Entries": 275,
+ "Failed Entries": 4,
+ "Success Ratio": 0.9857
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.7945,
+ "Completeness": 0.7933,
+ "Conciseness": 0.2112,
+ "Helpfulness": 0.7742,
+ "Honesty": 0.7864,
+ "Harmlessness": 0.7945,
+ "3C3H Score": 0.6924
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.7598,
+ "Reasoning": 0.7283,
+ "Orthographic and Grammatical Analysis": 0.4395,
+ "Safety": 0.4333
+ }
+ },
+ "Meta": {
+ "Model Name": "CohereForAI/c4ai-command-a-03-2025",
+ "License": "cc-by-nc-4.0",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 111.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "claude-3.5-sonnet Scores": {
+ "3C3H Scores": {
+ "Correctness": 0.5269,
+ "Completeness": 0.5125,
+ "Conciseness": 0.0367,
+ "Helpfulness": 0.4176,
+ "Honesty": 0.4913,
+ "Harmlessness": 0.5203,
+ "3C3H Score": 0.4176
+ },
+ "Tasks Scores": {
+ "Question Answering (QA)": 0.4112,
+ "Reasoning": 0.6289,
+ "Orthographic and Grammatical Analysis": 0.0,
+ "Safety": 0.3208
+ }
+ },
+ "Meta": {
+ "Model Name": "Qwen/QwQ-32B",
+ "License": "apache-2.0",
+ "Revision": "main",
+ "Precision": "bfloat16",
+ "Params": 32.0,
+ "Total Entries": 279,
+ "Successful Entries": 279,
+ "Failed Entries": 0,
+ "Success Ratio": 1.0
+ }
+ },
+ {
+ "_last_sync_timestamp": "2025-03-14T17:22:59.723702"
+ }
+ ]