shubhrapandit committed on
Commit
6ddbbac
·
verified ·
1 Parent(s): 50a2b47

Update README.md

  1. README.md +243 -1
README.md CHANGED
@@ -174,10 +174,252 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
174
 
175
  <details>
176
  <summary>Benchmarking Command</summary>
177
- guidellm --model neuralmagic/pixtral-12b-quantized.w4a16 --target "http://localhost:8000/v1" --data-type emulated --data prompt_tokens=128,generated_tokens=128,images=1,width=640,height=480 --max seconds 120 --backend aiohttp_server
 
 
178
 
179
  </details>
180
 
181
  ### Single-stream performance (measured with vLLM version 0.7.2)
182
 
183
  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
174
 
175
  <details>
176
  <summary>Benchmarking Command</summary>
177
+ ```
178
+ guidellm --model neuralmagic/pixtral-12b-quantized.w4a16 --target "http://localhost:8000/v1" --data-type emulated --data prompt_tokens=<prompt_tokens>,generated_tokens=<generated_tokens>,images=<num_images>,width=<image_width>,height=<image_height> --max-seconds 120 --backend aiohttp_server
179
+ ```
180
 
181
  </details>
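
The placeholders in the benchmarking command can be filled in per scenario. The sketch below builds the `--data` argument for the three workloads named in the table headers. It assumes that a header such as `64/128` means prompt tokens / generated tokens, which matches the earlier version of the command (`prompt_tokens=128,generated_tokens=128` at 640x480), and that each request carries one image. `SCENARIOS` and `build_data_arg` are hypothetical names used only for illustration.

```python
# Scenario parameters read off the benchmark table headers.
# Assumption: "64/128" is prompt_tokens/generated_tokens.
SCENARIOS = {
    "Document Visual Question Answering": dict(
        prompt_tokens=64, generated_tokens=128, width=1680, height=2240
    ),
    "Visual Reasoning": dict(
        prompt_tokens=128, generated_tokens=128, width=640, height=480
    ),
    "Image Captioning": dict(
        prompt_tokens=0, generated_tokens=128, width=480, height=360
    ),
}


def build_data_arg(prompt_tokens, generated_tokens, width, height, images=1):
    """Format the value passed to --data (images=1 is an assumption)."""
    return (
        f"prompt_tokens={prompt_tokens},generated_tokens={generated_tokens},"
        f"images={images},width={width},height={height}"
    )


for name, params in SCENARIOS.items():
    print(f"{name}: --data {build_data_arg(**params)}")
```

Each printed string can be substituted for the `--data` value in the command above.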
182
 
183
  ### Single-stream performance (measured with vLLM version 0.7.2)
184
 
185
+ <table border="1" class="dataframe">
186
+ <thead>
187
+ <tr>
188
+ <th></th>
189
+ <th></th>
190
+ <th></th>
191
+ <th style="text-align: center;" colspan="2" >Document Visual Question Answering<br>1680W x 2240H<br>64/128</th>
192
+ <th style="text-align: center;" colspan="2" >Visual Reasoning <br>640W x 480H<br>128/128</th>
193
+ <th style="text-align: center;" colspan="2" >Image Captioning<br>480W x 360H<br>0/128</th>
194
+ </tr>
195
+ <tr>
196
+ <th>Hardware</th>
197
+ <th>Model</th>
198
+ <th>Average Cost Reduction</th>
199
+ <th>Latency (s)</th>
200
+ <th>QPD</th>
201
+ <th>Latency (s)</th>
202
+ <th>QPD</th>
203
+ <th>Latency (s)</th>
204
+ <th>QPD</th>
205
+ </tr>
206
+ </thead>
207
+ <tbody style="text-align: center">
208
+ <tr>
209
+ <th rowspan="3" valign="top">A6000x1</th>
210
+ <th>mgoin/pixtral-12b</th>
211
+ <td></td>
212
+ <td>5.7</td>
213
+ <td>796</td>
214
+ <td>4.8</td>
215
+ <td>929</td>
216
+ <td>4.7</td>
217
+ <td>964</td>
218
+ </tr>
219
+ <tr>
220
+ <th>neuralmagic/pixtral-12b-quantized.w8a8</th>
221
+ <td>1.55</td>
222
+ <td>3.7</td>
223
+ <td>1220</td>
224
+ <td>3.1</td>
225
+ <td>1437</td>
226
+ <td>3.0</td>
227
+ <td>1511</td>
228
+ </tr>
229
+ <tr>
230
+ <th>neuralmagic/pixtral-12b-quantized.w4a16</th>
231
+ <td>2.16</td>
232
+ <td>3.2</td>
233
+ <td>1417</td>
234
+ <td>2.1</td>
235
+ <td>2093</td>
236
+ <td>1.9</td>
237
+ <td>2371</td>
238
+ </tr>
239
+ <tr>
240
+ <th rowspan="3" valign="top">A100x1</th>
241
+ <th>mgoin/pixtral-12b</th>
242
+ <td></td>
243
+ <td>3.0</td>
244
+ <td>676</td>
245
+ <td>2.4</td>
246
+ <td>825</td>
247
+ <td>2.3</td>
248
+ <td>859</td>
249
+ </tr>
250
+ <tr>
251
+ <th>neuralmagic/pixtral-12b-quantized.w8a8</th>
252
+ <td>1.38</td>
253
+ <td>2.2</td>
254
+ <td>904</td>
255
+ <td>1.7</td>
256
+ <td>1159</td>
257
+ <td>1.7</td>
258
+ <td>1201</td>
259
+ </tr>
260
+ <tr>
261
+ <th>neuralmagic/pixtral-12b-quantized.w4a16</th>
262
+ <td>1.83</td>
263
+ <td>1.8</td>
264
+ <td>1096</td>
265
+ <td>1.3</td>
266
+ <td>1557</td>
267
+ <td>1.2</td>
268
+ <td>1702</td>
269
+ </tr>
270
+ <tr>
271
+ <th rowspan="3" valign="top">H100x1</th>
272
+ <th>mgoin/pixtral-12b</th>
273
+ <td></td>
274
+ <td>1.8</td>
275
+ <td>595</td>
276
+ <td>1.5</td>
277
+ <td>732</td>
278
+ <td>1.4</td>
279
+ <td>764</td>
280
+ </tr>
281
+ <tr>
282
+ <th>neuralmagic/pixtral-12b-FP8-Dynamic</th>
283
+ <td>1.35</td>
284
+ <td>1.4</td>
285
+ <td>767</td>
286
+ <td>1.1</td>
287
+ <td>1008</td>
288
+ <td>1.0</td>
289
+ <td>1056</td>
290
+ </tr>
291
+ <tr>
292
+ <th>neuralmagic/pixtral-12b-quantized.w4a16</th>
293
+ <td>1.37</td>
294
+ <td>1.4</td>
295
+ <td>787</td>
296
+ <td>1.1</td>
297
+ <td>1018</td>
298
+ <td>1.0</td>
299
+ <td>1065</td>
300
+ </tr>
301
+ </tbody>
302
+ </table>
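
The README does not define how the "Average Cost Reduction" column is computed. The reported values appear consistent with the mean, across the three scenarios, of each quantized model's QPD divided by the baseline QPD on the same hardware. The sketch below checks that reading against the A6000x1 rows; the formula is an inference from the numbers, not something the source states, and `average_cost_reduction` is a hypothetical helper.

```python
from statistics import mean

# QPD values for A6000x1, in table column order:
# Document VQA, Visual Reasoning, Image Captioning.
BASELINE_QPD = [796, 929, 964]  # mgoin/pixtral-12b
QUANTIZED_QPD = {
    "neuralmagic/pixtral-12b-quantized.w8a8": [1220, 1437, 1511],
    "neuralmagic/pixtral-12b-quantized.w4a16": [1417, 2093, 2371],
}


def average_cost_reduction(baseline, candidate):
    """Mean of per-scenario QPD ratios (assumed definition)."""
    return mean(c / b for b, c in zip(baseline, candidate))


for model, qpd in QUANTIZED_QPD.items():
    print(f"{model}: {average_cost_reduction(BASELINE_QPD, qpd):.2f}")
```

Under this assumption the script prints 1.55 for the w8a8 model and 2.16 for the w4a16 model, matching the table.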
303
+
304
+
305
+
306
  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
307
+
308
+ <table border="1" class="dataframe">
309
+ <thead>
310
+ <tr>
311
+ <th></th>
312
+ <th></th>
313
+ <th></th>
314
+ <th style="text-align: center;" colspan="2" >Document Visual Question Answering<br>1680W x 2240H<br>64/128</th>
315
+ <th style="text-align: center;" colspan="2" >Visual Reasoning <br>640W x 480H<br>128/128</th>
316
+ <th style="text-align: center;" colspan="2" >Image Captioning<br>480W x 360H<br>0/128</th>
317
+ </tr>
318
+ <tr>
319
+ <th>Hardware</th>
320
+ <th>Model</th>
321
+ <th>Average Cost Reduction</th>
322
+ <th>Maximum throughput (QPS)</th>
323
+ <th>QPD</th>
324
+ <th>Maximum throughput (QPS)</th>
325
+ <th>QPD</th>
326
+ <th>Maximum throughput (QPS)</th>
327
+ <th>QPD</th>
328
+ </tr>
329
+ </thead>
330
+ <tbody style="text-align: center">
331
+ <tr>
332
+ <th rowspan="3" valign="top">A6000x1</th>
333
+ <th>mgoin/pixtral-12b</th>
334
+ <td></td>
335
+ <td>0.6</td>
336
+ <td>2632</td>
337
+ <td>0.9</td>
338
+ <td>4108</td>
339
+ <td>1.1</td>
340
+ <td>4774</td>
341
+ </tr>
342
+ <tr>
343
+ <th>neuralmagic/pixtral-12b-quantized.w8a8</th>
344
+ <td>1.50</td>
345
+ <td>0.9</td>
346
+ <td>3901</td>
347
+ <td>1.4</td>
348
+ <td>6160</td>
349
+ <td>1.6</td>
350
+ <td>7292</td>
351
+ </tr>
352
+ <tr>
353
+ <th>neuralmagic/pixtral-12b-quantized.w4a16</th>
354
+ <td>1.41</td>
355
+ <td>0.6</td>
356
+ <td>2890</td>
357
+ <td>1.3</td>
358
+ <td>5758</td>
359
+ <td>1.8</td>
360
+ <td>8312</td>
361
+ </tr>
362
+ <tr>
363
+ <th rowspan="3" valign="top">A100x1</th>
364
+ <th>mgoin/pixtral-12b</th>
365
+ <td></td>
366
+ <td>1.1</td>
367
+ <td>2291</td>
368
+ <td>1.8</td>
369
+ <td>3670</td>
370
+ <td>2.1</td>
371
+ <td>4284</td>
372
+ </tr>
373
+ <tr>
374
+ <th>neuralmagic/pixtral-12b-quantized.w8a8</th>
375
+ <td>1.38</td>
376
+ <td>1.5</td>
377
+ <td>3096</td>
378
+ <td>2.5</td>
379
+ <td>5076</td>
380
+ <td>3.0</td>
381
+ <td>5965</td>
382
+ </tr>
383
+ <tr>
384
+ <th>neuralmagic/pixtral-12b-quantized.w4a16</th>
385
+ <td>1.40</td>
386
+ <td>1.4</td>
387
+ <td>2728</td>
388
+ <td>2.6</td>
389
+ <td>5133</td>
390
+ <td>3.5</td>
391
+ <td>6943</td>
392
+ </tr>
393
+ <tr>
394
+ <th rowspan="3" valign="top">H100x1</th>
395
+ <th>mgoin/pixtral-12b</th>
396
+ <td></td>
397
+ <td>2.6</td>
398
+ <td>2877</td>
399
+ <td>4.0</td>
400
+ <td>4372</td>
401
+ <td>4.7</td>
402
+ <td>5095</td>
403
+ </tr>
404
+ <tr>
405
+ <th>neuralmagic/pixtral-12b-FP8-Dynamic</th>
406
+ <td>1.33</td>
407
+ <td>3.4</td>
408
+ <td>3753</td>
409
+ <td>5.4</td>
410
+ <td>5862</td>
411
+ <td>6.3</td>
412
+ <td>6917</td>
413
+ </tr>
414
+ <tr>
415
+ <th>neuralmagic/pixtral-12b-quantized.w4a16</th>
416
+ <td>1.22</td>
417
+ <td>2.8</td>
418
+ <td>3115</td>
419
+ <td>5.0</td>
420
+ <td>5511</td>
421
+ <td>6.2</td>
422
+ <td>6777</td>
423
+ </tr>
424
+ </tbody>
425
+ </table>