---
license: mit
short_description: LLM Many-Model-As-Judge Benchmark
---
# AutoBench
This Space runs a benchmark to compare different language models using Hugging Face's Inference API.
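For orientation, a single Inference API call looks roughly like this with the `huggingface_hub` client. This is a sketch, not the Space's actual code; the model ID, prompt, and token are placeholders:

```python
from huggingface_hub import InferenceClient

# Placeholder model ID and token; any model served by the Inference API works here.
client = InferenceClient(model="mistralai/Mistral-7B-Instruct-v0.3", token="hf_...")

# Send one prompt to the model and print its completion.
answer = client.text_generation("Explain beam search in two sentences.", max_new_tokens=128)
print(answer)
```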
## Features

- Benchmark multiple models side by side (models evaluate models; a sketch of this loop follows the list)
- Test models across various topics and difficulty levels
- Evaluate both question quality and answer quality
- Generate detailed performance reports
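The Space's own code is not shown in this README, but the many-model-as-judge idea behind these features can be sketched as follows. `generate` and `judge` are hypothetical callables standing in for Inference API calls:

```python
from statistics import mean

def cross_judge(models, question, generate, judge):
    """Have every model answer, then let each peer score that answer (illustrative only)."""
    results = {}
    for answerer in models:
        answer = generate(answerer, question)  # the candidate answer
        peer_scores = [
            judge(judge_model, question, answer)  # numeric quality score from a peer
            for judge_model in models
            if judge_model != answerer
        ]
        results[answerer] = mean(peer_scores)  # average peer judgment
    return results
```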
## How to Use

1. Enter your Hugging Face API token (required to access the models; see the note after this list)
2. Select the models you want to benchmark
3. Choose topics and the number of iterations
4. Click "Start Benchmark"
5. View and download the results when complete
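Tokens can be created at https://huggingface.co/settings/tokens; a token with read access is typically sufficient for inference. If you run similar code outside the Space, a common pattern is to read the token from an environment variable rather than hardcoding it (`HF_TOKEN` is the conventional name recognized by `huggingface_hub`):

```python
import os
from huggingface_hub import InferenceClient

# Read the token from the environment instead of embedding it in code.
token = os.environ["HF_TOKEN"]
client = InferenceClient(token=token)
```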
## Models

The benchmark supports any model available through Hugging Face's Inference API, including:

- Meta Llama models
- Google Gemma models
- Mistral models
- And many more!
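As one illustration, a run might be configured with a list of Hub model IDs like the following. These are example IDs, not the Space's defaults; note that Llama and Gemma models are gated on the Hub and require accepting their licenses before the API will serve them:

```python
# Example model IDs only; substitute any models served by the Inference API.
MODELS = [
    "meta-llama/Llama-3.1-8B-Instruct",   # gated: requires license acceptance
    "google/gemma-2-9b-it",               # gated: requires license acceptance
    "mistralai/Mistral-7B-Instruct-v0.3",
]
```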
## Note
Running a full benchmark might take some time depending on the number of models and iterations.
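As a rough, assumption-laden estimate: if each iteration has every model answer once and every peer judge each answer, the number of API calls grows quadratically with the model count. A quick back-of-the-envelope calculation:

```python
# Assumes one answer per model per iteration plus full peer judging;
# the Space's exact protocol (e.g., question-quality checks) may add more calls.
models, iterations = 5, 10
calls_per_iteration = models + models * (models - 1)  # answers + judgments
print(iterations * calls_per_iteration)  # 250
```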