PeterKruger committed (verified)
Commit 9d9a69a · Parent(s): bb98c4c

Update README.md

Files changed (1): README.md +31 -1
README.md CHANGED
@@ -11,4 +11,34 @@ license: mit
  short_description: LLM Many-Model-As-Judge Benchmark
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # AutoBench
+
+ This Space runs a benchmark to compare different language models using Hugging Face's Inference API.
+
+ ## Features
+
+ - Benchmark multiple models side by side (models evaluate models)
+ - Test models across various topics and difficulty levels
+ - Evaluate question quality and answer quality
+ - Generate detailed performance reports
+
+ ## How to Use
+
+ 1. Enter your Hugging Face API token (needed to access models)
+ 2. Select the models you want to benchmark
+ 3. Choose topics and number of iterations
+ 4. Click "Start Benchmark"
+ 5. View and download results when complete
+
+ ## Models
+
+ The benchmark supports any model available through Hugging Face's Inference API, including:
+ - Meta Llama models
+ - Google Gemma models
+ - Mistral models
+ - And many more!
+
+ ## Note
+
+ Running a full benchmark might take some time depending on the number of models and iterations.
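
For context on the Inference API mentioned in the README: each benchmarked model is reachable through Hugging Face's hosted Inference API. Below is a minimal sketch of a single call with `huggingface_hub.InferenceClient`; the model ID and token are placeholders, not values taken from this Space.

```python
from huggingface_hub import InferenceClient

# Placeholder model ID and token -- use any Inference API model and your own
# Hugging Face token (the same token the Space asks for in step 1 of "How to Use").
client = InferenceClient(model="meta-llama/Llama-3.1-8B-Instruct", token="hf_...")

# Ask the model one benchmark-style question.
response = client.chat_completion(
    messages=[{"role": "user", "content": "Explain entropy in two sentences."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```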
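
The "models evaluate models" feature and the question/answer quality scoring can be illustrated with a hypothetical judging round. The sketch below assumes the same `InferenceClient` call pattern; the model list, prompts, and 1-to-5 scoring scale are illustrative assumptions, not AutoBench's actual implementation.

```python
from statistics import mean

from huggingface_hub import InferenceClient

# Hypothetical "models judge models" round: every model answers the same question,
# then every *other* model scores each answer. Not AutoBench's actual code.
MODELS = [
    "meta-llama/Llama-3.1-8B-Instruct",
    "google/gemma-2-9b-it",
    "mistralai/Mistral-7B-Instruct-v0.3",
]
TOKEN = "hf_..."  # placeholder Hugging Face token


def ask(model: str, prompt: str) -> str:
    """Send one prompt to one model over the Inference API and return the reply text."""
    client = InferenceClient(model=model, token=TOKEN)
    out = client.chat_completion(
        messages=[{"role": "user", "content": prompt}], max_tokens=256
    )
    return out.choices[0].message.content


question = "Explain why the sky is blue."
answers = {m: ask(m, question) for m in MODELS}

scores = {}
for answered_by, answer in answers.items():
    judge_scores = []
    for judge in MODELS:
        if judge == answered_by:
            continue  # a model does not grade its own answer
        verdict = ask(
            judge,
            f"Rate this answer to '{question}' from 1 to 5. "
            f"Reply with a single digit.\n\n{answer}",
        )
        digits = [ch for ch in verdict if ch.isdigit()]
        if digits:
            judge_scores.append(int(digits[0]))
    scores[answered_by] = mean(judge_scores) if judge_scores else None

print(scores)  # average peer score per model
```

Averaging the peer scores per model is only one possible aggregation; the Space's detailed performance reports may use different metrics.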