---
license: mit
short_description: LLM Many-Model-As-Judge Benchmark
---
# AutoBench
This Space runs a benchmark to compare different language models using Hugging Face's Inference API.
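For orientation, a single Inference API call looks roughly like this with the `huggingface_hub` client. This is a sketch, not the Space's actual code; the model ID, prompt, and token are placeholders:

```python
from huggingface_hub import InferenceClient

# Placeholder model ID and token; any model served by the Inference API works here.
client = InferenceClient(model="mistralai/Mistral-7B-Instruct-v0.3", token="hf_...")

# Send one prompt to the model and print its completion.
answer = client.text_generation("Explain beam search in two sentences.", max_new_tokens=128)
print(answer)
```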
## Features

- Benchmark multiple models side by side (models evaluate models; a sketch of this loop follows the list)
- Test models across various topics and difficulty levels
- Evaluate both question quality and answer quality
- Generate detailed performance reports
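The Space's own code is not shown in this README, but the many-model-as-judge idea behind these features can be sketched as follows. `generate` and `judge` are hypothetical callables standing in for Inference API calls:

```python
from statistics import mean

def cross_judge(models, question, generate, judge):
    """Have every model answer, then let each peer score that answer (illustrative only)."""
    results = {}
    for answerer in models:
        answer = generate(answerer, question)  # the candidate answer
        peer_scores = [
            judge(judge_model, question, answer)  # numeric quality score from a peer
            for judge_model in models
            if judge_model != answerer
        ]
        results[answerer] = mean(peer_scores)  # average peer judgment
    return results
```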
## How to Use

1. Enter your Hugging Face API token (required to access the models; see the note after this list)
2. Select the models you want to benchmark
3. Choose topics and the number of iterations
4. Click "Start Benchmark"
5. View and download the results when complete
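Tokens can be created at https://huggingface.co/settings/tokens; a token with read access is typically sufficient for inference. If you run similar code outside the Space, a common pattern is to read the token from an environment variable rather than hardcoding it (`HF_TOKEN` is the conventional name recognized by `huggingface_hub`):

```python
import os
from huggingface_hub import InferenceClient

# Read the token from the environment instead of embedding it in code.
token = os.environ["HF_TOKEN"]
client = InferenceClient(token=token)
```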
## Models

The benchmark supports any model available through Hugging Face's Inference API, including:

- Meta Llama models
- Google Gemma models
- Mistral models
- And many more!
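As one illustration, a run might be configured with a list of Hub model IDs like the following. These are example IDs, not the Space's defaults; note that Llama and Gemma models are gated on the Hub and require accepting their licenses before the API will serve them:

```python
# Example model IDs only; substitute any models served by the Inference API.
MODELS = [
    "meta-llama/Llama-3.1-8B-Instruct",   # gated: requires license acceptance
    "google/gemma-2-9b-it",               # gated: requires license acceptance
    "mistralai/Mistral-7B-Instruct-v0.3",
]
```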
## Note
Running a full benchmark might take some time depending on the number of models and iterations.
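As a rough, assumption-laden estimate: if each iteration has every model answer once and every peer judge each answer, the number of API calls grows quadratically with the model count. A quick back-of-the-envelope calculation:

```python
# Assumes one answer per model per iteration plus full peer judging;
# the Space's exact protocol (e.g., question-quality checks) may add more calls.
models, iterations = 5, 10
calls_per_iteration = models + models * (models - 1)  # answers + judgments
print(iterations * calls_per_iteration)  # 250
```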