Update README.md
README.md CHANGED
@@ -14,7 +14,7 @@ short_description: Many-Model-As-Judge LLM Benchmark
 
 # AutoBench 1.0 Demo
 
-This Space runs a Many-Model-As-Judge LLM benchmark to compare different language models using Hugging Face's Inference API. This is a simplified version of AutoBench 1.0, which relies on multiple inference providers to manage request load and a wider range of models (Anthropic, Grok, Nebius, OpenAI, Together AI, Vertex AI). For more advanced use, please
+This Space runs a Many-Model-As-Judge LLM benchmark to compare different language models using Hugging Face's Inference API. This is a simplified version of AutoBench 1.0, which relies on multiple inference providers to manage request load and a wider range of models (Anthropic, Grok, Nebius, OpenAI, Together AI, Vertex AI). For more advanced use, please refer to the AutoBench 1.0 repository.
 
 ## Features
 
@@ -51,4 +51,5 @@ The benchmark supports any model available through Hugging Face's Inference API,
 
 ## Note
 
-
+- To follow the question generation, question ranking, answer generation, and answer ranking steps in real time, check the container logs (above, to the right of the "Running" button).
+- Running a full benchmark can take some time, depending on the number of models and iterations. Make sure you have sufficient Hugging Face credits, especially when running many models over many iterations.
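For reference, the Many-Model-As-Judge loop the README describes (one model generates a question, every model answers, and the models then rank each other's answers) might look roughly like the sketch below. This is a minimal, hypothetical sketch rather than the Space's actual code: it assumes huggingface_hub's `InferenceClient`, placeholder model ids, and a simple 1-to-5 ranking prompt.

```python
# Hypothetical sketch of a Many-Model-As-Judge round over the HF Inference API.
# Model ids, prompts, and the 1-5 scale are illustrative assumptions.
from huggingface_hub import InferenceClient

HF_TOKEN = "hf_..."  # your Hugging Face token
MODELS = [
    "meta-llama/Llama-3.1-8B-Instruct",    # placeholder model ids
    "mistralai/Mistral-7B-Instruct-v0.3",
]

def chat(model: str, prompt: str) -> str:
    """Send a single-turn chat request to one model via the HF Inference API."""
    client = InferenceClient(model=model, token=HF_TOKEN)
    out = client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
    )
    return out.choices[0].message.content

# 1. One model generates a question.
question = chat(MODELS[0], "Write one challenging question about logic.")

# 2. Every model answers the question.
answers = {m: chat(m, question) for m in MODELS}

# 3. Each model judges the other models' answers (Many-Model-As-Judge).
scores = {m: [] for m in MODELS}
for judge in MODELS:
    for author, answer in answers.items():
        if author == judge:
            continue  # models do not grade their own answers
        verdict = chat(
            judge,
            f"Question: {question}\nAnswer: {answer}\n"
            "Rate this answer from 1 (poor) to 5 (excellent). Reply with the number only.",
        )
        try:
            scores[author].append(int(verdict.strip()[0]))
        except (ValueError, IndexError):
            pass  # skip verdicts that cannot be parsed

# 4. Average the peer scores into a simple leaderboard.
for model, s in sorted(scores.items(), key=lambda kv: -sum(kv[1]) / max(len(kv[1]), 1)):
    print(model, round(sum(s) / max(len(s), 1), 2))
```

AutoBench 1.0 additionally ranks the generated questions themselves and repeats this cycle over many iterations, which the sketch above leaves out.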