---
title: AutoBench
emoji: 🐠
colorFrom: red
colorTo: yellow
sdk: streamlit
sdk_version: 1.42.2
app_file: app.py
pinned: false
license: mit
short_description: LLM Many-Model-As-Judge Benchmark
---
# AutoBench
This Space runs a benchmark to compare different language models using Hugging Face's Inference API.
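For orientation, a single model query through the Inference API might look like the following sketch using the `huggingface_hub` client; the token placeholder, model ID, prompt, and `max_tokens` value are illustrative assumptions, not the app's actual code:

```python
from huggingface_hub import InferenceClient

# Illustrative only: one chat-completion request through the Inference API.
client = InferenceClient(token="hf_...")  # your Hugging Face API token

response = client.chat_completion(
    messages=[{"role": "user", "content": "Explain gradient descent in two sentences."}],
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model ID
    max_tokens=256,
)
print(response.choices[0].message.content)
```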
## Features
- Benchmark multiple models side by side (models evaluate models)
- Test models across various topics and difficulty levels
- Evaluate question quality and answer quality
- Generate detailed performance reports
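The "models evaluate models" idea can be pictured as a cross-scoring loop: each model answers a question, then every model grades every answer. Below is a minimal sketch built on the `InferenceClient` call shown above; the `ask()` helper, prompt wording, scoring scale, and aggregation are assumptions, not the app's implementation:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_...")  # your Hugging Face API token

def ask(model: str, prompt: str) -> str:
    # One chat-completion call, as in the snippet above.
    out = client.chat_completion(
        messages=[{"role": "user", "content": prompt}], model=model, max_tokens=512
    )
    return out.choices[0].message.content

def cross_evaluate(models: list[str], question: str) -> dict[str, float]:
    # Every model answers the question, then every model grades every answer.
    answers = {m: ask(m, question) for m in models}

    scores: dict[str, list[float]] = {m: [] for m in models}
    for judge in models:
        for author, answer in answers.items():
            verdict = ask(
                judge,
                f"Rate the following answer to the question '{question}' "
                f"on a scale of 1 to 10. Reply with only the number.\n\n{answer}",
            )
            try:
                scores[author].append(float(verdict.strip()))
            except ValueError:
                pass  # ignore judges that don't return a clean number

    # Average grade each model received across all judges
    return {m: sum(s) / len(s) for m, s in scores.items() if s}
```

A real run would also assess the quality of the generated questions and handle API rate limits, which this sketch omits.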
## How to Use

1. Enter your Hugging Face API token (needed to access models)
2. Select the models you want to benchmark
3. Choose topics and number of iterations
4. Click "Start Benchmark"
5. View and download results when complete
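In Streamlit terms, the controls behind these steps could look roughly like the sketch below; the widget labels, topic list, default model IDs, and the placeholder result table are assumptions, not the app's actual `app.py`:

```python
import pandas as pd
import streamlit as st

# Hypothetical sketch of the benchmark controls; all labels and defaults are assumptions.
token = st.text_input("Hugging Face API token", type="password")
models = st.multiselect(
    "Models to benchmark",
    ["meta-llama/Llama-3.1-8B-Instruct", "mistralai/Mistral-7B-Instruct-v0.3"],
)
topics = st.multiselect("Topics", ["math", "coding", "science", "general knowledge"])
iterations = st.slider("Iterations", min_value=1, max_value=20, value=5)

if st.button("Start Benchmark") and token and models:
    # Placeholder: the real app runs the benchmark here and collects scores.
    results = pd.DataFrame({"model": models, "avg_score": [0.0] * len(models)})
    st.dataframe(results)
    st.download_button(
        "Download results (CSV)",
        results.to_csv(index=False),
        file_name="autobench_results.csv",
    )
```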
## Models
The benchmark supports any model available through Hugging Face's Inference API, including:
- Meta Llama models
- Google Gemma models
- Mistral models
- And many more!
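Models are referred to by their Hub repo IDs. A few illustrative entries (whether a given model is actually served by the Inference API depends on your account and the current catalogue):

```python
# Example Hub repo IDs; availability through the Inference API may vary.
CANDIDATE_MODELS = [
    "meta-llama/Llama-3.1-8B-Instruct",    # Meta Llama
    "google/gemma-2-9b-it",                # Google Gemma
    "mistralai/Mistral-7B-Instruct-v0.3",  # Mistral
]
```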
## Note

Running a full benchmark can take a while: total runtime scales with the number of models, topics, and iterations you select.