---
title: AutoBench
emoji: ๐
colorFrom: red
colorTo: yellow
sdk: streamlit
sdk_version: 1.42.2
app_file: app.py
pinned: false
license: mit
short_description: LLM Many-Model-As-Judge Benchmark
---
# AutoBench

This Space runs a benchmark that compares different language models using Hugging Face's Inference API.
## Features

- Benchmark multiple models side by side (models evaluate models; see the sketch below)
- Test models across various topics and difficulty levels
- Evaluate question quality and answer quality
- Generate detailed performance reports
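
Conceptually, each iteration follows a many-model-as-judge loop: one model writes a question on a topic, every model answers it, and the models grade each other's output. The sketch below is illustrative only; `run_iteration` and `ask` are hypothetical placeholders, not the actual logic in app.py.

```python
from statistics import mean

def run_iteration(models, topic, difficulty, ask):
    """Illustrative sketch of one benchmark iteration.
    `ask(model, prompt)` is a hypothetical helper that calls the
    Inference API and returns the model's text reply."""
    scores = {m: [] for m in models}
    for questioner in models:
        # One model drafts a question for the chosen topic and difficulty level.
        # (Question quality can be graded by the other models in the same way.)
        question = ask(questioner, f"Write a {difficulty} question about {topic}.")

        # Every model answers the question.
        answers = {m: ask(m, question) for m in models}

        # Every model grades every other model's answer on a 1-10 scale.
        for answerer, answer in answers.items():
            grades = []
            for judge in models:
                if judge == answerer:
                    continue
                reply = ask(judge, "Rate this answer from 1 to 10 (number only).\n"
                                   f"Question: {question}\nAnswer: {answer}")
                grades.append(float(reply))  # real code would parse and validate the reply
            scores[answerer].append(mean(grades))

    # Average each model's grades across all questions in this iteration.
    return {m: mean(vals) for m, vals in scores.items()}
```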
## How to Use

1. Enter your Hugging Face API token (needed to access models; see the example call after this list)
2. Select the models you want to benchmark
3. Choose topics and the number of iterations
4. Click "Start Benchmark"
5. View and download the results when the run completes
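
Each of these steps ultimately drives requests to the Inference API with your token. Here is a minimal sketch of such a call, assuming the `huggingface_hub` client; the token, model ID, and prompt are placeholders, not values used by this Space.

```python
from huggingface_hub import InferenceClient

# Replace with the token you enter in the app.
client = InferenceClient(token="hf_...")

response = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model ID
    messages=[{"role": "user", "content": "Write a hard question about quantum computing."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```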
## Models

The benchmark supports any model available through Hugging Face's Inference API, including:

- Meta Llama models
- Google Gemma models
- Mistral models
- And many more!
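
Models are referenced by their Hub repo IDs. The list below is purely illustrative (the variable name is hypothetical, and Inference API availability for specific models changes over time):

```python
# Example Hugging Face Hub model IDs (illustrative only).
MODELS_TO_BENCHMARK = [
    "meta-llama/Llama-3.1-8B-Instruct",
    "google/gemma-2-9b-it",
    "mistralai/Mistral-7B-Instruct-v0.3",
]
```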
## Note

A full benchmark can take a significant amount of time, depending on the number of models and iterations you select.