PeterKruger committed
Commit 887c665 · verified · 1 parent: ead8450

Update README.md

Files changed (1):
  1. README.md +12 -1
README.md CHANGED
@@ -14,7 +14,7 @@ short_description: Collective-Model-As-Judge LLM Benchmark
 
 # AutoBench 1.0 Demo
 
-This Space runs a Collective-Model-As-Judge LLM benchmark to compare different language models using Hugging Face's Inference API. This is a simplified version of AutoBench 1.0; the full version relies on multiple inference providers (Anthropic, Grok, Nebius, OpenAI, Together AI, Vertex AI) to manage request load and supports a wider range of models. For more advanced use, please refer to the AutoBench 1.0 repository.
+This Space runs a Collective-Model-As-Judge LLM benchmark to compare different language models using Hugging Face's Inference API. This is a simplified version of AutoBench 1.0; the full version relies on multiple inference providers (Anthropic, Grok, Nebius, OpenAI, Together AI, Vertex AI) to manage request load and supports a wider range of models. For more advanced use, please refer to the [Hugging Face AutoBench 1.0 Repository](https://huggingface.co/PeterKruger/AutoBench).
 
 ## Features
 
@@ -53,3 +53,14 @@ The benchmark supports any model available through Hugging Face's Inference API,
 
 - To follow the question generation, question ranking, answer generation, and answer ranking steps in real time, check the container logs (above, to the right of the "Running" button).
 - Running a full benchmark may take considerable time depending on the number of models and iterations. Make sure you have sufficient Hugging Face credits, especially when running many models for many iterations.
+
+## Get Involved!
+
+AutoBench is a step towards more robust, scalable, and future-proof LLM evaluation. We invite you to explore the code, run the benchmark, contribute to its development, and join the discussion on the future of LLM evaluation!
+
+* **Start from our blog post on Hugging Face:** [Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!)](https://huggingface.co/blog/PeterKruger/autobench)
+* **Explore the code and data:** [Hugging Face AutoBench 1.0 Repository](https://huggingface.co/PeterKruger/AutoBench)
+* **Try our Demo on Spaces:** [AutoBench 1.0 Demo](https://huggingface.co/spaces/PeterKruger/AutoBench)
+* **Read the detailed methodology:** [Detailed Methodology Document](https://huggingface.co/PeterKruger/AutoBench/blob/main/AutoBench_1_0_Detailed_Methodology_Document.pdf)
+* **Join the discussion:** [Hugging Face AutoBench Community Discussion](https://huggingface.co/PeterKruger/AutoBench/discussions)
+* **Contribute:** Help us by suggesting new topics, refining prompts, or enhancing the weighting algorithm; submit pull requests or issues via the Hugging Face repository.
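
For readers who want a feel for what the Space does under the hood, here is a minimal, hypothetical sketch of the kind of call it makes through Hugging Face's Inference API. The token, model ID, and prompt are illustrative assumptions, not the Space's actual code (see the repository linked above for that); the benchmark repeats calls like this across models at each stage of question generation, question ranking, answer generation, and answer ranking.

```python
# Illustrative sketch only (assumed, not the Space's actual code):
# one chat completion through Hugging Face's Inference API via huggingface_hub.
from huggingface_hub import InferenceClient

# Placeholder token; every call consumes Hugging Face Inference API credits.
client = InferenceClient(token="hf_xxx")

response = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical candidate/judge model
    messages=[{"role": "user", "content": "Write one challenging question about coding."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```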