from dataclasses import dataclass
from enum import Enum

# Your leaderboard name
TITLE = """

🎙️ ACL-25 SpeechIQ Leaderboard

""" # What does your leaderboard evaluate? INTRODUCTION_TEXT = """ ## 🎯 Welcome to the SpeechIQ Leaderboard! This leaderboard presents evaluation results for voice understanding large language models (LLMVoice) using our novel SpeechIQ evaluation framework. The **Speech IQ Score** provides a unified metric for comparing both cascaded methods (ASR+LLM) and end-to-end models. """ # Which evaluations are you running? how can people reproduce what you have? LLM_BENCHMARKS_TEXT = """ ## 📊 About SpeechIQ Evaluation **Speech Intelligence Quotient (SpeechIQ)** represents a first-of-its-kind intelligence examination that bridges cognitive principles with voice-oriented benchmarks. Our framework moves beyond traditional metrics like Word Error Rate (WER) to provide comprehensive evaluation of voice understanding capabilities. ### 🎯 Evaluation Framework SpeechIQ evaluates models across three cognitive dimensions inspired by Bloom's Taxonomy: 1. **Remember** (Verbatim Accuracy): Tests the model's ability to accurately capture spoken content 2. **Understand** (Interpretation Similarity): Evaluates how well the model comprehends the meaning of speech 3. **Apply** (Downstream Performance): Measures the model's ability to use speech understanding for practical tasks ### 🏆 Model Categories - **Agentic (ASR + LLM)**: Cascaded approaches using separate ASR and LLM components - **End2End**: Direct speech-to-text models that process audio end-to-end ### 🔬 Key Benefits - **Unified Comparisons**: Compare cascaded and end-to-end approaches on equal footing - **Error Detection**: Identify annotation errors in existing benchmarks - **Hallucination Detection**: Detect and quantify hallucinations in voice LLMs - **Cognitive Assessment**: Map model capabilities to human cognitive principles ### 📈 Speech IQ Score The final Speech IQ Score combines performance across all three dimensions to provide a comprehensive measure of voice understanding intelligence. 
## 🔄 Reproducibility

For detailed methodology and reproduction instructions, please refer to our paper and codebase.
"""

EVALUATION_QUEUE_TEXT = """
## 🚀 Submit Your Model for SpeechIQ Evaluation

To submit your voice understanding model for SpeechIQ evaluation:

### 1) Ensure Model Compatibility

Make sure your model can process audio inputs and generate text outputs in one of these formats:
- **ASR + LLM**: Separate ASR and LLM components
- **End-to-End**: Direct audio-to-text processing

### 2) Model Requirements

- The model must be publicly accessible
- Provide clear documentation of the audio input format and expected outputs
- Include information about audio encoder specifications

### 3) Evaluation Domains

Your model will be evaluated across:
- **Remember**: Transcription accuracy
- **Understand**: Semantic understanding
- **Apply**: Task-specific performance

### 4) Documentation

Please provide:
- Model architecture details
- Training data information
- Audio preprocessing requirements
- Expected input/output formats

## 📧 Contact

For questions about SpeechIQ evaluation or to submit your model, please contact the research team.
"""

CITATION_BUTTON_LABEL = "Refer to the following ACL 2025 main conference paper."
CITATION_BUTTON_TEXT = r"""@inproceedings{speechiq2025,
  title={SpeechIQ: Speech Intelligence Quotient Across Cognitive Levels in Voice Understanding Large Language Models},
  author={Zhen Wan and Chao-Han Huck Yang and Yahan Yu and Jinchuan Tian and Sheng Li and Ke Hu and Zhehuai Chen and Shinji Watanabe and Fei Cheng and Chenhui Chu and Sadao Kurohashi},
  booktitle={ACL 2025 Main Conference},
  year={2025}
}"""