from dataclasses import dataclass
from enum import Enum

# Your leaderboard name
TITLE = """

🎙️ ACL-25 SpeechIQ Leaderboard

""" # What does your leaderboard evaluate? INTRODUCTION_TEXT = """ ## 🎯 Welcome to the SpeechIQ Leaderboard! This leaderboard presents evaluation results for voice understanding large language models (LLMVoice) using our novel SpeechIQ evaluation framework. The **Speech IQ Score** provides a unified metric for comparing both cascaded methods (ASR+LLM) and end-to-end models. """ # Which evaluations are you running? how can people reproduce what you have? LLM_BENCHMARKS_TEXT = """ ## 📊 About SpeechIQ Evaluation **Speech Intelligence Quotient (SpeechIQ)** represents a first-of-its-kind intelligence examination that bridges cognitive principles with voice-oriented benchmarks. Our framework moves beyond traditional metrics like Word Error Rate (WER) to provide comprehensive evaluation of voice understanding capabilities. ### 🎯 Evaluation Framework SpeechIQ evaluates models across three cognitive dimensions inspired by Bloom's Taxonomy: 1. **Remember** (Verbatim Accuracy): Tests the model's ability to accurately capture spoken content 2. **Understand** (Interpretation Similarity): Evaluates how well the model comprehends the meaning of speech 3. **Apply** (Downstream Performance): Measures the model's ability to use speech understanding for practical tasks ### 🏆 Model Categories - **Agentic (ASR + LLM)**: Cascaded approaches using separate ASR and LLM components - **End2End**: Direct speech-to-text models that process audio end-to-end ### 🔬 Key Benefits - **Unified Comparisons**: Compare cascaded and end-to-end approaches on equal footing - **Error Detection**: Identify annotation errors in existing benchmarks - **Hallucination Detection**: Detect and quantify hallucinations in voice LLMs - **Cognitive Assessment**: Map model capabilities to human cognitive principles ### 📈 Speech IQ Score The final Speech IQ Score combines performance across all three dimensions to provide a comprehensive measure of voice understanding intelligence. 
## 🔄 Reproducibility

For detailed methodology and reproduction instructions, please refer to our paper and codebase.
"""

EVALUATION_QUEUE_TEXT = """
## 🚀 Submit Your Model for SpeechIQ Evaluation

To submit your voice understanding model for SpeechIQ evaluation:

### 1) Ensure Model Compatibility

Make sure your model can process audio inputs and generate text outputs in one of these formats:
- **ASR + LLM**: Separate ASR and LLM components
- **End-to-End**: Direct audio-to-text processing

### 2) Model Requirements

- The model must be publicly accessible
- Provide clear documentation of the audio input format and expected outputs
- Include information about audio encoder specifications

### 3) Evaluation Domains

Your model will be evaluated across:
- **Remember**: Transcription accuracy
- **Understand**: Semantic understanding
- **Apply**: Task-specific performance

### 4) Documentation

Please provide:
- Model architecture details
- Training data information
- Audio preprocessing requirements
- Expected input/output formats

## 📧 Contact

For questions about SpeechIQ evaluation or to submit your model, please contact the research team.
"""

CITATION_BUTTON_LABEL = "Refer to the following ACL 2025 main conference paper."
CITATION_BUTTON_TEXT = r"""@inproceedings{speechiq2025,
  title={SpeechIQ: Speech Intelligence Quotient Across Cognitive Levels in Voice Understanding Large Language Models},
  author={Zhen Wan and Chao-Han Huck Yang and Yahan Yu and Jinchuan Tian and Sheng Li and Ke Hu and Zhehuai Chen and Shinji Watanabe and Fei Cheng and Chenhui Chu and Sadao Kurohashi},
  booktitle={ACL 2025 Main Conference},
  year={2025}
}"""