from dataclasses import dataclass
from enum import Enum

# Your leaderboard name
TITLE = """<h1 align="center" id="space-title">πŸŽ™οΈ ACL-25 SpeechIQ Leaderboard</h1>"""

# What does your leaderboard evaluate?
INTRODUCTION_TEXT = """
## 🎯 Welcome to the SpeechIQ Leaderboard!

This leaderboard presents evaluation results for voice understanding large language models (LLM<sub>Voice</sub>) using our novel SpeechIQ evaluation framework.
The **Speech IQ Score** provides a unified metric for comparing both cascaded methods (ASR+LLM) and end-to-end models.
"""

# Which evaluations are you running? How can people reproduce your results?
LLM_BENCHMARKS_TEXT = """
## πŸ“Š About SpeechIQ Evaluation

**Speech Intelligence Quotient (SpeechIQ)** represents a first-of-its-kind intelligence examination that bridges cognitive principles with voice-oriented benchmarks. Our framework moves beyond traditional metrics like Word Error Rate (WER) to provide comprehensive evaluation of voice understanding capabilities.

### 🎯 Evaluation Framework

SpeechIQ evaluates models across three cognitive dimensions inspired by Bloom's Taxonomy:

1. **Remember** (Verbatim Accuracy): Tests the model's ability to accurately capture spoken content
2. **Understand** (Interpretation Similarity): Evaluates how well the model comprehends the meaning of speech
3. **Apply** (Downstream Performance): Measures the model's ability to use speech understanding for practical tasks

### πŸ† Model Categories

- **Agentic (ASR + LLM)**: Cascaded approaches using separate ASR and LLM components
- **End2End**: Direct speech-to-text models that process audio end-to-end
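
As a minimal sketch, the two categories differ only in where transcription happens. The callables below are illustrative stand-ins, not part of the actual evaluation harness:

```python
def cascaded(asr, llm, audio):
    # Agentic (ASR + LLM): transcribe the audio first,
    # then let a text-only LLM reason over the transcript.
    return llm(asr(audio))

def end_to_end(speech_llm, audio):
    # End2End: a single speech-capable model consumes the audio directly.
    return speech_llm(audio)
```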

### πŸ”¬ Key Benefits

- **Unified Comparisons**: Compare cascaded and end-to-end approaches on equal footing
- **Error Detection**: Identify annotation errors in existing benchmarks
- **Hallucination Detection**: Detect and quantify hallucinations in voice LLMs
- **Cognitive Assessment**: Map model capabilities to human cognitive principles

### πŸ“ˆ Speech IQ Score

The final Speech IQ Score combines performance across all three dimensions to provide a comprehensive measure of voice understanding intelligence.
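
As an illustrative sketch only (the exact weighting is defined in the paper), the combination can be thought of as standardizing a model's average dimension score against the evaluated cohort and rescaling it to an IQ-like scale with mean 100 and standard deviation 15:

```python
from statistics import mean, stdev

def speech_iq(remember, understand, apply, cohort):
    # Toy combination of the three dimension scores (each in [0, 1]).
    # `cohort` holds the (remember, understand, apply) tuples of all
    # evaluated models; a model's raw average is standardized against
    # the cohort and rescaled to mean 100 / std 15.
    raw = mean((remember, understand, apply))
    cohort_raw = [mean(scores) for scores in cohort]
    mu, sigma = mean(cohort_raw), stdev(cohort_raw)
    return 100 + 15 * (raw - mu) / sigma
```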

## πŸ”„ Reproducibility

For detailed methodology and reproduction instructions, please refer to our paper and codebase.
"""

EVALUATION_QUEUE_TEXT = """
## πŸš€ Submit Your Model for SpeechIQ Evaluation

To submit your voice understanding model for SpeechIQ evaluation:

### 1) Ensure Model Compatibility
Make sure your model can process audio inputs and generate text outputs in one of these formats:
- **ASR + LLM**: Separate ASR and LLM components
- **End-to-End**: Direct audio-to-text processing

### 2) Model Requirements
- Model must be publicly accessible
- Provide clear documentation of audio input format and expected outputs
- Include information about audio encoder specifications

### 3) Evaluation Domains
Your model will be evaluated across:
- **Remember**: Transcription accuracy
- **Understand**: Semantic understanding
- **Apply**: Task-specific performance

### 4) Documentation
Please provide:
- Model architecture details
- Training data information
- Audio preprocessing requirements
- Expected input/output formats

## πŸ“§ Contact

For questions about SpeechIQ evaluation or to submit your model, please contact the research team.
"""

CITATION_BUTTON_LABEL = "Cite the following ACL 2025 main conference paper."
CITATION_BUTTON_TEXT = r"""@inproceedings{speechiq2025,
  title={SpeechIQ: Speech Intelligence Quotient Across Cognitive Levels in Voice Understanding Large Language Models},
  author={Zhen Wan and Chao-Han Huck Yang and Yahan Yu and Jinchuan Tian and Sheng Li and Ke Hu and Zhehuai Chen and Shinji Watanabe and Fei Cheng and Chenhui Chu and Sadao Kurohashi},
  booktitle={ACL 2025 main conference},
  year={2025}
}"""