|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- Multiverse4FM/Multiverse-1K-mixed |
|
- Multiverse4FM/Multiverse-1K |
|
- simplescaling/s1K-1.1 |
|
base_model: |
|
- Qwen/Qwen2.5-32B-Instruct |
|
pipeline_tag: text-generation |
|
library_name: transformers |
|
--- |
|
|
|
# Model Summary |
|
|
|
> Multiverse-32B, built on [Multiverse](https://multiverse4fm.github.io/), is the first open-source non-AR model to achieve scores of 54% and 46% on AIME 2024 and AIME 2025, respectively. |
|
|
|
- **Webpage:** [Multiverse](https://multiverse4fm.github.io/) |
|
- **Paper:** [https://arxiv.org/abs/2506.09991](https://arxiv.org/abs/2506.09991) |
|
|
|
# Use |
|
|
|
Model usage is documented in the [Multiverse repository](https://github.com/Multiverse4FM/Multiverse); a minimal loading sketch is shown below. |
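
The following is a minimal sketch using the standard `transformers` text-generation API. The repo id `Multiverse4FM/Multiverse-32B` and the generation settings are assumptions for illustration; the linked repository is the authoritative reference, and the Multiverse parallel-generation features may require the custom serving code described there.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id -- substitute the actual Hugging Face repo id of this model card.
model_id = "Multiverse4FM/Multiverse-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # shard across available GPUs
)

# Standard chat-style prompt; the chat template comes from the tokenizer config.
messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```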
|
|
|
# Evaluation |
|
|
|
| Model | AIME24 | AIME25 | MATH500 | GPQA-Diamond | |
|
| :--- | :---: | :---: | :---: | :---: | |
|
| s1-32B | 35.4 | 25.8 | 88.6 | 48.0 | |
|
| s1.1-32B | 52.9 | 41.7 | 93.4 | 62.6 | |
|
| Qwen2.5-32B-Instruct | 15.8 | 10.4 | 80.4 | 47.0 | |
|
| Autoregressive-32B | **54.6** | <u>45.0</u> | **92.8** | <u>61.6</u> | |
|
| **Multiverse-32B-zero** | 52.1 | 44.2 | <u>92.4</u> | **63.6** | |
|
| **Multiverse-32B** | 53.8 | **45.8** | 91.8 | 60.7 | |
|
|
|
# Acknowledgements |
|
|
|
Thanks to the amazing s1 team for the s1.1 dataset, which we use as the base data, and to the Qwen team for Qwen2.5-32B-Instruct, which we use as the base model. |
|
|
|
# Citation Information |
|
|
|
```bibtex |
|
@misc{yang2025multiverselanguagemodelssecretly, |
|
title={Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation}, |
|
author={Xinyu Yang and Yuwei An and Hongyi Liu and Tianqi Chen and Beidi Chen}, |
|
year={2025}, |
|
eprint={2506.09991}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.LG}, |
|
url={https://arxiv.org/abs/2506.09991}, |
|
} |
|
``` |
|
|