Morph-1B
Morph-1B is a 1 billion parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark.
This model is designed to show that wider, shallower models can yield inference-efficiency gains while preserving downstream accuracy.
Model Details
Model Description
- Developed by: Song Bian*, Minghao Yan*, Shivaram Venkataraman
Model Sources
- Repository: open-lm-morph
- Paper: Scaling Inference-Efficient Language Models
Model Architecture
The model architecture is similar to GPT-2 and LLaMA, and it uses the GPT-NeoX tokenizer.
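As a quick-start sketch, the snippet below loads the tokenizer and model with Hugging Face `transformers` and runs a short generation. The repository ID `morph-team/Morph-1B` and the generation settings are illustrative assumptions, not confirmed details; the checkpoint may instead need to be loaded through the open-lm-morph codebase.

```python
# Minimal usage sketch (assumes a transformers-compatible checkpoint;
# the repo ID below is a placeholder, not a confirmed model ID).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "morph-team/Morph-1B"  # placeholder repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Wider and shallower language models", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```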
Training Details
We use the DCLM-Baseline dataset for training.
The training procedure and hyperparameters are detailed in our ICML 2025 paper.
Evaluation
We evaluate the models on the following datasets: ARC-Easy, ARC-Challenge, BoolQ, COPA, HellaSwag, LAMBADA, PIQA, WinoGrande, MMLU, Jeopardy, and Winograd.
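The exact evaluation harness and few-shot settings are described in the paper; as one hedged way to reproduce a subset of these tasks, the sketch below uses the EleutherAI lm-evaluation-harness Python API, again assuming a transformers-compatible checkpoint at a placeholder repo ID. Task names and settings may differ from the paper's setup.

```python
# Hedged evaluation sketch with lm-evaluation-harness (pip install lm-eval).
# The repo ID is a placeholder; the paper's own harness and settings may differ.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=morph-team/Morph-1B",  # placeholder repo ID
    tasks=["arc_easy", "arc_challenge", "boolq", "copa",
           "hellaswag", "lambada_openai", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])
```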
Results
| Model | d_model | n_layers | Avg. accuracy | Latency (s) |
|---|---|---|---|---|
| Open-LM-1B | 2048 | 24 | 0.49 | 3.61 |
| OPT-1.3B | 2048 | 24 | 0.50 | 2.55 |
| Pythia-1.3B | 2048 | 22 | 0.49 | 3.28 |
| Neox-1.3B | 2048 | 24 | 0.49 | 3.99 |
| OPT-IML-1.3B | 2048 | 24 | 0.54 | 2.54 |
| Morph-1B | 3072 | 12 | 0.52 | 1.96 |
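As a back-of-the-envelope check that the wider, shallower Morph-1B configuration stays in the same parameter class as the 2048-wide baselines, the sketch below uses a standard dense-transformer approximation (about 4·d² for attention plus 8·d² for a 4×-expanded MLP per layer, embeddings ignored); the real architectures differ in details such as MLP expansion ratio and vocabulary size.

```python
# Rough non-embedding parameter estimate for a dense transformer:
# per layer, ~4*d^2 for attention (Q, K, V, output projections) and
# ~8*d^2 for a 4x-expanded MLP. An approximation, not the exact models.
def approx_params(d_model: int, n_layers: int) -> float:
    return n_layers * (4 * d_model**2 + 8 * d_model**2)

for name, d, n in [("Open-LM-1B", 2048, 24), ("Morph-1B", 3072, 12)]:
    print(f"{name}: ~{approx_params(d, n) / 1e9:.2f}B non-embedding params")
```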
Summary
The Morph-1B model improves inference latency by 1.8× (3.61 s → 1.96 s versus Open-LM-1B) while maintaining accuracy on downstream tasks, compared to similarly sized open-source models.
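For readers who want to run their own latency comparison, the sketch below is one simple way to time generation with PyTorch. The paper's measurement protocol (hardware, batch size, prompt and output lengths) is not restated here, so the prompt, token budget, and warm-up choices are illustrative assumptions.

```python
# Illustrative latency micro-benchmark; hardware, batch size, and token
# budget are assumptions, not the paper's measurement protocol.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "morph-team/Morph-1B"  # placeholder repository ID
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device).eval()

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(device)

with torch.no_grad():
    model.generate(**inputs, max_new_tokens=16)  # warm-up pass

    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=256)
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"Generation latency: {time.perf_counter() - start:.2f} s")
```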
Citation
BibTeX:
@article{bian2025scaling,
  title={Scaling Inference-Efficient Language Models},
  author={Bian, Song and Yan, Minghao and Venkataraman, Shivaram},
  journal={arXiv preprint arXiv:2501.18107},
  year={2025}
}