Morph-1B

Morph-1B is a 1-billion-parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark.

This model is designed to show that wider, shallower models can yield inference-efficiency gains while preserving downstream accuracy.
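The card does not include a usage snippet, so the following is a minimal sketch, assuming the checkpoint published under NaiveUser/morph-1b (the repository named on this card) loads through the standard Hugging Face transformers interface; if the release instead requires the OpenLM training stack, the loading code will differ.

```python
# Minimal text-generation sketch; assumes the checkpoint is transformers-compatible.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NaiveUser/morph-1b"  # repository named on this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision for inference
    device_map="auto",          # place weights on GPU if one is available
)

prompt = "The DataComp for Language Models benchmark"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```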

Model Details

Model Description

  • Developed by: Song Bian*, Minghao Yan*, Shivaram Venkataraman

Model Sources

The model architecture is similar to GPT-2 and LLaMA and uses the GPT-NeoX tokenizer.
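For reference, the GPT-NeoX tokenizer can be inspected on its own. The sketch below is illustrative and assumes the EleutherAI/gpt-neox-20b repository as the source of the reference tokenizer files; the tokenizer bundled with this checkpoint is expected to behave the same way.

```python
# Inspecting the GPT-NeoX tokenizer on its own (illustrative sketch).
from transformers import AutoTokenizer

# EleutherAI/gpt-neox-20b hosts the reference GPT-NeoX tokenizer files (assumed source).
tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

print(tok.vocab_size)  # roughly 50k BPE tokens
print(tok.tokenize("Morph-1B is a wider, shallower 1B-parameter model."))
```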

Training Details

We use the DCLM-Baseline dataset for training.

The training procedure and hyperparameters are detailed in our ICML 2025 paper.

Evaluation

We evaluate the models on the following datasets: ARC-Easy, ARC-Challenge, BoolQ, COPA, HellaSwag, LAMBADA, PIQA, WinoGrande, MMLU, Jeopardy, and Winograd.
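The card does not specify the evaluation toolkit. One common way to reproduce a subset of these benchmarks is EleutherAI's lm-evaluation-harness; the sketch below is an assumption on our part, task names can differ across harness versions, and Jeopardy, MMLU, and Winograd are omitted here for brevity.

```python
# Hedged sketch: run a subset of the listed benchmarks with EleutherAI's
# lm-evaluation-harness (pip install lm-eval). Not necessarily the authors' setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=NaiveUser/morph-1b,dtype=float16",
    tasks=[
        "arc_easy", "arc_challenge", "boolq", "copa",
        "hellaswag", "lambada_openai", "piqa", "winogrande",
    ],
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```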

Results

| Models       | d_model | n_layers | Average Accuracy | Latency (s) |
|--------------|---------|----------|------------------|-------------|
| Open-LM-1B   | 2048    | 24       | 0.49             | 3.61        |
| OPT-1.3B     | 2048    | 24       | 0.50             | 2.55        |
| Pythia-1.3B  | 2048    | 22       | 0.49             | 3.28        |
| Neox-1.3B    | 2048    | 24       | 0.49             | 3.99        |
| OPT-IML-1.3B | 2048    | 24       | 0.54             | 2.54        |
| Morph-1B     | 3072    | 12       | 0.52             | 1.96        |
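Intuitively, the wider, shallower Morph-1B configuration (d_model = 3072, n_layers = 12) executes fewer sequential layers per generated token, which is where the latency gain comes from. Absolute latency numbers depend on hardware, batch size, and generation length; see the ICML 2025 paper for the actual measurement setup. The following is a rough, hedged sketch of how one might time generation, assuming a CUDA GPU and a transformers-compatible checkpoint.

```python
# Illustrative latency check (not the paper's measurement protocol): time a fixed
# greedy-generation workload and report the mean over a few runs.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NaiveUser/morph-1b"  # repository named on this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda").eval()

inputs = tokenizer("DataComp for Language Models is", return_tensors="pt").to("cuda")

latencies = []
with torch.inference_mode():
    for _ in range(5):
        torch.cuda.synchronize()
        start = time.perf_counter()
        model.generate(**inputs, max_new_tokens=128, do_sample=False)
        torch.cuda.synchronize()
        latencies.append(time.perf_counter() - start)

print(f"mean latency: {sum(latencies) / len(latencies):.2f} s")
```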

Summary

Morph-1B improves inference latency by 1.8× while maintaining accuracy on downstream tasks compared to open-source models, consistent with the Open-LM-1B row in the table above (3.61 s / 1.96 s ≈ 1.8×).

Citation

BibTeX:

@article{bian2025scaling,
  title={Scaling Inference-Efficient Language Models},
  author={Bian, Song and Yan, Minghao and Venkataraman, Shivaram},
  journal={arXiv preprint arXiv:2501.18107},
  year={2025}
}

Dataset used to train NaiveUser/morph-1b: DCLM-Baseline