Morph-1B

Morph-1B is a 1-billion-parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark.

This model is designed to show that wider, shallower models can yield inference-efficiency gains while preserving downstream accuracy.
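The card does not include a usage snippet, so the following is a minimal sketch, assuming the checkpoint published under NaiveUser/morph-1b (the repository named on this card) loads through the standard Hugging Face transformers interface; if the release instead requires the OpenLM training stack, the loading code will differ.

```python
# Minimal text-generation sketch; assumes the checkpoint is transformers-compatible.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NaiveUser/morph-1b"  # repository named on this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision for inference
    device_map="auto",          # place weights on GPU if one is available
)

prompt = "The DataComp for Language Models benchmark"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```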

Model Details

Model Description

  • Developed by: Song Bian*, Minghao Yan*, Shivaram Venkataraman

Model Sources

The model architecture is similar to GPT-2 and LLaMA and uses the GPT-NeoX tokenizer.
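For reference, the GPT-NeoX tokenizer can be inspected on its own. The sketch below is illustrative and assumes the EleutherAI/gpt-neox-20b repository as the source of the reference tokenizer files; the tokenizer bundled with this checkpoint is expected to behave the same way.

```python
# Inspecting the GPT-NeoX tokenizer on its own (illustrative sketch).
from transformers import AutoTokenizer

# EleutherAI/gpt-neox-20b hosts the reference GPT-NeoX tokenizer files (assumed source).
tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

print(tok.vocab_size)  # roughly 50k BPE tokens
print(tok.tokenize("Morph-1B is a wider, shallower 1B-parameter model."))
```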

Training Details

We use the DCLM-Baseline dataset for training.

The training procedure and hyperparameters are detailed in our ICML 2025 paper.

Evaluation

We evaluate the models on the following datasets: ARC-Easy, ARC-Challenge, BoolQ, COPA, HellaSwag, LAMBADA, PIQA, WinoGrande, MMLU, Jeopardy, and Winograd.
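The card does not specify the evaluation toolkit. One common way to reproduce a subset of these benchmarks is EleutherAI's lm-evaluation-harness; the sketch below is an assumption on our part, task names can differ across harness versions, and Jeopardy, MMLU, and Winograd are omitted here for brevity.

```python
# Hedged sketch: run a subset of the listed benchmarks with EleutherAI's
# lm-evaluation-harness (pip install lm-eval). Not necessarily the authors' setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=NaiveUser/morph-1b,dtype=float16",
    tasks=[
        "arc_easy", "arc_challenge", "boolq", "copa",
        "hellaswag", "lambada_openai", "piqa", "winogrande",
    ],
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```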

Results

| Models       | d_model | n_layers | Average Accuracy | Latency (s) |
|--------------|---------|----------|------------------|-------------|
| Open-LM-1B   | 2048    | 24       | 0.49             | 3.61        |
| OPT-1.3B     | 2048    | 24       | 0.50             | 2.55        |
| Pythia-1.3B  | 2048    | 22       | 0.49             | 3.28        |
| Neox-1.3B    | 2048    | 24       | 0.49             | 3.99        |
| OPT-IML-1.3B | 2048    | 24       | 0.54             | 2.54        |
| Morph-1B     | 3072    | 12       | 0.52             | 1.96        |
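Intuitively, the wider, shallower Morph-1B configuration (d_model = 3072, n_layers = 12) executes fewer sequential layers per generated token, which is where the latency gain comes from. Absolute latency numbers depend on hardware, batch size, and generation length; see the ICML 2025 paper for the actual measurement setup. The following is a rough, hedged sketch of how one might time generation, assuming a CUDA GPU and a transformers-compatible checkpoint.

```python
# Illustrative latency check (not the paper's measurement protocol): time a fixed
# greedy-generation workload and report the mean over a few runs.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NaiveUser/morph-1b"  # repository named on this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda").eval()

inputs = tokenizer("DataComp for Language Models is", return_tensors="pt").to("cuda")

latencies = []
with torch.inference_mode():
    for _ in range(5):
        torch.cuda.synchronize()
        start = time.perf_counter()
        model.generate(**inputs, max_new_tokens=128, do_sample=False)
        torch.cuda.synchronize()
        latencies.append(time.perf_counter() - start)

print(f"mean latency: {sum(latencies) / len(latencies):.2f} s")
```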

Summary

Morph-1B improves inference latency by 1.8× while maintaining accuracy on downstream tasks compared to open-source models, consistent with the Open-LM-1B row in the table above (3.61 s / 1.96 s ≈ 1.8×).

Citation

BibTeX:

@article{bian2025scaling,
  title={Scaling Inference-Efficient Language Models},
  author={Bian, Song and Yan, Minghao and Venkataraman, Shivaram},
  journal={arXiv preprint arXiv:2501.18107},
  year={2025}
}

Dataset used to train NaiveUser/morph-1b: DCLM-Baseline