Model Card for LegoGPT
These are the model weights for LegoGPT, the first approach for generating physically stable LEGO brick models from text prompts, as described in Generating Physically Stable and Buildable LEGO® Designs from Text. This model was fine-tuned from meta-llama/Llama-3.2-1B-Instruct.
Model Details
Model Description
- Developed by: Carnegie Mellon University Generative Intelligence Lab
- Funded by: This work is supported in part by the Packard Foundation, a Cisco Research Grant, and an Amazon Faculty Award, and in part by the Manufacturing Futures Institute, Carnegie Mellon University, through a grant from the Richard King Mellon Foundation. KD is supported by the Microsoft Research PhD Fellowship.
- Model type: Autoregressive
- Language(s): English
- License: MIT
- Finetuned from model: meta-llama/Llama-3.2-1B-Instruct
- Project page: https://avalovelace1.github.io/LegoGPT/
Model Sources
- Repository: AvaLovelace1/LegoGPT
- Paper: Generating Physically Stable and Buildable LEGO® Designs from Text
- Demo: cmu-gil/LegoGPT-Demo
Limitations
The model is restricted to creating structures made of 1-unit-tall cuboid bricks on a 20x20x20 grid. It was trained on a dataset of 21 object categories: basket, bed, bench, birdhouse, bookshelf, bottle, bowl, bus, camera, car, chair, guitar, jar, mug, piano, pot, sofa, table, tower, train, vessel. Performance on prompts from outside these categories may be limited.
How to Get Started with the Model
See the GitHub repo for usage examples and an interactive CLI demo.
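For a quick local test outside the repo, the weights can also be loaded with transformers and PEFT. The sketch below makes assumptions about the Hub repository ID, prompt format, and sampling settings and is not the official pipeline; the repo's inference code additionally runs the physical stability analysis.

```python
# Minimal sketch: load the LoRA-fine-tuned weights and sample one design.
# The repository ID, prompt, and generation settings are assumptions;
# see the GitHub repo for the official inference pipeline.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "AvaLovelace/LegoGPT"  # hypothetical Hub ID; adjust to the actual repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "a simple four-legged chair"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```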
Training Details
Training Data
LegoGPT was trained using StableText2Lego, a dataset of 47k LEGO structures.
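If the dataset is hosted on the Hugging Face Hub, it can be inspected with the datasets library. The Hub ID and column layout in this sketch are assumptions; check the project page for the actual dataset location.

```python
from datasets import load_dataset

# Hypothetical Hub ID; replace with the actual StableText2Lego dataset repository.
ds = load_dataset("AvaLovelace/StableText2Lego", split="train")
print(ds)     # row count and column names
print(ds[0])  # a single caption/structure example
```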
Training Procedure
The model was fine-tuned using LoRA applied to the q_proj and v_proj matrices. We used AdamW optimization, and the learning rate followed a cosine decay with warmup; see the configuration sketch after the hyperparameter list below.
Training Hyperparameters
- Training regime: bf16 mixed precision
- Epochs: 3
- Global batch size: 64
- Max learning rate: 0.002
- Learning rate warmup steps: 100
- LoRA rank: 32
- LoRA alpha: 16
- LoRA dropout: 0.05
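As a rough illustration (not the authors' training script), the settings above map onto a PEFT LoraConfig plus an AdamW optimizer with a cosine warmup schedule roughly as follows; the total step count is a placeholder that depends on dataset size, batch size, and epochs.

```python
# Sketch of a PEFT LoRA setup matching the reported hyperparameters.
# Illustrative only, not the authors' actual training code.
from peft import LoraConfig, get_peft_model
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
lora_config = LoraConfig(
    r=32,                                 # LoRA rank
    lora_alpha=16,                        # LoRA alpha
    lora_dropout=0.05,                    # LoRA dropout
    target_modules=["q_proj", "v_proj"],  # matrices adapted during fine-tuning
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

optimizer = AdamW(model.parameters(), lr=2e-3)  # max learning rate 0.002
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,     # warmup steps from the card
    num_training_steps=1000,  # placeholder; derive from dataset size, batch size, epochs
)
```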
Evaluation
See the paper for detailed evaluations.
Environmental Impact
- Hardware Type: 8x NVIDIA RTX A6000 (48 GB)
- Hours used: 0.5
Citation
If you find this model useful for your research, please cite the following work.
@article{pun2025legogpt,
title = {Generating Physically Stable and Buildable LEGO Designs from Text},
author = {Pun, Ava and Deng, Kangle and Liu, Ruixuan and Ramanan, Deva and Liu, Changliu and Zhu, Jun-Yan},
journal = {arXiv preprint arXiv:2505.05469},
year = {2025}
}
Model Card Contact
Ava Pun ([email protected])
Framework versions
- PEFT 0.15.0