license: mit
pipeline_tag: image-to-video
library_name: diffusers

TesserAct: Learning 4D Embodied World Models

Haoyu Zhen*, Qiao Sun*, Hongxin Zhang, Junyan Li, Siyuan Zhou, Yilun Du, Chuang Gan

Paper PDF  |  Project Page  |  Model on Hugging Face  |  Code

We propose TesserAct, a 4D embodied world model that takes input images and a text instruction and generates RGB, depth, and normal videos, from which a 4D scene can be reconstructed and actions predicted.