TesserAct: Learning 4D Embodied World Models
Haoyu Zhen*, Qiao Sun*, Hongxin Zhang, Junyan Li, Siyuan Zhou, Yilun Du, Chuang Gan
Paper PDF | Project Page | Model on Hugging Face | Code
We propose TesserAct, a 4D embodied world model that takes input images and a text instruction and generates RGB, depth, and normal videos, which are used to reconstruct a 4D scene and predict actions.