Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals
Abstract
Video generation models trained on synthetic physics primitives demonstrate zero-shot generalization to complex real-world scenarios by modeling force propagation through time and space.
Recent advancements in video generation have enabled the development of "world models" capable of simulating potential futures for robotics and planning. However, specifying precise goals for these models remains a challenge: text instructions are often too abstract to capture physical nuances, while target images are frequently infeasible to specify for dynamic tasks. To address this, we introduce Goal Force, a novel framework that allows users to define goals via explicit force vectors and intermediate dynamics, mirroring how humans conceptualize physical tasks. We train a video generation model on a curated dataset of synthetic causal primitives, such as elastic collisions and falling dominoes, teaching it to propagate forces through time and space. Despite being trained on simple physics data, our model exhibits remarkable zero-shot generalization to complex, real-world scenarios, including tool manipulation and multi-object causal chains. Our results suggest that by grounding video generation in fundamental physical interactions, models can emerge as implicit neural physics simulators, enabling precise, physics-aware planning without reliance on external engines. We release all datasets, code, model weights, and interactive video demos at our project page.
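To make the goal specification concrete, the sketch below illustrates one plausible way a user-supplied force vector could be turned into a spatial conditioning signal for a video model. All names and the Gaussian-splat encoding are illustrative assumptions, not the authors' actual interface.

```python
# Hypothetical sketch of force-conditioned goal specification (illustrative only;
# not the paper's actual API). Idea: a force vector applied at a pixel location in
# the first frame is rasterized into a per-pixel map that the video model can
# consume alongside the initial frame.
import numpy as np

def force_to_conditioning_map(height, width, point, force_xy, sigma=8.0):
    """Rasterize a 2D force applied at `point` (row, col) into an (H, W, 2) map:
    a Gaussian splat carrying the force direction and magnitude."""
    ys, xs = np.mgrid[0:height, 0:width]
    dist2 = (ys - point[0]) ** 2 + (xs - point[1]) ** 2
    weight = np.exp(-dist2 / (2.0 * sigma ** 2))  # spatial falloff around the contact point
    cond = np.stack([weight * force_xy[0], weight * force_xy[1]], axis=-1)
    return cond.astype(np.float32)

# Example: a rightward push applied near the center of a 64x64 frame.
cond_map = force_to_conditioning_map(64, 64, point=(32, 32), force_xy=(10.0, 0.0))
print(cond_map.shape)  # (64, 64, 2) -- in this sketch, concatenated with the
                       # first-frame latent as extra conditioning channels.
```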
Community
Goal Force trains a physics-grounded video model to follow explicit force-directed goals, achieving zero-shot planning in real-world tasks by implicit neural physics simulation.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Planning with Sketch-Guided Verification for Physics-Aware Video Generation (2025)
- Dexterous World Models (2025)
- ProPhy: Progressive Physical Alignment for Dynamic World Simulation (2025)
- MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video Synthesis (2025)
- VDAWorld: World Modelling via VLM-Directed Abstraction and Simulation (2025)
- Image Generation as a Visual Planner for Robotic Manipulation (2025)
- In-Video Instructions: Visual Signals as Generative Control (2025)