DeltaWorld (Predictor) — Kinetics-700

DeltaWorld is a generative world model that operates on "delta" tokens to efficiently generate diverse, plausible futures. It was introduced in "A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens" (CVPR 2026 Highlight).

Project Page | GitHub | Paper

This repository contains the generative ViT-B predictor trained on Kinetics-700 at 512x512 resolution.

Usage

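Running the predictor requires a trained DeltaTok tokenizer and a frozen DINOv3 ViT-B backbone. Full training and evaluation code is available in the DeltaTok GitHub repository. To evaluate, run the following, substituting your local paths to the predictor and tokenizer checkpoints: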

python main.py validate -c configs/deltaworld_vitb_dinov3_vitb_kinetics.yaml \
  --model.ckpt_path=path/to/deltaworld-kinetics/pytorch_model.bin \
  --model.network.tokenizer.ckpt_path=path/to/deltatok-kinetics/pytorch_model.bin
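The command above fails late if either checkpoint path is wrong, so it can help to check both files before launching. Below is a minimal sketch of a wrapper that does this and assembles the same argument list. The helper name `build_validate_command` and its `check_exists` parameter are hypothetical and not part of the DeltaTok repository; only the subcommand, config path, and the two `ckpt_path` options are taken from the command shown above.

```python
from pathlib import Path
import shlex

# Config path copied from the evaluation command in this model card.
DEFAULT_CONFIG = "configs/deltaworld_vitb_dinov3_vitb_kinetics.yaml"


def build_validate_command(predictor_ckpt, tokenizer_ckpt,
                           config=DEFAULT_CONFIG, check_exists=True):
    """Return the validation command as a list of arguments.

    Hypothetical helper: it only reassembles the flags shown in the
    model card and optionally verifies that both checkpoints exist.
    """
    predictor_ckpt = Path(predictor_ckpt)
    tokenizer_ckpt = Path(tokenizer_ckpt)

    if check_exists:
        for label, path in (("predictor", predictor_ckpt),
                            ("tokenizer", tokenizer_ckpt)):
            if not path.is_file():
                raise FileNotFoundError(f"{label} checkpoint not found: {path}")

    return [
        "python", "main.py", "validate",
        "-c", config,
        f"--model.ckpt_path={predictor_ckpt.as_posix()}",
        f"--model.network.tokenizer.ckpt_path={tokenizer_ckpt.as_posix()}",
    ]


if __name__ == "__main__":
    # Placeholder paths from the model card; existence check disabled
    # so the command can be printed without the files being present.
    cmd = build_validate_command(
        "path/to/deltaworld-kinetics/pytorch_model.bin",
        "path/to/deltatok-kinetics/pytorch_model.bin",
        check_exists=False,
    )
    print(shlex.join(cmd))
```

With real checkpoint paths and the default `check_exists=True`, the returned list can be passed to `subprocess.run(cmd, check=True)`. Because `main.py` and the config are given as relative paths, this presumably needs to run from the directory of the DeltaTok checkout that contains them.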

Acknowledgements

Citation

@inproceedings{kerssies2026deltatok,
  title     = {A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens},
  author    = {Kerssies, Tommie and Berton, Gabriele and He, Ju and Yu, Qihang and Ma, Wufei and de Geus, Daan and Dubbelman, Gijs and Chen, Liang-Chieh},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}