Zhensong Zhang

JasonCU

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 14 days ago

HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices

Paper • 2512.14052 • Published 17 days ago • 39

upvoted a paper 15 days ago

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Paper • 2512.14614 • Published 16 days ago • 67

upvoted 2 papers 27 days ago

DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

Paper • 2512.03000 • Published 30 days ago • 36

4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer

Paper • 2512.05060 • Published 28 days ago • 18

upvoted 7 papers 28 days ago

StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

Paper • 2512.01707 • Published Dec 1, 2025 • 7

Accelerating Streaming Video Large Language Models via Hierarchical Token Compression

Paper • 2512.00891 • Published Nov 30, 2025 • 14

CaptionQA: Is Your Caption as Useful as the Image Itself?

Paper • 2511.21025 • Published Nov 26, 2025 • 27

DeepEyesV2: Toward Agentic Multimodal Model

Paper • 2511.05271 • Published Nov 7, 2025 • 42

VisPlay: Self-Evolving Vision-Language Models from Images

Paper • 2511.15661 • Published Nov 19, 2025 • 42

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025 • 316

ViDiC: Video Difference Captioning

Paper • 2512.03405 • Published 30 days ago • 27

upvoted 3 papers 29 days ago

SAM 3D: 3Dfy Anything in Images

Paper • 2511.16624 • Published Nov 20, 2025 • 110

WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

Paper • 2512.02425 • Published about 1 month ago • 23

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Paper • 2511.20785 • Published Nov 25, 2025 • 181

upvoted an article 8 months ago

Hugging Face to sell open-source robots thanks to Pollen Robotics acquisition 🤖

Article • Apr 14, 2025 • 48

upvoted a paper over 1 year ago

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Paper • 2409.18042 • Published Sep 26, 2024 • 39