Changli Tang
Changli
5 followers · 3 following
TCL606
AI & ML interests
Speech signal processing; video understanding; multi-modal LLM
Recent Activity
Authored a paper 19 days ago: Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Authored a paper 19 days ago: Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization
Authored a paper 19 days ago: video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models
Organizations
Papers (6)
arxiv:2506.15220
arxiv:2502.11775
arxiv:2410.06682
arxiv:2406.15704
Models (0)
None public yet
Datasets (1)
Changli/Ytb_Video • Updated Apr 28 • 5.57k • 277