lmms-lab/LLaVA-NeXT-Video-32B-Qwen


Question about LLaVA-Video-32B-Qwen: Performance issues

#7

by RachelZhou - opened Jan 28, 2025

I have a few questions regarding the 32B video model implementations and performance:

1. Could you clarify which is the latest model: lmms-lab/LLaVA-NeXT-Video-32B-Qwen or lmms-lab/LLaVA-Video-32B-Qwen? It’s unclear which one should be used for the latest evaluations.
2. In practical implementations, I’ve noticed that the 32B model appears to perform worse than the 7B and 72B models. Any idea why this might be the case?
3. I also observed that there hasn’t been a performance evaluation of the 32B model on the latest evaluation benchmarks. Is this due to any particular issue with the model, or has it simply not been prioritized for testing?

Really? Could you share more detail about the experiment? On which task is the 7B model more stable or more accurate?

LMMs-Lab org Feb 20, 2025

The data used for the 32B model is different from that used for the 7B and 72B models.
The 32B model is just an early version for demo purposes.
