See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models Paper • 2512.02231 • Published Dec 1, 2025 • 8
LASER: Lip Landmark Assisted Speaker Detection for Robustness Paper • 2501.11899 • Published Jan 21, 2025