InfoSynth: Information-Guided Benchmark Synthesis for LLMs Paper • 2601.00575 • Published 5 days ago
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning Paper • 2512.16909 • Published 19 days ago
Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction Paper • 2409.18121 • Published Sep 26, 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition Paper • 2403.19822 • Published Mar 28, 2024
ALOHa: A New Measure for Hallucination in Captioning Models Paper • 2404.02904 • Published Apr 3, 2024
Virtual Personas for Language Models via an Anthology of Backstories Paper • 2407.06576 • Published Jul 9, 2024
Visual Haystacks: Answering Harder Questions About Sets of Images Paper • 2407.13766 • Published Jul 18, 2024
Post: 🚨 Launching the Visual Haystacks (VHs) Benchmark: the first "visual-centric" Needle-In-A-Haystack (NIAH) benchmark to assess LMMs' capability in long-context visual retrieval and reasoning. Check it out! tsunghanwu/visual_haystacks • https://visual-haystacks.github.io/ • https://arxiv.org/abs/2407.13766 • https://github.com/visual-haystacks/vhs_benchmark
Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition Paper • 2401.02417 • Published Jan 4, 2024
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification Paper • 2312.14378 • Published Dec 22, 2023
See, Say, and Segment: Teaching LMMs to Overcome False Premises Paper • 2312.08366 • Published Dec 13, 2023
CLAIR: Evaluating Image Captions with Large Language Models Paper • 2310.12971 • Published Oct 19, 2023
Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition Paper • 2301.02736 • Published Jan 6, 2023
ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video Paper • 2401.05314 • Published Jan 10, 2024
Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping Paper • 2309.07970 • Published Sep 14, 2023