Parakeet Collection NeMo Parakeet ASR Models attain strong speech recognition accuracy while being efficient for inference. Available in CTC and RNN-Transducer variants. • 12 items • Updated 7 days ago • 50
Step-Audio-R1 Collection Step-Audio-R1 is the first audio language model to successfully unlock test-time compute scaling. • 3 items • Updated Nov 21 • 15
LightOnOCR Collection The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR • 7 items • Updated Nov 13 • 15
Article LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR Oct 23 • 62
MathCanvas Collection Datasets and models for the paper "MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning" • 5 items • Updated Nov 19 • 3
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning Paper • 2510.14958 • Published Oct 16 • 22
MiniCPM-o & MiniCPM-V Collection Multimodal models with leading performance. • 28 items • Updated Sep 1 • 59
Canary Collection A collection of multilingual and multitask speech-to-text models from NVIDIA NeMo 🐤 • 5 items • Updated 7 days ago • 29