view changelog Changelog Inference Providers now fully support OpenAI-compatible API 27 days ago • 76
💬Urdu ASR Models Collection Collection of fine-tuned Urdu speech recognition models. • 9 items • Updated Jul 14 • 2
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13 • 154
view article Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 504
view article Article LeMaterial: an open source initiative to accelerate materials discovery and research By AlexDuvalinho and 9 others • Dec 10, 2024 • 52
D-FINE Collection State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated May 5 • 55
Llama 4 Collection Meta's new Llama 4 multimodal models, Scout & Maverick. Includes Dynamic GGUFs, 16-bit & Dynamic 4-bit uploads. Run & fine-tune them with Unsloth! • 15 items • Updated about 5 hours ago • 47
MoshiVis v0.1 Collection MoshiVis is a Vision Speech Model built as a perceptually-augmented version of Moshi v0.1 for conversing about image inputs • 8 items • Updated Mar 21 • 22
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based Qwen2.5 • 7 items • Updated 24 days ago • 155
view article Article DeepSearch Using Visual RAG in Agentic Frameworks 🔎 By paultltc and 1 other • Mar 21 • 35
view article Article Open-Source Handwritten Signature Detection Model By samuellimabraz • Mar 14 • 117
view article Article Welcome to Inference Providers on the Hub 🔥 By julien-c and 6 others • Jan 28 • 487
Gemma 3 Collection All versions of Google's new multimodal models including QAT in 1B, 4B, 12B, and 27B sizes. In GGUF, dynamic 4-bit and 16-bit formats. • 55 items • Updated about 5 hours ago • 76
view article Article Introducing EuroBERT: A High-Performance Multilingual Encoder Model By EuroBERT and 3 others • Mar 10 • 146
view article Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality By saurabhdash and 3 others • Mar 4 • 75
Cohere Labs Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 14 days ago • 69
view article Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • Jul 5, 2024 • 287