alielfilali01 (Ali El Filali)

reacted to prithivMLmods's post with ❤️ 4 months ago

Post

5470

FastVLMs by Apple are the talk of the week for edge device VLMs and also for consumer-grade VLMs on the Hub. They have some impressive demos available on the Hub for live captioning and inference tasks. Meanwhile, I’m still exploring one of the coolest edge-device multimodal releases—Liquid AI’s LFM2-VL (450M and 1.6B). I’ve also made a live camera video inference demo, which is capable of running on Colab’s free-tier T4 GPU.

🤗Live Captioning Notebooks:
➠ LiquidAI LFM2 VL 1.6B Live Cam: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LiquidAI-LFM2-VL-Live-Cam/LiquidAI_LFM2_VL_1_6B_Live_Cam.ipynb

➠ LiquidAI LFM2 VL 450M Live Cam: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LiquidAI-LFM2-VL-Live-Cam/LiquidAI_LFM2_VL_450M_Live_Cam.ipynb

✨I also made a demo for the FastVLM Live Captioning Notebook.
➠ FastVLM 0.5B Live Cam: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/Apple-FastVLM-0.5B-Live-Cam/apple_FastVLM_0_5B_live_cam.ipynb

↗️For more notebooks, kindly visit the following repositories.
➠ Multimodal Outpost Notebooks: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks

Feel free to fork, modify, and explore!

reacted to hba123's post with 🚀 4 months ago

Post

2858

We have written a fun little blog on how you can do robotics with Ark and in Python. We also give you some examples of how OpenAI Gym can become hardware-grounded and how easy it is to do so:

Check it out: https://huggingface.co/blog/hba123/ark

5 replies

·

posted an update 5 months ago

Post

876

Guys WTH is "yofo-*" ???
Most OpenAI staff associated with the openai/gpt-oss-68911959590a1634ba11c7a4 release are affiliated to dozens of yofo orgs ...

i.e

yofo-wildflower

Some HF folks as well 👀

reacted to celinah's post with 🚀 5 months ago

Post

2667

✨ Today we’re releasing Tiny Agents in Python — an MCP-powered Agent in ~70 lines of code 🐍

Inspired by Tiny Agents in JS from @julien-c , we ported the idea to Python and integrated it directly into huggingface_hub — with a built-in MCP Client and a Tiny Agents CLI.

TL;DR: With MCP (Model Context Protocol), you can expose tools like web search or image generation and connect them directly to LLMs. It’s simple — and surprisingly powerful.

pip install "huggingface_hub[mcp]>=0.32.0"

We wrote a blog post where we show how to run Tiny Agents, and dive deeper into how they work and how to build your own.
👉 https://huggingface.co/blog/python-tiny-agents

1 reply

·

reacted to blaise-tk's post with 🚀 6 months ago

Post

3594

A few months ago, I shared that I was building with @deeivihh something like "the Steam for open source apps"...

🚀 Today, I’m excited to announce that Dione is now open source and live in public beta!

Our mission is simple: make it easier to discover, use, and contribute to open source applications.

🔗 GitHub: https://github.com/dioneapp/dioneapp
💬 Join the community: https://discord.gg/JDFJp33vrM

Want to give it a try? I’d love your feedback! 👀

reacted to merve's post with 🔥 6 months ago

Post

3670

IN: video fine-tuning support for

facebook V-JEPA 2 in HF transformers 🔥

it comes with
> four models fine-tuned on Diving48 and SSv2 dataset facebook/v-jepa-2-6841bad8413014e185b497a6
> FastRTC demo on V-JEPA2 SSv2 qubvel-hf/vjepa2-streaming-video-classification
> fine-tuning script on UCF-101 https://gist.github.com/ariG23498/28bccc737c11d1692f6d0ad2a0d7cddb
> fine-tuning notebook on UCF-101 https://colab.research.google.com/drive/16NWUReXTJBRhsN3umqznX4yoZt2I7VGc?usp=sharing
we're looking forward to see what you will build! 🤗

reacted to cbensimon's post with 🔥 7 months ago

Post

4148

🚀 ZeroGPU now supports PyTorch native quantization via torchao

While it hasn’t been battle-tested yet, Int8WeightOnlyConfig is already working flawlessly in our tests.

Let us know if you run into any issues — and we’re excited to see what the community will build!

import spaces
from diffusers import FluxPipeline
from torchao.quantization.quant_api import Int8WeightOnlyConfig, quantize_

pipeline = FluxPipeline.from_pretrained(...).to('cuda')
quantize_(pipeline.transformer, Int8WeightOnlyConfig()) # Or any other component(s)

@spaces.GPU
def generate(prompt: str):
    return pipeline(prompt).images[0]

5 replies

·

reacted to Kseniase's post with 👍 7 months ago

Post

6428

12 Foundational AI Model Types

Let’s refresh some fundamentals today to stay fluent in the what we all work with. Here are some of the most popular model types that shape the vast world of AI (with examples in the brackets):

1. LLM - Large Language Model (GPT, LLaMA) -> Large Language Models: A Survey (2402.06196)
+ history of LLMs: https://www.turingpost.com/t/The%20History%20of%20LLMs
It's trained on massive text datasets to understand and generate human language. They are mostly build on Transformer architecture, predicting the next token. LLMs scale by increasing overall parameter count across all components (layers, attention heads, MLPs, etc.)

2. SLM - Small Language Model (TinyLLaMA, Phi models, SmolLM) A Survey of Small Language Models (2410.20011)
Lightweight LM optimized for efficiency, low memory use, fast inference, and edge use. SLMs work using the same principles as LLMs

3. VLM - Vision-Language Model (CLIP, Flamingo) -> An Introduction to Vision-Language Modeling (2405.17247)
Processes and understands both images and text. VLMs map images and text into a shared embedding space or generate captions/descriptions from both

4. MLLM - Multimodal Large Language Model (Gemini) -> A Survey on Multimodal Large Language Models (2306.13549)
A large-scale model that can understand and process multiple types of data (modalities) — usually text + other formats, like images, videos, audio, structured data, 3D or spatial inputs. MLLMs can be LLMs extended with modality adapters or trained jointly across vision, text, audio, etc.

5. LAM - Large Action Model (InstructDiffusion, RT-2) -> Large Action Models: From Inception to Implementation (2412.10047)
Understands and generates action sequences by predicting action tokens (discrete/continuous instructions) that guide agents. Trained on behavior datasets, LAMs generalize across tasks, environments, and modalities - video, sensor data, etc.

Read about LRM, MoE, SSM, RNN, CNN, SAM and LNN below👇

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

2 replies

·

reacted to BramVanroy's post with ❤️ 8 months ago

Post

3637

📢💾 Introducing the Common Crawl Creative Commons Corpus (C5)!

C5 is a large-scale effort to heavily filter web-crawled data, as collected by the non-profit Common Crawl, to only documents that are Creative Commons-licensed such as cc-by-4.0 or public domain cc0. At this stage 150 billion tokens have been collected.

---
📄 data: BramVanroy/CommonCrawl-CreativeCommons
🧰 software: https://github.com/BramVanroy/CommonCrawl-CreativeCommons
---

</> To build C5, HTML pages are scrutinized and all links (if any) to CC licenses are collected, both in regular hyperlinks as well as in metadata. Additional data fields are included such as "was the license found in the head?" or "if multiple licenses were found, do they contradict each other?", which makes further filtering a breeze.

🌐 In this first version of C5, 8 languages are included (Afrikaans, German, English, French, Frysian, Italian, Dutch and Spanish). The language set was limited for two reasons: computational and storage limitations, and a collaboration with GPT-NL, which requested CC data for these languages to train a Dutch-focused, copyright-conscious LLM. In total, this V1 release contains almost 150 thousand documents and 150 billion tokens. This data was not filtered on quality nor deduplicated so that you can decide for yourself how much data to keep. To give some quality indication, a dataset field is present to describe whether a document is included in the FineWeb(-2) datasets, which are of high quality.

🔍 More work needs to be done! Only 7 out of 100+ Common Crawl crawls have been processed so far. That's encouraging because it means there is a lot more Creative Commons data to be collected! But to get there I need help in terms of compute. The current processing was already heavily sponsored by the Flemish Supercomputer but more is needed. If you have the compute available and which to collaborate in an open and transparent manner, please get in touch!

1 reply

·

reacted to lukmanaj's post with 👍🤗 8 months ago

Post

2281

I’m excited to share that I’ve completed the Hugging Face Agents Course and earned my certificate.

Over the past few months, I explored how to build intelligent, autonomous agents using cutting-edge tools like smolagents, LlamaIndex, and LangGraph. The course covered everything from the fundamentals of agents to advanced topics like fine-tuning for function-calling, observability, evaluation, and even agents in games.

Some key content included:

1. Introduction to AI Agents

2. Agentic RAG use cases

3. Multi-framework implementation: smolagents, LlamaIndex, and LangGraph

4. Building, testing, and certifying a complete agent project

This was a hands-on, practical experience that deepened my understanding of how to design reliable, tool-using LLM agents. Looking forward to leveraging these skills in real-world applications in healthcare, logistics, and beyond.

Many thanks to the Hugging Face team for putting this together.
Let’s build safe and useful agents!

7 replies

·

posted an update 8 months ago

Post

867

Great efforts from @AtlasIA folks to adapt text2image models (ghibli style) for Moroccan Context

Read the blog is here : https://huggingface.co/blog/atlasia/creating-your-custom-ghibli-text-to-image-model

reacted to anakin87's post with 👍 8 months ago

Post

3516

𝗜 𝘁𝗿𝗮𝗶𝗻𝗲𝗱 𝗮 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗠𝗼𝗱𝗲𝗹 𝘁𝗼 𝘀𝗰𝗵𝗲𝗱𝘂𝗹𝗲 𝗲𝘃𝗲𝗻𝘁𝘀 𝘄𝗶𝘁𝗵 𝗚𝗥𝗣𝗢! 👑 🗓️

✍️ Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo

I experimented with GRPO lately.

I am fascinated by models learning from prompts and rewards - no example answers needed like in Supervised Fine-Tuning.

After the DeepSeek boom, everyone is trying GRPO with GSM8K or the Countdown Game...

I wanted a different challenge, like 𝘁𝗲𝗮𝗰𝗵𝗶𝗻𝗴 𝗮 𝗺𝗼𝗱𝗲𝗹 𝘁𝗼 𝗰𝗿𝗲𝗮𝘁𝗲 𝗮 𝘀𝗰𝗵𝗲𝗱𝘂𝗹𝗲 𝗳𝗿𝗼𝗺 𝗮 𝗹𝗶𝘀𝘁 𝗼𝗳 𝗲𝘃𝗲𝗻𝘁𝘀 𝗮𝗻𝗱 𝗽𝗿𝗶𝗼𝗿𝗶𝘁𝗶𝗲𝘀.

Choosing an original problem forced me to:
🤔 Think about the problem setting
🧬 Generate data
🤏 Choose the right base model
🏆 Design reward functions (and experiencing reward hacking)
🔄 Run multiple rounds of training, hoping that my model would learn something.

A fun and rewarding 😄 experience.

I learned a lot of things, that I want to share with you. 👇
✍️ Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo
💻 Code: https://github.com/anakin87/qwen-scheduler-grpo
🤗 Hugging Face collection (dataset and model): anakin87/qwen-scheduler-grpo-680bcc583e817390525a8837

2 replies

·

reacted to ImranzamanML's post with 👍🧠 8 months ago

Post

2916

🚀 New paper out: "Improving Arabic Multi-Label Emotion Classification using Stacked Embeddings and Hybrid Loss Function"
Improving Arabic Multi-Label Emotion Classification using Stacked Embeddings and Hybrid Loss Function (2410.03979)

In this work, we tackle some major challenges in Arabic multi-label emotion classification especially the issues of class imbalance and label correlation that often hurt model performance, particularly for minority emotions.

Our approach:

Stacked contextual embeddings from fine-tuned ArabicBERT, MarBERT, and AraBERT models.

A meta-learning strategy that builds richer representations.

A hybrid loss function combining class weighting, label correlation matrices, and contrastive learning to better handle class imbalances.

🧠 Model pipeline: stacked embeddings → meta-learner → Bi-LSTM → fully connected network → multi-label classification.

🔍 Extensive experiments show significant improvements across Precision, Recall, F1-Score, Jaccard Accuracy, and Hamming Loss.
🌟 The hybrid loss function in particular helped close the gap between majority and minority classes!

We also performed ablation studies to break down each component’s contribution and the results consistently validated our design choices.

This framework isn't just for Arabic it offers a generalizable path for improving multi-label emotion classification in other low-resource languages and domains.

Big thanks to my co-authors: Muhammad Azeem Aslam, Wang Jun, Nisar Ahmed, Li Yanan, Hu Hongfei, Wang Shiyu, and Xin Liu!

Would love to hear your thoughts on this work! 👇

reacted to shekkizh's post with ❤️ 8 months ago

Post

1908

Think AGI is just around the corner? Not so fast.

When OpenAI released its Computer-Using Agent (CUA) API, I happened to be playing Wordle 🧩 and thought, why not see how the model handles it?
Spoiler: Wordle turned out to be a surprisingly effective benchmark.
So Romain Cosentino Ph.D. and I dug in and analyzed the results of several hundred runs.

🔑 Takeaways
1️⃣ Even the best computer-using models struggle with simple, context-dependent tasks.
2️⃣ Visual perception and reasoning remain major hurdles for multimodal agents.
3️⃣ Real-world use cases reveal significant gaps between hype and reality. Perception accuracy drops to near zero by the last turn 📉

🔗 Read our arxiv article for more details https://www.arxiv.org/abs/2504.15434

3 replies

·

reacted to clem's post with 🤗 10 months ago

Post

4785

We just crossed 1,500,000 public models on Hugging Face (and 500k spaces, 330k datasets, 50k papers). One new repository is created every 15 seconds. Congratulations all!

3 replies

·

reacted to BrigitteTousi's post with 🚀 10 months ago

Post

3460

LeRobot goes to driving school! 🚗🚗🚗

Hugging Face just announced a new collab with Yaak to bring the largest open-source self-driving dataset to LeRobot!

Major kudos to HF's @cadene , as well as @sandhawalia , @Shnissen and the Yaak team!

Check out the blog post here: https://huggingface.co/blog/lerobot-goes-to-driving-school

1 reply

·

reacted to MohamedRashad's post with 🚀❤️ 10 months ago

Post

1738

A while back i shared this model MohamedRashad/arabic-small-nougat that was a finetune from facebook/nougat-small for the Arabic Language.

Today this humble project has been scaled with new models, new datasets, new space, and a new paper

Check everything throught this collection here:
MohamedRashad/arabic-nougat-673a3f540bd92904c9b92a8e

1 reply

·

Ali El Filali PRO

AI & ML interests

Recent Activity

Organizations

Ali El Filali PRO

AI & ML interests

Recent Activity

Organizations

alielfilali01's activity