Recent Activity

qgallouedec updated a dataset 5 days ago
trl-lib/llava-instruct-mix
qgallouedec published a dataset 5 days ago
trl-lib/llava-instruct-mix
qgallouedec updated a Space 17 days ago
trl-lib/dataset-length-profiler

sergiopaniego posted an update 3 days ago
New Zero-Shot Object Detectors in transformers! 🥽

We've added LLMDet and MM GroundingDINO, plus a demo Space to compare them with others 🖼️

Play with it: ariG23498/zero-shot-od
sergiopaniego posted an update 7 days ago
Latest TRL release brings major upgrades for multimodal alignment!

We dive into 3 new techniques to improve VLM post-training in our new blog:

🌋 GRPO
🎞️ GSPO
🐙 MPO
➕ vLLM integration for online training w/ transformers backend

🐑 Blog: https://huggingface.co/blog/trl-vlm-alignment
sergiopaniego posted an update 10 days ago
Want to learn how to align a Vision Language Model (VLM) for reasoning using GRPO and TRL? 🌋

🧑‍🍳 We've got you covered!!

NEW multimodal post-training recipe to align a VLM using TRL in @HuggingFace's Cookbook.

Go to the recipe 👉 https://huggingface.co/learn/cookbook/fine_tuning_vlm_grpo_trl

Powered by the latest TRL v0.20 release, this recipe shows how to teach Qwen2.5-VL-3B-Instruct to reason over images 🌋
sergiopaniego posted an update 10 days ago
Just included example scripts for aligning models using GSPO (including VLM example) 🙆‍♂️🙆‍♂️

GSPO is the latest RL alignment algo by @Alibaba_Qwen and it's already supported in the latest TRL v0.20 release.

Super-easy-to-get-started example scripts below, GO run them! 👩‍💻👩‍💻

🧑‍🎨 Script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo.py
🦄 VLM script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo_vlm.py
🧩 More TRL examples: https://huggingface.co/docs/trl/main/en/example_overview
🧙‍♂️ GSPO paper: Group Sequence Policy Optimization (2507.18071)
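
For readers new to GSPO, here is a minimal sketch of the core idea as I understand it from the linked paper: the importance ratio is computed once per sequence (length-normalized) rather than once per token as in GRPO. The function names and the `eps` clipping value are illustrative assumptions, not TRL's API or the paper's hyperparameters.

```python
import math


def token_level_ratios(new_logps, old_logps):
    """GRPO-style: one importance ratio per generated token."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_level_ratio(new_logps, old_logps):
    """GSPO-style: a single ratio for the whole sequence.

    Equal to exp(mean over tokens of (new_logp - old_logp)), i.e. the
    sequence likelihood ratio raised to the power 1 / sequence_length.
    """
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(diffs) / len(diffs))


def clipped_objective(ratio, advantage, eps=0.2):
    """PPO-style clipped surrogate for one ratio (eps is a hypothetical value)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


# Per-token log-probabilities of one sampled completion (made-up numbers).
new_logps = [-1.0, -0.5, -2.0]
old_logps = [-1.2, -0.5, -1.5]

ratio = sequence_level_ratio(new_logps, old_logps)
print(clipped_objective(ratio, advantage=1.0))
```

With a sequence-level ratio, every token in a completion is clipped or kept together, which is the main practical difference from the token-level variant.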
sergiopaniego posted an update 15 days ago
Did you miss this? 👓

🧙‍♂️ vLLM + transformers integration just got upgraded with direct VLM support.

Select a VLM + model_impl=transformers and play via vLLM!
sergiopaniego posted an update 16 days ago
We just released TRL v0.20 with major multimodal upgrades!

👁️ VLM support for GRPO (highly requested by the community!)
🎞️ New GSPO trainer (from @Qwen, released last week, VLM-ready)
🐙 New MPO trainer (multimodal by design, as in the paper)

📝 Full release notes here: https://github.com/huggingface/trl/releases/tag/v0.20.0
sergiopaniego posted an update 22 days ago
Yet Another New Multimodal Fine-Tuning Recipe 🥧

🧑‍🍳 In this @HuggingFace Cookbook notebook, we demonstrate how to align a multimodal model (VLM) with Mixed Preference Optimization (MPO) using trl.

💡 This recipe is powered by the new MPO support in trl, enabled through a recent upgrade to the DPO trainer!

We align the multimodal model using multiple optimization objectives (losses), guided by a preference dataset (chosen vs. rejected multimodal pairs).

Check it out! ➡️ https://huggingface.co/learn/cookbook/fine_tuning_vlm_mpo
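
To make "multiple optimization objectives" concrete, here is a rough sketch of how an MPO-style loss could combine three terms for one chosen/rejected pair: a preference term (DPO sigmoid), a quality term (BCO-style), and a generation term (SFT negative log-likelihood). This is an assumption-laden illustration in plain Python, not the trl implementation; `mpo_loss`, `beta`, `delta`, and the default weights are hypothetical names and values chosen for the example.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def mpo_loss(chosen_logp, rejected_logp, ref_chosen_logp, ref_rejected_logp,
             chosen_num_tokens, beta=0.1, delta=0.0, weights=(0.8, 0.2, 1.0)):
    """Weighted sum of preference, quality, and generation losses for one pair.

    Inputs are summed sequence log-probabilities under the policy and a frozen
    reference model. All defaults are illustrative assumptions.
    """
    chosen_reward = beta * (chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (rejected_logp - ref_rejected_logp)

    # Preference: the chosen answer should beat the rejected one (DPO sigmoid).
    preference = -log_sigmoid(chosen_reward - rejected_reward)
    # Quality: each answer is judged on its own against a threshold delta.
    quality = (-log_sigmoid(chosen_reward - delta)
               - log_sigmoid(-(rejected_reward - delta)))
    # Generation: per-token negative log-likelihood of the chosen answer.
    generation = -chosen_logp / chosen_num_tokens

    w_pref, w_qual, w_gen = weights
    return w_pref * preference + w_qual * quality + w_gen * generation


print(mpo_loss(-8.0, -12.0, -10.0, -12.0, chosen_num_tokens=5))
```

The weights control how much each objective contributes; the linked recipe is the place to check which losses and weights it actually configures.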
sergiopaniego posted an update 27 days ago
🧑‍🍳 New Multimodal Fine-Tuning Recipe 🧑‍🍳

⚡️ In this new @huggingface Cookbook recipe, I walk you through the process of fine-tuning a Vision Language Model (VLM) for Object Detection with Visual Grounding, using TRL.

🔍 Object detection typically involves detecting categories in images (e.g., vase).

By combining it with visual grounding, we add contextual understanding, so instead of detecting just "vase", we can detect "middle vase" in an image.

VLMs are super powerful!

In this case, I use PaliGemma 2, which already supports object detection, and extend it with visual grounding.

🤗 Check it out here: https://huggingface.co/learn/cookbook/fine_tuning_vlm_object_detection_grounding
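
As a small illustration of what a grounded detection looks like in practice, the sketch below parses PaliGemma-style detection text into pixel boxes. It assumes the commonly documented output format of four `<locNNNN>` tokens (y_min, x_min, y_max, x_max, each on a 0 to 1023 grid) followed by a label, with detections separated by `;`. The helper name `parse_detections` and the sample string are hypothetical, so check the recipe for the exact format it uses.

```python
import re

# Four consecutive <locNNNN> tokens followed by a free-text label.
DETECTION = re.compile(r"((?:<loc\d{4}>){4})\s*([^;<]+)")


def parse_detections(text, image_width, image_height):
    """Convert '<loc....>' x4 + label segments into pixel-space boxes."""
    results = []
    for locs, label in DETECTION.findall(text):
        # Assumed token order: y_min, x_min, y_max, x_max, normalized by 1024.
        y1, x1, y2, x2 = (int(v) / 1024 for v in re.findall(r"<loc(\d{4})>", locs))
        results.append({
            "label": label.strip(),
            "box": (x1 * image_width, y1 * image_height,
                    x2 * image_width, y2 * image_height),
        })
    return results


sample = "<loc0256><loc0128><loc0768><loc0512> middle vase"
print(parse_detections(sample, image_width=1024, image_height=1024))
```

The label carries the grounding: a plain detector would emit "vase", while a grounded model can emit a phrase such as "middle vase" for the same box format.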
sergiopaniego posted an update 28 days ago
Multiple NEW notebooks and scripts added to the Hugging Face Gemma recipes repo!

Thanks to the community 🫶, we're adding more and more recipes using Gemma 💎

Fine-tuning for all modalities, function calling, RAG...

Repo: https://github.com/huggingface/huggingface-gemma-recipes

We're also open to new ideas from the community 🤗!