AI & ML interests

None defined yet.

Recent Activity

burtenshawย  updated a dataset 6 minutes ago
agents-course/certificates
sergiopaniegoย  updated a dataset 27 minutes ago
agents-course/final-certificates
View all activity

sergiopaniegoย 
posted an update 2 days ago
sergiopaniegoย 
posted an update 3 days ago
view post
Post
256
New Zero-Shot Object Detectors in transformers! ๐Ÿฅฝ

Weโ€™ve added LLMDet and MM GroundingDINO, plus a demo Space to compare them with others ๐Ÿ–ผ๏ธ

Play with it: ariG23498/zero-shot-od
sergiopaniegoย 
posted an update 4 days ago
sergiopaniegoย 
posted an update 7 days ago
view post
Post
376
Latest TRL release brings major upgrades for multimodal alignment!

We dive into 3 new techniques to improve VLM post-training in our new blog:

๐ŸŒ‹ GRPO
๐ŸŽž๏ธ GSPO
๐Ÿ™ MPO
โž• vLLM integration for online training w/ transformers backend\

๐Ÿก Blog: https://huggingface.co/blog/trl-vlm-alignment
sergiopaniegoย 
posted an update 9 days ago
sergiopaniegoย 
posted an update 10 days ago
view post
Post
3341
Want to learn how to align a Vision Language Model (VLM) for reasoning using GRPO and TRL? ๐ŸŒ‹

๐Ÿง‘โ€๐Ÿณ We've got you covered!!

NEW multimodal post training recipe to align a VLM using TRL in @HuggingFace 's Cookbook.

Go to the recipe ๐Ÿ‘‰https://huggingface.co/learn/cookbook/fine_tuning_vlm_grpo_trl

Powered by the latest TRL v0.20 release, this recipe shows how to teach Qwen2.5-VL-3B-Instruct to reason over images ๐ŸŒ‹
sergiopaniegoย 
posted an update 10 days ago
view post
Post
4467
Just included example scripts for aligning models using GSPO (including VLM example) ๐Ÿ™†โ€โ™‚๏ธ๐Ÿ™†โ€โ™‚๏ธ

GSPO is the latest RL alignment algo by @Alibaba_Qwen and it's already supported in the latest TRL v0.20 release.

Super-easy-to-get-started example scripts below, GO run them!๐Ÿ‘ฉโ€๐Ÿ’ป๐Ÿ‘ฉโ€๐Ÿ’ป

๐Ÿง‘โ€๐ŸŽจ Script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo.py
๐Ÿฆ„ VLM script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo_vlm.py
๐Ÿงฉ More TRL examples: https://huggingface.co/docs/trl/main/en/example_overview
๐Ÿง™โ€โ™‚๏ธ GSPO paper: Group Sequence Policy Optimization (2507.18071)
sergiopaniegoย 
posted an update 15 days ago
view post
Post
318
Did you miss this? ๐Ÿ‘“

๐Ÿง™โ€โ™‚๏ธvLLM + transformers integration just got upgraded with direct VLM support.

Select a VLM + model_impl=transformers and play via vLLM!
sergiopaniegoย 
posted an update 16 days ago
view post
Post
2594
We just released TRL v0.20 with major multimodal upgrades!

๐Ÿ‘๏ธ VLM support for GRPO (highly requested by the community!)
๐ŸŽž๏ธ New GSPO trainer (from @Qwen , released last week, VLM-ready)
๐Ÿ™ New MPO trainer (multimodal by design, as in the paper)

๐Ÿ“ Full release notes here: https://github.com/huggingface/trl/releases/tag/v0.20.0
sergiopaniegoย 
posted an update 22 days ago
view post
Post
1181
Yet Another New Multimodal Fine-Tuning Recipe ๐Ÿฅง

๐Ÿง‘โ€๐Ÿณ In this @HuggingFace Face Cookbook notebook, we demonstrate how to align a multimodal model (VLM) using Mixed Preference Optimization (MPO) using trl.

๐Ÿ’ก This recipe is powered by the new MPO support in trl, enabled through a recent upgrade to the DPO trainer!

We align the multimodal model using multiple optimization objectives (losses), guided by a preference dataset (chosen vs. rejected multimodal pairs).

Check it out! โžก๏ธ https://huggingface.co/learn/cookbook/fine_tuning_vlm_mpo
  • 2 replies
ยท
sergiopaniegoย 
posted an update 27 days ago
view post
Post
1662
๐Ÿง‘โ€๐Ÿณ New Multimodal Fine-Tuning Recipe ๐Ÿง‘โ€๐Ÿณ

โšก๏ธ In this new @huggingface Cookbook recipe, I walk you though the process of fine tuning a Visual Language Model (VLM) for Object Detection with Visual Grounding, using TRL.

๐Ÿ” Object detection typically involves detecting categories in images (e.g., vase).

By combining it with visual grounding, we add contextual understanding so instead of detecting just "vase", we can detect "middle vase" in an image.

VLMs are super powerful!

In this case, I use PaliGemma 2 which already supports object detection and extend it to also add visual grounding.

๐Ÿค— Check it out here: https://huggingface.co/learn/cookbook/fine_tuning_vlm_object_detection_grounding
sergiopaniegoย 
posted an update 28 days ago
view post
Post
1625
Multiple NEW notebooks and scripts added to the Hugging Face Gemma recipes repo!

Thanks to the community ๐Ÿซถ, we're adding more and more recipes using Gemma ๐Ÿ’Ž

Fine tuning for all modalities, function calling, RAG...

Repo: https://github.com/huggingface/huggingface-gemma-recipes

We're also open to new ideas from the community ๐Ÿค—!
  • 1 reply
ยท
burtenshawย 
posted an update about 1 month ago
view post
Post
973
Kimi-K2 is ready for general use! In these notebooks I walk you through use cases like function calling and structured outputs.

๐Ÿ”— burtenshaw/Kimi-K2-notebooks

You can swap it into any OpenAI compatible application via Inference Providers and get to work with an open source model.
  • 1 reply
ยท
sergiopaniegoย 
posted an update about 1 month ago
sergiopaniegoย 
posted an update about 1 month ago
sergiopaniegoย 
posted an update about 1 month ago