Stefano Fiorucci PRO

anakin87

AI & ML interests

Contributing to Haystack LLM framework 🏗️. Language Models: orchestration, post-training, synthetic data...

Recent Activity

liked a model 5 days ago
anakin87/qwen-scheduler-7b-grpo
liked a Space 5 days ago
ariG23498/qwen-od
new activity 5 days ago
deepset/roberta-base-squad2:robert
View all activity

Organizations

deepset's profile picture Blog-explorers's profile picture ZeroGPU Explorers's profile picture Hugging Face Discord Community's profile picture

Posts 14

view post
Post
3328
𝗜 𝘁𝗿𝗮𝗶𝗻𝗲𝗱 𝗮 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗠𝗼𝗱𝗲𝗹 𝘁𝗼 𝘀𝗰𝗵𝗲𝗱𝘂𝗹𝗲 𝗲𝘃𝗲𝗻𝘁𝘀 𝘄𝗶𝘁𝗵 𝗚𝗥𝗣𝗢! 👑 🗓️

✍️ Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo

I experimented with GRPO lately.

I am fascinated by models learning from prompts and rewards - no example answers needed like in Supervised Fine-Tuning.

After the DeepSeek boom, everyone is trying GRPO with GSM8K or the Countdown Game...

I wanted a different challenge, like 𝘁𝗲𝗮𝗰𝗵𝗶𝗻𝗴 𝗮 𝗺𝗼𝗱𝗲𝗹 𝘁𝗼 𝗰𝗿𝗲𝗮𝘁𝗲 𝗮 𝘀𝗰𝗵𝗲𝗱𝘂𝗹𝗲 𝗳𝗿𝗼𝗺 𝗮 𝗹𝗶𝘀𝘁 𝗼𝗳 𝗲𝘃𝗲𝗻𝘁𝘀 𝗮𝗻𝗱 𝗽𝗿𝗶𝗼𝗿𝗶𝘁𝗶𝗲𝘀.

Choosing an original problem forced me to:
🤔 Think about the problem setting
🧬 Generate data
🤏 Choose the right base model
🏆 Design reward functions (and experiencing reward hacking)
🔄 Run multiple rounds of training, hoping that my model would learn something.

A fun and rewarding 😄 experience.


I learned a lot of things, that I want to share with you. 👇
✍️ Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo
💻 Code: https://github.com/anakin87/qwen-scheduler-grpo
🤗 Hugging Face collection (dataset and model): anakin87/qwen-scheduler-grpo-680bcc583e817390525a8837

Articles 3

Article
66

I trained a Language Model to schedule events with GRPO!