I've really been enjoying GLM-5.1. It's what I've been using for the majority of my agent-based work these days. Absolutely zero complaints from me, and it got me off the $100/mo Claude Max plan, so I call that a win lol
Over the past year, we've seen a shift in LLM Post-Training. Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs.
Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.
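The "group" idea behind GRPO can be shown in a few lines: sample several responses per prompt, score each with a verifiable reward, and normalize rewards within the group, so no learned value model is needed. A minimal sketch (the function name is mine, not any library's API):

```python
import statistics

def group_advantages(rewards):
    """GRPO-style advantages: normalize each reward against its group's
    mean and standard deviation instead of a learned value baseline."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against all-equal rewards
    return [(r - mean) / std for r in rewards]

# Four sampled completions for one prompt, scored by a verifier (1 = correct)
advantages = group_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat their group's average get positive advantage and are reinforced; the rest are pushed down, which is what lets the model climb via trial and error without curated answer data.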
But what actually are these environments in practice? And how do you build them effectively?
Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models. I've packaged everything I learned into this short course.
What you'll learn
🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (the open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: how to build single-turn, multi-turn, and tool-use environments
🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
🔸 Build the game environment
🔸 Use it to generate synthetic data for SFT warm-up
🔸 Group-based Reinforcement Learning
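A minimal sketch of what such a game environment can look like (illustrative plain Python with my own class and method names, not the Verifiers API): the model plays X against a random opponent, and the reward is verifiable directly from the board.

```python
import random

class TicTacToeEnv:
    """Toy multi-turn environment sketch: the policy plays X, a random
    opponent plays O, and episode rewards are checkable from the board."""

    WINS = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
            (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def __init__(self):
        self.board = [" "] * 9

    def winner(self):
        for a, b, c in self.WINS:
            if self.board[a] != " " and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

    def step(self, move):
        """Apply the model's move (cell 0-8); return (reward, done)."""
        if self.board[move] != " ":
            return -1.0, True            # illegal move: immediate penalty
        self.board[move] = "X"
        if self.winner() == "X":
            return 1.0, True
        empty = [i for i, s in enumerate(self.board) if s == " "]
        if not empty:
            return 0.0, True             # draw
        self.board[random.choice(empty)] = "O"
        if self.winner() == "O":
            return -1.0, True
        return 0.0, False
```

The same environment can serve both phases the course describes: roll it out with a scripted player to generate synthetic SFT data, then reuse it as the reward source for group-based RL.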
If you're interested in building "little worlds" where LLMs can learn, this course is for you.
We (@KeeganC and @chimbiwide) have released our newest NPC roleplaying model: chimbiwide/Gemma4NPC-E4B, based on Gemma4-E4B with an improved training dataset. We welcome any feedback and suggestions; please leave a comment! We are releasing the E2B model soon, stay tuned!
🎵 MP3 Player - Drop your music, hit play. No install.
MP3 Player brings that retro, iPod-era energy back, straight in your browser.
- Drop your files: MP3, WAV, FLAC, AAC, OGG, AIFF, WMA, it reads them all
- Build your playlist: add tracks one by one or batch-load a whole folder
- Retro LCD display: scrolling track info, elapsed time, the full throwback
- Full controls: play, pause, skip, shuffle, repeat
- Mobile-first: big tactile buttons, works on your phone like an iPod in your pocket
No install. No GPU needed on your end. Just upload and play.
ConfCrawler 🕷️: never miss a conference deadline again
Keeping track of submission deadlines across CV, NLP, robotics, and ML conferences is a mess. ConfCrawler aggregates them in one place so you can actually plan your research calendar.
What's in it:
- Deadlines for major conferences (CVPR, ICCV, NeurIPS, ICRA, ACL, etc.)
- Updated regularly
- Filterable by field / month
Built this out of personal frustration while juggling multiple submission cycles. Hope it saves someone else the tab-hoarding. https://confcrawler.vercel.app/ Feedback welcome! Open to adding more conferences if yours isn't listed.
With the release of Gemma 4, I launched a new Space called MEDPAI: a medical imaging analysis tool that combines object detection with multimodal AI. Here's how it works:
1. Upload a CT scan or X-ray
2. Computer vision models detect and annotate findings
3. Gemma 4 33B generates a report or answers your questions about the image
Currently available detectors: dental analysis and bone fracture detection. More models are in the pipeline, so follow the Space to stay updated! alibidaran/MEDPAI
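The three steps above form a simple two-stage composition: detector output becomes grounding context for the multimodal model. Everything below is hypothetical glue code (the function and class names are mine, not MEDPAI's), just to show the shape of detection feeding into report generation:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    label: str     # e.g. "fracture"
    box: tuple     # (x1, y1, x2, y2) in pixels
    score: float   # detector confidence

def run_pipeline(image_bytes: bytes, question: str, detector, vlm) -> str:
    """Hypothetical two-stage pipeline: a CV detector annotates findings,
    then a multimodal model answers, grounded in those findings."""
    findings = detector(image_bytes)  # -> list[Finding]
    context = "; ".join(
        f"{f.label} at {f.box} (conf {f.score:.2f})" for f in findings
    )
    prompt = f"Detected findings: {context or 'none'}.\nQuestion: {question}"
    return vlm(image_bytes, prompt)   # -> report text

# Stub detector/VLM just to show the call shape
report = run_pipeline(
    b"...", "Any fractures?",
    detector=lambda img: [Finding("fracture", (10, 20, 80, 90), 0.91)],
    vlm=lambda img, prompt: f"Report based on: {prompt}",
)
```

Passing the detector's boxes and confidences into the prompt is what lets the language model write a report about specific annotated regions rather than the raw image alone.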
Hugging Face's TRL library is downloaded 3 million times a month. Over 130k models trained with it are public on the Hub, and major projects like @unsloth and @axolotl-ai-co build directly on top of it. v1.0 is the moment we acknowledged that responsibility explicitly, with a real stability contract.
The field hasn't settled. Building stable software in a domain that keeps invalidating its own assumptions is the actual problem we're solving. The answer is a design that can absorb the next shift without breaking what people rely on.
What's in v1.0: deep Hugging Face integration, low infrastructure burden.
What's next: asynchronous GRPO, better scaling support, and making training legible enough that agents can inspect and steer it.
I fine-tuned Qwen2.5 with GRPO to actually think before it answers, not just pattern-match.
Most LLMs mimic reasoning. This one builds a real cognitive path:
- Plan → understand the task
- Monitor → reason step by step
- Evaluate → verify before answering
Every response follows a strict structured protocol:
<think>
  <planning> ...
  <monitoring> ...
  <evaluation> ...
</think>
Then a clean, reasoning-free <output>.
The model self-checks its own structure. If a section is missing or malformed, the response is invalid.
This isn't chain-of-thought slapped on top. The reasoning protocol is baked in via RL.
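A protocol check like this is exactly the kind of verifiable signal RL can optimize. Here is a sketch of such a validator (the regex and function name are mine, not the author's code, and I assume each section carries a matching closing tag):

```python
import re

# Full protocol: a <think> block with three ordered sections, then <output>
PROTOCOL = re.compile(
    r"<think>\s*"
    r"<planning>.*?</planning>\s*"
    r"<monitoring>.*?</monitoring>\s*"
    r"<evaluation>.*?</evaluation>\s*"
    r"</think>\s*"
    r"<output>.*?</output>\s*$",
    re.DOTALL,
)

def format_reward(response: str) -> float:
    """1.0 if the response follows the full protocol, else 0.0.
    Added to the task reward during GRPO, this penalizes malformed
    responses relative to well-formed ones in the same group."""
    return 1.0 if PROTOCOL.search(response.strip()) else 0.0
```

Because the check is purely structural, it can be computed on every sampled completion at training time, which is how the protocol gets baked in rather than prompted for.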
Finally just wrapped up a comparative analysis of my new open source AI browser, Vessel, against Claude Chrome from Anthropic.
The test evaluates both web navigation harnesses for speed and efficiency on a simple real-world e-commerce task. Opus 4.6 was the model for each of the 3 evaluations, and the results show it completed the task AT LEAST 2X FASTER using Vessel Browser for web navigation in place of Claude Chrome.
Results (in order, fastest to slowest)
1. Claude Code + Vessel Browser: 3 minutes and 10s
2. Hermes Agent + Vessel Browser: 4 minutes and 13s
3. Claude Code + Claude Chrome: 7 minutes and 57s
Vessel Browser is open source, designed explicitly for agents from the ground up (it is not a fork of a human browser with AI features bolted on), and supports a local MCP server for agent control, or BYOK custom OAI endpoints. Check it out for yourself!