I've really been enjoying GLM-5.1. It's what I've been using for the majority of my agent-based work these days. Absolutely zero complaints from me, and it got me off the $100/mo Claude Max plan, so I call that a win lol
Over the past year, we've seen a shift in LLM Post-Training. Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs.
Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.
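The "group" idea behind GRPO can be shown in a few lines: sample several responses per prompt, score each with a verifiable reward, and normalize rewards within the group, so no learned value model is needed. A minimal sketch (the function name is mine, not any library's API):

```python
import statistics

def group_advantages(rewards):
    """GRPO-style advantages: normalize each reward against its group's
    mean and standard deviation instead of a learned value baseline."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against all-equal rewards
    return [(r - mean) / std for r in rewards]

# Four sampled completions for one prompt, scored by a verifier (1 = correct)
advantages = group_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat their group's average get positive advantage and are reinforced; the rest are pushed down, which is what lets the model climb via trial and error without curated answer data.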
But what actually are these environments in practice? And how do you build them effectively?
Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models. I've packaged everything I learned into this short course.
What you'll learn
🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (the open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: how to build single-turn, multi-turn, and tool-use environments
🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
🔸 Build the game environment
🔸 Use it to generate synthetic data for SFT warm-up
🔸 Group-based Reinforcement Learning
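A minimal sketch of what such a game environment can look like (illustrative plain Python with my own class and method names, not the Verifiers API): the model plays X against a random opponent, and the reward is verifiable directly from the board.

```python
import random

class TicTacToeEnv:
    """Toy multi-turn environment sketch: the policy plays X, a random
    opponent plays O, and episode rewards are checkable from the board."""

    WINS = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
            (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def __init__(self):
        self.board = [" "] * 9

    def winner(self):
        for a, b, c in self.WINS:
            if self.board[a] != " " and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

    def step(self, move):
        """Apply the model's move (cell 0-8); return (reward, done)."""
        if self.board[move] != " ":
            return -1.0, True            # illegal move: immediate penalty
        self.board[move] = "X"
        if self.winner() == "X":
            return 1.0, True
        empty = [i for i, s in enumerate(self.board) if s == " "]
        if not empty:
            return 0.0, True             # draw
        self.board[random.choice(empty)] = "O"
        if self.winner() == "O":
            return -1.0, True
        return 0.0, False
```

The same environment can serve both phases the course describes: roll it out with a scripted player to generate synthetic SFT data, then reuse it as the reward source for group-based RL.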
If you're interested in building "little worlds" where LLMs can learn, this course is for you.
We (@KeeganC and @chimbiwide) have released our newest NPC roleplaying model: chimbiwide/Gemma4NPC-E4B, based on Gemma4-E4B with an improved training dataset. We welcome any feedback and suggestions; please leave a comment! We are releasing the E2B model soon, stay tuned!
🎵 MP3 Player - Drop your music, hit play. No install.
MP3 Player brings that retro, iPod-era energy back, straight in your browser.
- Drop your files: MP3, WAV, FLAC, AAC, OGG, AIFF, WMA, it reads them all
- Build your playlist: add tracks one by one or batch-load a whole folder
- Retro LCD display: scrolling track info, elapsed time, the full throwback
- Full controls: play, pause, skip, shuffle, repeat
- Mobile-first: big tactile buttons, works on your phone like an iPod in your pocket
No install. No GPU needed on your end. Just upload and play.
ConfCrawler 🕷️: never miss a conference deadline again
Keeping track of submission deadlines across CV, NLP, robotics, and ML conferences is a mess. ConfCrawler aggregates them in one place so you can actually plan your research calendar.
What's in it:
- Deadlines for major conferences (CVPR, ICCV, NeurIPS, ICRA, ACL, etc.)
- Updated regularly
- Filterable by field / month
Built this out of personal frustration while juggling multiple submission cycles. Hope it saves someone else the tab-hoarding. https://confcrawler.vercel.app/ Feedback welcome! Open to adding more conferences if yours isn't listed.
With the release of Gemma 4, I launched a new Space called MEDPAI: a medical imaging analysis tool that combines object detection with multimodal AI. Here's how it works:
1. Upload a CT scan or X-ray
2. Computer vision models detect and annotate findings
3. Gemma 4 33B generates a report or answers your questions about the image
Currently available detectors: dental analysis and bone fracture detection. More models are in the pipeline, so follow the Space to stay updated! alibidaran/MEDPAI
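The three steps above form a simple two-stage composition: detector output becomes grounding context for the multimodal model. Everything below is hypothetical glue code (the function and class names are mine, not MEDPAI's), just to show the shape of detection feeding into report generation:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    label: str     # e.g. "fracture"
    box: tuple     # (x1, y1, x2, y2) in pixels
    score: float   # detector confidence

def run_pipeline(image_bytes: bytes, question: str, detector, vlm) -> str:
    """Hypothetical two-stage pipeline: a CV detector annotates findings,
    then a multimodal model answers, grounded in those findings."""
    findings = detector(image_bytes)  # -> list[Finding]
    context = "; ".join(
        f"{f.label} at {f.box} (conf {f.score:.2f})" for f in findings
    )
    prompt = f"Detected findings: {context or 'none'}.\nQuestion: {question}"
    return vlm(image_bytes, prompt)   # -> report text

# Stub detector/VLM just to show the call shape
report = run_pipeline(
    b"...", "Any fractures?",
    detector=lambda img: [Finding("fracture", (10, 20, 80, 90), 0.91)],
    vlm=lambda img, prompt: f"Report based on: {prompt}",
)
```

Passing the detector's boxes and confidences into the prompt is what lets the language model write a report about specific annotated regions rather than the raw image alone.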
Hugging Face's TRL library is downloaded 3 million times a month. Over 130k models trained with it are public on the Hub, and major projects like @unsloth and @axolotl-ai-co build directly on top of it. v1.0 is the moment we acknowledged that responsibility explicitly, with a real stability contract.
The field hasn't settled. Building stable software in a domain that keeps invalidating its own assumptions is the actual problem we're solving. The answer is a design that can absorb the next shift without breaking what people rely on.
What's in v1.0: deep Hugging Face integration, low infrastructure burden.
What's next: asynchronous GRPO, better scaling support, and making training legible enough that agents can inspect and steer it.
I fine-tuned Qwen2.5 with GRPO to actually think before it answers, not just pattern-match.
Most LLMs mimic reasoning. This one builds a real cognitive path:
- Plan → understand the task
- Monitor → reason step by step
- Evaluate → verify before answering
Every response follows a strict structured protocol:
<think>
  <planning> ...
  <monitoring> ...
  <evaluation> ...
</think>
Then a clean, reasoning-free <output>.
The model self-checks its own structure. If a section is missing or malformed, the response is invalid.
This isn't chain-of-thought slapped on top. The reasoning protocol is baked in via RL.
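A protocol check like this is exactly the kind of verifiable signal RL can optimize. Here is a sketch of such a validator (the regex and function name are mine, not the author's code, and I assume each section carries a matching closing tag):

```python
import re

# Full protocol: a <think> block with three ordered sections, then <output>
PROTOCOL = re.compile(
    r"<think>\s*"
    r"<planning>.*?</planning>\s*"
    r"<monitoring>.*?</monitoring>\s*"
    r"<evaluation>.*?</evaluation>\s*"
    r"</think>\s*"
    r"<output>.*?</output>\s*$",
    re.DOTALL,
)

def format_reward(response: str) -> float:
    """1.0 if the response follows the full protocol, else 0.0.
    Added to the task reward during GRPO, this penalizes malformed
    responses relative to well-formed ones in the same group."""
    return 1.0 if PROTOCOL.search(response.strip()) else 0.0
```

Because the check is purely structural, it can be computed on every sampled completion at training time, which is how the protocol gets baked in rather than prompted for.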
Finally just wrapped up a comparative analysis of my new open source AI browser, Vessel, against Claude Chrome from Anthropic.
The test evaluates both web navigation harnesses for speed and efficiency on a simple real-world e-commerce task. Opus 4.6 was the model for each of the 3 evaluations, and the results show it completed the task AT LEAST 2X FASTER using Vessel Browser for web navigation in place of Claude Chrome.
Results (in order, fastest to slowest)
1. Claude Code + Vessel Browser: 3 minutes and 10s
2. Hermes Agent + Vessel Browser: 4 minutes and 13s
3. Claude Code + Claude Chrome: 7 minutes and 57s
Vessel Browser is open source, designed explicitly for agents from the ground up (it is not a fork of a human browser with AI features bolted on), and supports a local MCP server for agent control, or BYOK custom OAI endpoints. Check it out for yourself!