@akashkathole on Hugging Face: "🚀 Just shipped reconcile_gst2b_env at OpenEnv Hackathon 2026 (Meta x Scaler…"

🚀 Just shipped reconcile_gst2b_env at OpenEnv Hackathon 2026 (Meta x Scaler India).

An RL environment for the monthly GST reconciliation that 14M Indian businesses do by hand. Trained Qwen3-4B with SFT + GRPO, plus a custom Tier 2c length-shaping modification to the reward. Headline: mean composite reward of 0.305 (n=5), +69% over the prompted baseline.
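For readers who want the number behind the percentage: the post gives the trained mean (0.305) and the relative gain (+69%) but not the baseline itself. A minimal sketch of backing it out, assuming "+69%" means a relative improvement over the baseline mean; the function name is hypothetical and the result is only approximate because 69% is itself rounded.

```python
def implied_baseline(trained_mean: float, relative_gain: float) -> float:
    """Back out a baseline mean from a trained mean and a relative gain.

    Assumes trained_mean = baseline * (1 + relative_gain).
    """
    return trained_mean / (1.0 + relative_gain)


trained_mean = 0.305   # from the post: mean composite reward, n=5
relative_gain = 0.69   # from the post: +69% over the prompted baseline

baseline = implied_baseline(trained_mean, relative_gain)
# Approximate only: the reported 69% is rounded, so the true baseline
# may differ slightly from this value.
print(f"implied baseline mean: {baseline:.3f}")
```

This puts the prompted baseline at roughly 0.18 on the same composite reward scale.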

5 documented failure modes, including a novel research finding: the SAME composite reward design that defends against 6 red-team attacks ALSO makes a 3-step shortcut score higher than 50 steps of honest training. Observed empirically on-site (step-350 mean > step-375 mean).

Live demo + repo + writeup linked below.

🔗 huggingface.co/spaces/akashkathole/reconcile_gst2b_env
🎥 youtube.com/watch?v=K-sZ8c1TMjw
📝 BLOG.md in the Space
