akashkathole
posted an update 27 days ago
🚀 Just shipped reconcile_gst2b_env at OpenEnv Hackathon 2026 (Meta x Scaler India).

An RL environment for the monthly GST reconciliation that 14M Indian businesses do by hand. Trained Qwen3-4B with SFT + GRPO, using a custom Tier 2c length-shaping reward modification. Headline: mean composite reward of 0.305 (n=5), +69% over the prompted baseline.

5 documented failure modes, including a novel research finding: the SAME composite reward design that defends against 6 red-team attacks ALSO makes a 3-step shortcut score higher than 50 steps of honest training. Empirically confirmed on-site (step-350 mean > step-375 mean).

Live demo + repo + writeup linked below.

🔗 huggingface.co/spaces/akashkathole/reconcile_gst2b_env
🎥 youtube.com/watch?v=K-sZ8c1TMjw
📝 BLOG.md in the Space
