🤔 Ever wondered how OpenAI’s massive GPT‑OSS‑20B runs on just 16 GB of memory or how GPT‑OSS‑120B runs on a single H100 GPU?
Seems impossible, right?
The secret is native MXFP4 quantization: a 4-bit floating-point format that’s making AI models faster, lighter, and more deployable than ever.
🧠 What’s MXFP4?
MXFP4, or Microscaling FP4, is a specialized 4-bit floating-point format (E2M1) standardized by the Open Compute Project under the MX (Microscaling) specification. It compresses groups of 32 values using a shared 8-bit scale (E8M0), dramatically lowering memory usage while preserving enough dynamic range, which makes it ideal for compact AI model deployment.
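The memory math behind the headline numbers is easy to check: 4 bits per value plus one shared 8-bit scale per 32-value block works out to 4.25 effective bits per weight. A rough back-of-the-envelope (assuming ~20B parameters, counting weights only, not activations or KV cache):

```python
# Effective storage cost of MXFP4:
# 4 bits per value + one shared 8-bit E8M0 scale per block of 32 values.
bits_per_weight = 4 + 8 / 32               # = 4.25 bits/weight
params = 20e9                              # rough parameter count (assumption)
gib = params * bits_per_weight / 8 / 2**30 # weight storage in GiB
print(f"{bits_per_weight} bits/weight -> ~{gib:.1f} GiB of weights")
```

Roughly 10 GiB of weights, which is why a 20B-parameter model can fit comfortably in 16 GB of memory.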
💡 Think of it like this:
Instead of everyone ordering their own expensive meal (full-precision weights), a group shares a family meal (shared scaling). It’s cheaper, lighter, and still gets the job done.
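The shared-meal idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the exact OCP MX rounding algorithm: it picks a power-of-two scale per 32-value block (standing in for the E8M0 scale) and snaps each scaled value to the nearest E2M1 grid point.

```python
import math
import numpy as np

# Representable magnitudes of E2M1 (1 sign, 2 exponent, 1 mantissa bit)
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize_block(block):
    """Quantize 32 floats to MXFP4-style storage: one shared power-of-two
    scale (E8M0-like) plus 32 values snapped to the E2M1 grid.
    Scale selection here is a simplification of the MX spec."""
    assert block.size == 32
    amax = float(np.max(np.abs(block)))
    if amax == 0.0:
        return 1.0, np.zeros_like(block, dtype=np.float64)
    # Choose a power-of-two scale so the largest element lands at or
    # below E2M1's max magnitude (6.0) -- no clamping needed.
    scale = 2.0 ** math.ceil(math.log2(amax / 6.0))
    scaled = block / scale
    # Snap each magnitude to the nearest representable E2M1 value.
    idx = np.argmin(np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]), axis=1)
    quant = np.sign(scaled) * E2M1_GRID[idx]
    return scale, quant

rng = np.random.default_rng(0)
w = rng.normal(size=32).astype(np.float32)   # one "family" of 32 weights
scale, q = mxfp4_quantize_block(w)
dequant = scale * q                          # reconstructed weights
print("shared scale:", scale)
print("max abs error:", np.max(np.abs(w - dequant)))
```

Every value in the block is stored as just 4 bits (a sign plus an index into that 8-entry grid), and the single shared scale is what keeps the reconstruction close to the original weights.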
✍️ I’ve broken all of this down in my first Medium blog:
What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware
Link - https://medium.com/@rakshitaralimatti2001/4-bit-alchemy-how-mxfp4-makes-massive-models-like-gpt-oss-feasible-for-everyone-573d6630b56c
HF - https://huggingface.co/blog/RakshitAralimatti/learn-ai-with-me