All HF Hub posts

jsulz 
posted an update 1 day ago
It's been a bit since I took a step back and looked at the xet-team's progress migrating Hugging Face from Git LFS to Xet, but every time I do, it boggles the mind.

A month ago there were 5,500 users/orgs on Xet with 150K repos and 4PB. Today?
🤗 700,000 users/orgs
📈 350,000 repos
🚀 15PB

Meanwhile, our migrations have pushed throughput to numbers that are bonkers. In June, we hit upload speeds of 577 Gb/s (crossing 500 Gb/s for the first time).

These are hard numbers to put into context, but let's try:

The latest Common Crawl run from commoncrawl was 471 TB.

We now have ~32 crawls stored in Xet. At peak upload speed we could move the latest crawl into Xet in about two hours.
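
A quick back-of-envelope check of that two-hour figure; a sketch assuming the peak 577 Gb/s is sustained end to end:

# Back-of-envelope: time to move one Common Crawl release at peak Xet upload speed.
crawl_tb = 471                        # latest Common Crawl release, in TB
peak_gbps = 577                       # peak upload throughput, in Gb/s

crawl_bits = crawl_tb * 1e12 * 8      # terabytes -> bits
seconds = crawl_bits / (peak_gbps * 1e9)
print(f"{seconds / 3600:.1f} hours")  # ~1.8 hours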

We're moving to a new phase in the process, so stay tuned.

This shift in gears means it's also time to roll up our sleeves and look at all the bytes we have and the value we're adding to the community.

I already have some homework from @RichardErkhov to look at the dedupe across their uploads, and I'll be doing the same for other early adopters, big models/datasets, and frequent uploaders (looking at you @bartowski 👀)

Let me know if there's anything you're interested in; happy to dig in!
AdinaY 
posted an update about 22 hours ago
Hunyuan-A13B 🔥 New MoE LLM by TencentHunyuan

tencent/Hunyuan-A13B-Instruct

✨80B total / 13B active params
✨256K context window
✨Dual-mode reasoning: fast & slow thinking
✨Efficient inference (GQA + quantization)
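
If you want to poke at it locally, here's a minimal sketch using the standard transformers loading path (untested against this repo; trust_remote_code and the chat-template call are assumptions):

# Minimal sketch: query Hunyuan-A13B-Instruct with transformers.
# trust_remote_code=True is assumed in case the repo ships custom modeling code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Hunyuan-A13B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # only ~13B of the 80B params are active per token
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain MoE routing in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))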
Abhaykoul 
posted an update 3 days ago
Introducing Dhanishtha 2.0: World's first Intermediate Thinking Model

Dhanishtha 2.0 is the world's first LLM designed to think in between its responses, unlike other reasoning LLMs, which think just once.

Dhanishtha can think, rethink, self-evaluate, and refine in between responses using multiple <think> blocks.
This technique makes it highly token-efficient: it uses up to 79% fewer tokens than DeepSeek R1.
---

You can try our model at https://helpingai.co/chat.
Also, we're going to open-source Dhanishtha on July 1st.

---
For Devs:
🔑 Get your API key at https://helpingai.co/dashboard
from HelpingAI import HAI  # pip install HelpingAI==1.1.1
from rich import print

hai = HAI(api_key="hl-***********************")

response = hai.chat.completions.create(
    model="Dhanishtha-2.0-preview",
    messages=[{"role": "user", "content": "What is the value of ∫0∞𝑥3/𝑥−1𝑑𝑥 ?"}],
    stream=True,
    hide_think=False  # Hide or show the model's intermediate <think> blocks
)

for chunk in response:
    print(chunk.choices[0].delta.content, end="", flush=True)
fdaudens 
posted an update about 14 hours ago
Three big AI copyright updates this week alone. Tracking it all is getting almost impossible!

That’s why @BrigitteTousi and I built this interactive tracker to keep you up to date: fdaudens/ai-copyright-lawsuits

(Prototyped in minutes with DeepSite!)
FlameF0X 
posted an update 1 day ago
SnowflakeCore-G1 development update: We're building a 24-layer transformer with 32K context and 1024 embedding dimensions - pretty ambitious! Even running at batch_size=1 with heavy gradient accumulation, we're hitting memory walls at 300GB RAM. Scaling up to ~1TB will take some time, but the architecture is looking promising. Thanks for following along with the journey! 😅
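
For context, here is a rough parameter-count estimate for the configuration described; a sketch where the vocab size and FFN width are assumed values, not SnowflakeCore's actual config:

# Rough parameter count for a 24-layer decoder with d_model=1024.
# vocab_size and ffn_mult are assumptions for illustration only.
d_model, n_layers = 1024, 24
vocab_size = 50_000                        # assumed
ffn_mult = 4                               # assumed standard 4x FFN width

attn = 4 * d_model * d_model               # Q, K, V, and output projections
ffn = 2 * d_model * (ffn_mult * d_model)   # up and down projections
embed = vocab_size * d_model

params = n_layers * (attn + ffn) + embed
print(f"~{params / 1e6:.0f}M parameters")  # ~350M at these assumed settings

At settings like these the weights themselves are only a few hundred million parameters, which suggests the 300GB pressure at 32K context comes mostly from activations, gradients, and optimizer state rather than from the weights.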
samihalawa 
posted an update 3 days ago
🔥BEST DEBUG PROMPT FOR CLAUDE_CODE
😲FIXES ANY REPO

How can I prompt you so there won't be any more bugs or issues? List the 20 kinds of errors and bugs most common in a codebase like this one, create a comprehensive table of each of them against all the main files and functionality, and fix until you can cross them all out. Don't add complexity. Ignore security. Only fatal problems.
Proceed in the most exhaustive and comprehensive way possible. Don't miss a line of code. SYSTEMATICALLY FIX EVERYTHING

yeonseok-zeticai 
posted an update 3 days ago
🚀 Real-Time On-Device AI Agent with Polaris-4B — Run It Yourself, No Cloud, No Cost

We just deployed a real-time on-device AI agent using the Polaris-4B-Preview model — one of the top-performing <6B open LLMs on Hugging Face.

📱 What’s remarkable?
This model runs entirely on a mobile device, without cloud, and without any manual optimization. It was built using ZETIC.MLange, and the best part?

➡️ It’s totally automated, free to use, and anyone can do it.
You don’t need to write deployment code, tweak backends, or touch device-specific SDKs. Just upload your model — and ZETIC.MLange handles the rest.

🧠 About the Model
- Model: Polaris-4B-Preview
- Size: ~4B parameters
- Ranking: Top 3 on Hugging Face LLM Leaderboard (<6B)
- Tokenizer: Token-incremental inference supported
- Modifications: None — stock weights, just optimized for mobile

⚙️ What ZETIC.MLange Does
ZETIC.MLange is a fully automated deployment framework for On-Device AI, built for AI engineers who want to focus on models — not infrastructure.

Here’s what it does in minutes:
- 📊 Analyzes model structure
- ⚙️ Converts to mobile-optimized format (e.g., GGUF, ONNX)
- 📦 Generates a runnable runtime environment with pre/post-processing
- 📱 Targets real mobile hardware (CPU, GPU, NPU — including Qualcomm, MediaTek, Apple)
- 🎯 Gives you a downloadable SDK or mobile app component — ready to run
And yes — this is available now, for free, at https://mlange.zetic.ai

🧪 For AI Engineers Like You
If you want to:
- Test LLMs directly on-device
- Run models offline with no latency
- Avoid cloud GPU costs
- Deploy to mobile without writing app-side inference code

Then this is your moment. You can do exactly what we did, using your own models — all in a few clicks.

🎯 Start here → https://mlange.zetic.ai

📬 Want to try Polaris-4B in your own app? Reach out at [email protected], or just visit https://mlange.zetic.ai, which is free to use!

Great work @Chancy , @Zhihui , @tobiaslee !
a-r-r-o-w 
posted an update about 21 hours ago
As you might have already heard, FLUX.1-Kontext-dev has been released and has taken the generative community by storm!

In case you haven't come across it, you can get started with Kontext using 🤗 diffusers. See the official [model](black-forest-labs/FLUX.1-Kontext-dev) and [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#flux).
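
Here's a minimal sketch of what that looks like (assuming a diffusers version that ships FluxKontextPipeline; the input image and edit prompt are placeholders):

# Minimal sketch: image editing with FLUX.1-Kontext-dev via diffusers.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("input.png")            # placeholder input image
edited = pipe(
    image=image,
    prompt="Make the sky a vivid sunset",  # placeholder edit instruction
    guidance_scale=2.5,
).images[0]
edited.save("edited.png")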

Want to know how inference companies like Fal & Replicate are able to run the model so fast and in under 2 seconds per image? Check out this [gist](https://gist.github.com/a-r-r-o-w/d08c37e8bd3e9c26b4ce80360be148c6) for some details!
fdaudens 
posted an update 1 day ago
This is what efficient AI looks like: Gemma 3n just dropped - a natively multimodal model that runs entirely on your device. No cloud. No API calls.

🧠 Text, image, audio, and video - handled locally.
⚡️ Needs as little as ~2GB of GPU memory to run
🤯 First sub-10B model to hit 1300+ Elo
✅ Plug-and-play with Hugging Face, MLX, llama.cpp, and more.

Plus: Multilingual out of the box (140+ languages), fine-tune in a free Colab notebook.

google/gemma-3n-685065323f5984ef315c93f4
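
If you want the transformers route, here's a minimal sketch (the E2B instruction-tuned checkpoint name is an assumption; pick whichever size in the collection fits your device):

# Minimal sketch: run Gemma 3n locally with the transformers pipeline.
# The checkpoint name "google/gemma-3n-E2B-it" is assumed; see the collection.
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",            # Gemma 3n is natively multimodal
    model="google/gemma-3n-E2B-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "photo.jpg"},  # placeholder image
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]
out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])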