AI & ML interests

None defined yet.

Recent Activity

eliebakΒ 
posted an update 24 days ago
view post
Post
4589
Kimi K2 tech report is full of gems as always. Here are my notes on it:

> MuonClip: Pretty crazy how after 70k the training stabilizes and the QK-clip is basically inactive. There is also no loss in perf with QK-clip which is not trivial at all (at small scale but with aggressive threshold). Also a cool explanation of why muon makes the logit explode in appendix E (tl;dr is that muon makes the singular value of the update matrix higher)
> Sparsity scaling laws to justify their ratio, they have a very solid training infra that allows the model to be trained at this sparsity level, they could have increased even more but as sparsity increases the training becomes less efficient.
> They diminish the number of attention heads to make it more efficient for long context since attention heads are a big bottleneck for long context. They also remove 2 of the 3 "first dense" layers in the dsv3 arch.

With the sparsity and attention heads (divided by 2) they achieve 83% increased flops compared to deepseek v3 arch at 128k.

> Data: Rephrasing is KEY. They do a lot more synthetic data generation and rephrase their corpus to have different styles, for longer documents they do it by chunk. I'm (half) surprised by the fact that ONLY 1 epoch (assuming same number of training tokens I think?) of data rephrased 10 times has better accuracy than 10 epochs of the same data rephrased once.
> They do rewriting for Math and Knowledge, for Math they apply the ShallowMath recipe and instruct the model to rephrase in a "learning note" style
> They talk about diversity and probably have some internal stuff/eval to test that, as always still a bit unclear for me how to properly measure that.

The infra is also very nice, quick summary:
> PP=16 (1F1B schedule, a bit custom), EP=16, zero1
> No FP8 computation but for storage of specific layers, selective recomputation for inexpensive block, activation offloading to CPU
AmeeeeeΒ 
posted an update about 2 months ago
view post
Post
1638
πŸ€— Here’s a fun educational video I made to show how Sheets and AI can upgrade your structured content.

Better tables and clearer messages with just a little help from open-source AI!

aisheets/sheets
AmeeeeeΒ 
posted an update 2 months ago
view post
Post
1765
With Sheets, try a new way to create structured content with the help of AI!

No installs. No login. Just open a link and 🀩

This app lets you create a dataset by importing a file or starting from a prompt.

What’s different about SHEETS?
πŸ”Ž Web search integration to ground answers in real-world data
πŸ“š In-context learning from validated sources
πŸ”— Transparent sourcing β€” every result is linked
🧩 Runs on multiple open-source models

Fight hallucinations and start creating content you can rely on.

dvilasueroΒ 
posted an update 2 months ago
view post
Post
2789
Super excited to launch Hugging Face Sheets: Spreadsheets meet AI and unstructured data.

A few months ago, we started imagining new ways to build and transform datasets with the latest open-source models.

Today, I'm thrilled to introduce our first step in this direction.


In a nutshell:

πŸ“ Effortlessly run prompts and models over your data.
🌐 Agentic search for accuracy and real-time information.
πŸ–ΌοΈ Familiar, minimalistic interface for interacting with data.
🎯 Human feedback 2.0: Your input directly improves generated data.
πŸ’― Access hundreds of open models and leading inference providers.

Go to this space to try it out!

aisheets/sheets

Leave your questions below, we're just getting started!
Β·
davanstrienΒ 
posted an update 2 months ago
view post
Post
3359
Inspired by Hugging Face's official MCP server, I've developed a complementary tool that exposes my semantic search API to enhance discovery across the HF platform.

Key capabilities:

- AI-powered semantic search for models and datasets
- Parameter count analysis via safetensors metadata
- Trending content discovery
- Find similar models/datasets functionality
- 11 tools total for enhanced ecosystem navigation

The semantic search goes beyond simple keyword matching, understanding context and relationships between different models and datasets.

Example query: "Find around 10 reasoning Hugging Face datasets published in 2025 focusing on topics other than maths and science. Show a link and a short summary for each dataset." (results in video!)

https://github.com/davanstrien/hub-semantic-search-mcp
  • 1 reply
Β·
loubnabnlΒ 
posted an update 3 months ago