22 19 1

Yi Cui

onekq

https://onekq.ai

AI & ML interests

Benchmark, Code Generation Model

Recent Activity

posted an update 1 day ago

If you also tuned into Altman's second congress hearing (first in 2023) along with other AI executives, my takeaway is two words: New Deal (by FDR almost a century ago). The causal link is quite fascinating and worthy of a few blogposts or deep research queries, but I won't have more time for this (I really wish so), so here goes. * AI workload loves GPUs because they allocate more transistors than CPUs for computing, and pack them by high-bandwidth memory * More computing in the small physical space -> more power draw and more heat dissipation * more heat dissipation -> liquid cooling * new cooling and heavier power draw -> bigger racks (heavier and taller) * bigger racks -> (re)building data centers * new data centers with higher power demand (peak and stability) -> grid update and nuclear power

posted an update 4 days ago

The new Mistral medium model is very impressive for its size. Will it be open sourced given the history of Mistral? Does anyone have insights? https://huggingface.co/spaces/onekq-ai/WebApp1K-models-leaderboard

updated a Space 4 days ago

onekq-ai/WebApp1K-models-leaderboard

View all activity

Organizations

onekq's activity

posted an update 1 day ago

Post

347

If you also tuned into Altman's second congress hearing (first in 2023) along with other AI executives, my takeaway is two words: New Deal (by FDR almost a century ago).

The causal link is quite fascinating and worthy of a few blogposts or deep research queries, but I won't have more time for this (I really wish so), so here goes.

* AI workload loves GPUs because they allocate more transistors than CPUs for computing, and pack them by high-bandwidth memory
* More computing in the small physical space -> more power draw and more heat dissipation
* more heat dissipation -> liquid cooling
* new cooling and heavier power draw -> bigger racks (heavier and taller)
* bigger racks -> (re)building data centers
* new data centers with higher power demand (peak and stability) -> grid update and nuclear power

posted an update 4 days ago

Post

2227

The new Mistral medium model is very impressive for its size. Will it be open sourced given the history of Mistral? Does anyone have insights?

onekq-ai/WebApp1K-models-leaderboard

posted an update 5 days ago

Post

3200

This time Gemini is very quick with API support on its 2.5 pro May release. The performance is impressive too, now it is among top contenders like o4, R1, and Claude.

onekq-ai/WebApp1K-models-leaderboard

replied to clem's post 7 days ago

Biggest pain point is still inference providers. Even decent labs like Ai2 or THUDM need to lobby for that. My leaderboard is for web developers but I can only evaluate the most visible models with token API support. https://huggingface.co/spaces/onekq-ai/WebApp1K-models-leaderboard

Maybe some players have GPUs but keep the results to themselves. We can only hope they will reciprocate for what they benefit from this community.

reacted to clem's post with ❤️ 7 days ago

Post

3960

What are you using to evaluate models or AI systems? So far we're building lighteval & leaderboards on the hub but still feels early & a lot more to build. What would be useful to you?

6 replies

posted an update 10 days ago

Post

589

Okay, Grok 3 has API support too, and beats Gemini 2.5, but is behind QwQ 32b and DeepSeek v3

onekq-ai/WebApp1K-models-leaderboard

replied to their post 11 days ago

yes yes.

Maybe you can run a leaderboard of models indexed by freedom 🤗

posted an update 11 days ago

Post

1726

I didn't noticed that Gemini 2.5 (pro and flash) has been silently launched for API preview. Their performance is solid, but below QwQ 32B and the latest DeepSeek v3.

onekq-ai/WebApp1K-models-leaderboard

2 replies

replied to their post 12 days ago

I doubted there will be a Qwen3-coder. The direction changed. Alibaba is a corporation. You can imagine the number of executive sponsors for this release. Stock performance is at stake now. Price of success.

replied to their post 12 days ago

You meant the non-thinking mode? If so, add /no_think in your prompt

replied to their post 12 days ago

Noted. It thinks too long which is the problem. R1 and QwQ also took longer but are acceptable.

When I tested Qwen3, the difference of two modes is between an hour and a day (maybe longer)

replied to their post 12 days ago

posted an update 13 days ago

Post

1787

I tested Qwen3 235b and 32b and they are both worse than Qwen2.5 32b.
onekq-ai/WebApp1K-models-leaderboard

I used non-thinking mode because the thinking mode is too slow 🐢🐢🐢 to be usable in any way.

Sigh ...

12 replies

reacted to anakin87's post with 👍 13 days ago

Post

3329

𝗜 𝘁𝗿𝗮𝗶𝗻𝗲𝗱 𝗮 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗠𝗼𝗱𝗲𝗹 𝘁𝗼 𝘀𝗰𝗵𝗲𝗱𝘂𝗹𝗲 𝗲𝘃𝗲𝗻𝘁𝘀 𝘄𝗶𝘁𝗵 𝗚𝗥𝗣𝗢! 👑 🗓️

✍️ Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo

I experimented with GRPO lately.

I am fascinated by models learning from prompts and rewards - no example answers needed like in Supervised Fine-Tuning.

After the DeepSeek boom, everyone is trying GRPO with GSM8K or the Countdown Game...

I wanted a different challenge, like 𝘁𝗲𝗮𝗰𝗵𝗶𝗻𝗴 𝗮 𝗺𝗼𝗱𝗲𝗹 𝘁𝗼 𝗰𝗿𝗲𝗮𝘁𝗲 𝗮 𝘀𝗰𝗵𝗲𝗱𝘂𝗹𝗲 𝗳𝗿𝗼𝗺 𝗮 𝗹𝗶𝘀𝘁 𝗼𝗳 𝗲𝘃𝗲𝗻𝘁𝘀 𝗮𝗻𝗱 𝗽𝗿𝗶𝗼𝗿𝗶𝘁𝗶𝗲𝘀.

Choosing an original problem forced me to:
🤔 Think about the problem setting
🧬 Generate data
🤏 Choose the right base model
🏆 Design reward functions (and experiencing reward hacking)
🔄 Run multiple rounds of training, hoping that my model would learn something.

A fun and rewarding 😄 experience.

I learned a lot of things, that I want to share with you. 👇
✍️ Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo
💻 Code: https://github.com/anakin87/qwen-scheduler-grpo
🤗 Hugging Face collection (dataset and model): anakin87/qwen-scheduler-grpo-680bcc583e817390525a8837

2 replies

replied to their post 13 days ago

Ah thanks! this works

posted an update 14 days ago

Post

491

The Qwen3 235B (MoE) is awfully slow 🐢🐢🐢.

I heard it is able to switch between reasoning and non-reasoning, but for my question, it always goes straight to the reasoning mode without an override switch. I tried Fireworks, DeepInfra, and OpenRouter, and they are all the same.

What is your experience with Qwen3?

2 replies

reacted to ZennyKenny's post with 👍 14 days ago

Post

2721

I've created a new dataset using the Algorithm of Thoughts architecture proposed by Sel et al. (2023) in a reasoning context. (paper: https://arxiv.org/pdf/2308.10379)

The dataset simulates the discovery phase of a fictitious VC firm called Reasoned Capital and, once expanded, can be used to create models which are able to make complex, subjective financial decisions based on different criteria.

The generation process encourages recursive problem-solving in increasingly complex prompts to encourage models to assess and reevaluate the conclusions and generated opinions of upstream models. Pretty neat stuff, and I'm not aware of this architecture being used in a reasoning context anywhere else.

Check it out: ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset

posted an update 15 days ago

Post

1999

AxB stand for Approximately xB or Activating xB (for a Mixture-of-Expert model), this is really interesting naming 😅

Qwen/Qwen3-235B-A22B
Qwen/Qwen3-30B-A3B

1 reply

replied to CadenHolman's post 19 days ago

nice. what model is behind it.

reacted to CadenHolman's post with 👀 19 days ago

Post

1812

We’re excited to launch CodeDebugger.ai, a free, privacy-first tool that helps developers debug code instantly using AI.

What it does:

Paste your code (PHP, JavaScript, HTML, SQL, and more)

Get AI-generated bug reports and improvement suggestions

No sign-up, no tracking — each result link expires in 24 hours

Why we built it: Every developer hits walls. Whether you're stuck on a syntax bug or need another set of eyes, CodeDebugger.ai offers instant feedback powered by OpenAI models — all without compromising your privacy.

Privacy-first by design:

No login required

Code is deleted after 24 hours

No analytics, no tracking, no cookies

Try it now:
https://CodeDebugger.ai

2 replies