ZHOU

TOBI-X

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a collection 6 days ago

Evals

Collection

7 items • Updated Nov 20, 2025 • 1

upvoted a paper 11 days ago

T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation

Paper • 2512.21094 • Published 12 days ago • 24

upvoted a collection 16 days ago

Multilingual-MATH

Collection

MATH datasets translated by Gemini-2.5-pro. • 3 items • Updated Nov 11, 2025 • 1

upvoted a paper about 1 month ago

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

Paper • 2511.13254 • Published Nov 17, 2025 • 136

upvoted 3 papers 3 months ago

Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting

Paper • 2510.08696 • Published Oct 9, 2025 • 14

Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

Paper • 2510.07242 • Published Oct 8, 2025 • 30

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published Oct 3, 2025 • 75

upvoted 2 papers 5 months ago

DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization

Paper • 2508.14460 • Published Aug 20, 2025 • 85

SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Paper • 2508.04700 • Published Aug 6, 2025 • 52

upvoted a paper 9 months ago

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

Paper • 2504.02605 • Published Apr 3, 2025 • 48

upvoted 3 papers 10 months ago

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

Paper • 2503.16419 • Published Mar 20, 2025 • 77

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 144

reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs

Paper • 2503.11751 • Published Mar 14, 2025 • 17

upvoted a collection 10 months ago

🧠 Reasoning datasets

Collection

Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 178

upvoted a paper 11 months ago

BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models

Paper • 2502.07346 • Published Feb 11, 2025 • 53

upvoted a collection about 1 year ago

MoEs papers reading list

Collection

60 items • Updated Nov 4, 2024 • 145
