Yuzhe Gu

vanilla1116

https://guyuzhe.site/

Liqu1d-G

AI & ML interests

LLM; Reasoning; Hallucination; Self-Improvement

Recent Activity

commented on a paper 19 days ago

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

authored a paper 24 days ago

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

authored a paper 24 days ago

Intern-S1: A Scientific Multimodal Foundation Model

upvoted 3 papers 24 days ago

Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning

Paper • 2512.10534 • Published 24 days ago • 31

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

Paper • 2512.10739 • Published 24 days ago • 45

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

Paper • 2512.10756 • Published 24 days ago • 33

upvoted a paper about 1 month ago

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

Paper • 2512.05111 • Published Dec 4, 2025 • 47

upvoted a paper 3 months ago

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Paper • 2510.08540 • Published Oct 9, 2025 • 109

upvoted a paper 5 months ago

Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21, 2025 • 259

upvoted 2 papers 6 months ago

Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning

Paper • 2507.16814 • Published Jul 22, 2025 • 21

The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner

Paper • 2507.13332 • Published Jul 17, 2025 • 48

upvoted 3 papers 9 months ago

MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space

Paper • 2504.13835 • Published Apr 18, 2025 • 38

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 306

RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy

Paper • 2503.24388 • Published Mar 31, 2025 • 29

upvoted 3 papers 10 months ago

upvoted a paper 11 months ago

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Paper • 2502.06781 • Published Feb 10, 2025 • 58

upvoted a paper 12 months ago

Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

Paper • 2501.11425 • Published Jan 20, 2025 • 109

upvoted a collection over 1 year ago

InternLM2-Reward

Collection

InternLM2 Reward Models • 3 items • Updated 6 days ago • 4

upvoted 3 papers over 1 year ago

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

Paper • 2407.20183 • Published Jul 29, 2024 • 43

InternLM2 Technical Report

Paper • 2403.17297 • Published Mar 26, 2024 • 34

ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

Paper • 2407.04693 • Published Jul 5, 2024 • 3
