Yinxu Pan

cppowboy

https://github.com/Cppowboy

AI & ML interests

RL for LLMs, Code & Math Reasoning, Function Calling, Code Interpreter, Vision-Language Pretraining

Recent Activity

liked a dataset about 11 hours ago

nebius/SWE-agent-trajectories

upvoted a paper 2 days ago

SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories

liked a dataset 7 days ago

nvidia/Nemotron-Cascade-RL-SWE

upvoted a paper 2 days ago

SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories

Paper • 2512.17419 • Published 6 days ago • 9

upvoted an article 9 days ago

Article

Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models

9 days ago

•

upvoted 2 papers 19 days ago

Qwen3-VL Technical Report

Paper • 2511.21631 • Published 28 days ago • 135

PretrainZero: Reinforcement Active Pretraining

Paper • 2512.03442 • Published 22 days ago • 46

upvoted 2 papers 22 days ago

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published 23 days ago • 229

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published 24 days ago • 93

upvoted a collection 30 days ago

Olmo 3 Post-training

Collection

All artifacts for post-training Olmo 3. Datasets follow the model that resulted from training on them. • 32 items • Updated 1 day ago • 46

upvoted a paper 30 days ago

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Paper • 2511.19399 • Published about 1 month ago • 60

upvoted a paper about 2 months ago

Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning

Paper • 2508.03501 • Published Aug 5 • 59

upvoted a paper 2 months ago

ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning

Paper • 2510.12693 • Published Oct 14 • 26

upvoted 4 papers 3 months ago

Reinforcement Learning on Pre-Training Data

Paper • 2509.19249 • Published Sep 23 • 68

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

Paper • 2509.18154 • Published Sep 16 • 51

A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10 • 190

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7 • 180

upvoted 6 papers 4 months ago

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

Paper • 2508.11408 • Published Aug 15 • 8

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Paper • 2508.14444 • Published Aug 20 • 39

Yinxu Pan

AI & ML interests

Recent Activity

Organizations

cppowboy's activity

Nemotron 3 Nano \- A new Standard for Efficient, Open, and Intelligent Agentic Models