Yinxu Pan

cppowboy

https://github.com/Cppowboy

AI & ML interests

RL for LLMs, Code & Math Reasoning, Function Calling, Code Interpreter, Vision-Language Pretraining

Recent Activity

upvoted a paper 1 day ago

SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories

Paper • 2512.17419 • Published 4 days ago • 8

liked a dataset 5 days ago

nvidia/Nemotron-Cascade-RL-SWE

Viewer • Updated 7 days ago • 110k • 342 • 20

liked a dataset 6 days ago

princeton-nlp/SWE-bench

Viewer • Updated Mar 3 • 21.5k • 16.9k • 131

upvoted an article 8 days ago

Article

Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models

8 days ago

liked 2 datasets 15 days ago

TuringEnterprises/Turing-Open-Reasoning

Viewer • Updated 17 days ago • 50 • 20.1k • 177

Anthropic/AnthropicInterviewer

Viewer • Updated 15 days ago • 1.25k • 11.7k • 339

upvoted 2 papers 18 days ago

Qwen3-VL Technical Report

Paper • 2511.21631 • Published 27 days ago • 131

PretrainZero: Reinforcement Active Pretraining

Paper • 2512.03442 • Published 21 days ago • 46

upvoted 2 papers 21 days ago

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published 21 days ago • 225

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published 22 days ago • 93

liked a dataset 22 days ago

nvidia/ToolScale

Viewer • Updated 6 days ago • 4.06k • 3.29k • 162

liked a model 22 days ago

deepseek-ai/DeepSeek-V3.2

Text Generation • 685B • Updated 22 days ago • 90.9k • 1.01k

upvoted a collection 28 days ago

Olmo 3 Post-training

Collection

All artifacts for post-training Olmo 3. Datasets follow the model that resulted from training on them. • 32 items • Updated about 1 hour ago • 45

liked a dataset 28 days ago

allenai/Dolci-Think-RL-32B

Viewer • Updated Nov 20 • 102k • 961 • 16

upvoted a paper 28 days ago

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Paper • 2511.19399 • Published 29 days ago • 60

liked 4 datasets about 1 month ago

upvoted a paper about 2 months ago

Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning

Paper • 2508.03501 • Published Aug 5 • 59